Re: [zfs-discuss] Comments on home OpenSolaris/ZFS server

2009-09-28 Thread Eugen Leitl
On Mon, Sep 28, 2009 at 06:04:01PM -0400, Thomas Burgess wrote:
> personally i like this case:
> 
> 
> http://www.newegg.com/Product/Product.aspx?Item=N82E16811219021
> 
> it's got 20 hot swap bays, and it's surprisingly well built.  For the money,
> it's an amazing deal.

You don't like http://www.supermicro.com/products/nfo/chassis_storage.cfm ?
I must admit I don't have a price list of these.

When running that many hard drives I would insist on redundant
power supplies, and server motherboards with ECC memory. Unless
it's for home use, where a downtime of days or weeks is not critical.

-- 
Eugen* Leitl leitl http://leitl.org
__
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Would ZFS work for a high-bandwidth video SAN?

2009-09-28 Thread Erik Trimble

Bob Friesenhahn wrote:

On Mon, 28 Sep 2009, Richard Connamacher wrote:

I was thinking of custom building a server, which I think I can do 
for around $10,000 of hardware (using 45 SATA drives and a custom 
enclosure), and putting OpenSolaris on it. It's a bit of a risk 
compared to buying a $30,000 server, but would be a fun experiment.


Others have done similar experiments with considerable success.

Bob
--


Yes, but be careful of your workload on SATA disks.   SATA can be very 
good for sequential read and write, but only under lower loads, even 
with a serious SSD cache.  I'd want to benchmark things with your 
particular workload before using SATA instead of SAS.


Two things to mention:  Sun's 7110 lists for $11k in the 2TB (with SAS) disk 
configuration.   If you have longer-term storage needs, look at an 
X4540 Thor (the replacement for the X4500 Thumpers). They're 
significantly more reliable and manageable than a custom-built solution, 
and reasonably cost-competitive ( >> $1/GB after discount).   Both the 
Thor and the 7110 are available for Try-and-Buy.  Get them and test them 
against your workload - it's the only way to be sure (to paraphrase 
Ripley).


Not just for Sun kit, but I'd be very wary of using any 
no-service-contract hardware for something that is business critical, 
which I imagine your digital editing system is. Don't be 
penny-wise and pound-foolish.


--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] extremely slow writes (with good reads)

2009-09-28 Thread Paul Archer

11:04pm, Paul Archer wrote:


Cool.
FWIW, there appears to be an issue with the LSI 150-6 card I was using. I 
grabbed an old server m/b from work, and put a newer PCI-X LSI card in it, 
and I'm getting write speeds of about 60-70MB/sec, which is about 40x the 
write speed I was seeing with the old card.


Paul


Small correction: I was seeing writes in the 60-70MB/sec range because I was 
writing to a single 2TB drive (in its own pool). When I tried writing back to 
the primary (4+1 raid-z) pool, I was getting between 100-120MB/sec. 
(That's for sequential writes, anyway.)


paul
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] extremely slow writes (with good reads)

2009-09-28 Thread Paul Archer

Cool.
FWIW, there appears to be an issue with the LSI 150-6 card I was using. I 
grabbed an old server m/b from work, and put a newer PCI-X LSI card in it, 
and I'm getting write speeds of about 60-70MB/sec, which is about 40x the 
write speed I was seeing with the old card.


Paul


Tomorrow, Robert Milkowski wrote:


Paul Archer wrote:
In light of all the trouble I've been having with this zpool, I bought a 
2TB drive, and I'm going to move all my data over to it, then destroy the 
pool and start over.


Before I do that, what is the best way on an x86 system to format/label the 
disks?





If the entire disk is going to be dedicated to one zfs pool then don't bother 
with manual labeling - when creating the pool, provide the disk name without a 
slice (so for example c0d0 instead of c0d0s0) and zfs will automatically 
put an EFI label on it, with s0 representing the entire disk (minus the reserved area).


--
Robert Milkowski
http://milek.blogspot.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Quickest way to find files with cksum errors without doing scrub

2009-09-28 Thread Bob Friesenhahn

On Mon, 28 Sep 2009, Richard Elling wrote:


Many people here would profoundly disagree with the above.  There is no 
substitute for good backups, but a periodic scrub helps validate that a 
later resilver would succeed.  A periodic scrub also helps find system 
problems early when they are less likely to crater your business.  It is 
much better to find an issue during a scrub rather than during resilver of 
a mirror or raidz.


As I said, I am concerned that people would mistakenly expect that scrubbing
offers data protection. It doesn't.  I think you proved my point? ;-)


It does not specifically offer data "protection" but if you have only 
duplex redundancy, it substantially helps find and correct a failure 
which would have caused data loss during a resilver.  The value 
substantially diminishes if you have triple redundancy.


I hope it does not offend that I scrub my mirrored pools once a week.
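
For anyone who wants to do the same, a root crontab entry along these 
lines is enough (a minimal sketch, assuming a pool named "tank" and a 
quiet window early Sunday morning):

  # weekly scrub, 03:00 every Sunday
  0 3 * * 0 /usr/sbin/zpool scrub tank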

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Would ZFS work for a high-bandwidth video SAN?

2009-09-28 Thread Bob Friesenhahn

On Mon, 28 Sep 2009, Richard Connamacher wrote:

I was thinking of custom building a server, which I think I can do 
for around $10,000 of hardware (using 45 SATA drives and a custom 
enclosure), and putting OpenSolaris on it. It's a bit of a risk 
compared to buying a $30,000 server, but would be a fun experiment.


Others have done similar experiments with considerable success.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Would ZFS work for a high-bandwidth video SAN?

2009-09-28 Thread Richard Connamacher
I was thinking of custom building a server, which I think I can do for around 
$10,000 of hardware (using 45 SATA drives and a custom enclosure), and putting 
OpenSolaris on it. It's a bit of a risk compared to buying a $30,000 server, 
but would be a fun experiment.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] extremely slow writes (with good reads)

2009-09-28 Thread Robert Milkowski

Paul Archer wrote:
In light of all the trouble I've been having with this zpool, I bought 
a 2TB drive, and I'm going to move all my data over to it, then 
destroy the pool and start over.


Before I do that, what is the best way on an x86 system to 
format/label the disks?





If the entire disk is going to be dedicated to one zfs pool then don't 
bother with manual labeling - when creating the pool, provide the disk 
name without a slice (so for example c0d0 instead of c0d0s0) and zfs 
will automatically put an EFI label on it, with s0 representing the 
entire disk (minus the reserved area).
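
For example (a sketch, assuming a spare disk c0d0 and a pool named 
"tank" - substitute your own device and pool names):

  zpool create tank c0d0     # whole disk: zfs writes an EFI label itself
  zpool status tank          # the vdev shows up as c0d0

versus creating the pool on c0d0s0, which would require labeling the 
disk with format/fmthard first.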


--
Robert Milkowski
http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Quickest way to find files with cksum errors without doing scrub

2009-09-28 Thread Robert Milkowski

Robert Milkowski wrote:

Bob Friesenhahn wrote:

On Mon, 28 Sep 2009, Richard Elling wrote:


Scrub could be faster, but you can try
tar cf - . > /dev/null

If you think about it, validating checksums requires reading the data.
So you simply need to read the data.


This should work but it does not verify the redundant metadata.  For 
example, the duplicate metadata copy might be corrupt but the problem 
is not detected since it did not happen to be used.




Not only that - it also won't read all the copies of the data if zfs has 
redundancy configured at the pool level. Scrubbing the pool will. And 
that's the main reason for the scrub - to be able to detect and 
repair checksum errors (if any) while a redundant copy is still fine.




Also, doing a tar means reading from the ARC and/or L2ARC if the data is 
cached, which won't verify that the data is actually fine on disk. A scrub 
won't use the cache and will always go to the physical disks.
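
So if the goal is the quickest *reliable* way to find files with 
checksum errors, something like this is still the most direct route 
(a sketch, assuming a pool named "tank"):

  zpool scrub tank          # reads every copy of every block, bypassing the cache
  zpool status -v tank      # after the scrub, -v lists files with unrecoverable errors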


--
Robert Milkowski
http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Quickest way to find files with cksum errors without doing scrub

2009-09-28 Thread Robert Milkowski

Bob Friesenhahn wrote:

On Mon, 28 Sep 2009, Richard Elling wrote:


Scrub could be faster, but you can try
tar cf - . > /dev/null

If you think about it, validating checksums requires reading the data.
So you simply need to read the data.


This should work but it does not verify the redundant metadata.  For 
example, the duplicate metadata copy might be corrupt but the problem 
is not detected since it did not happen to be used.




Not only that - it also won't read all the copies of the data if zfs has 
redundancy configured at the pool level. Scrubbing the pool will. And 
that's the main reason for the scrub - to be able to detect and 
repair checksum errors (if any) while a redundant copy is still fine.


--
Robert Milkowski
http://milek.blogspot.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Would ZFS work for a high-bandwidth video SAN?

2009-09-28 Thread Bob Friesenhahn

On Mon, 28 Sep 2009, Richard Connamacher wrote:


I'm planning on using RAIDZ2 if it can keep up with my bandwidth 
requirements. So maybe ZFS could be an option after all?


ZFS certainly can be an option.  If you are willing to buy Sun 
hardware, they have a "try and buy" program which would allow you to 
set up a system to evaluate if it will work for you.  Otherwise you 
can use a high-grade Brand-X server and decent-grade Brand-X JBOD 
array to test on.


Sun's Storage 7000 series has OpenSolaris and ZFS inside but is 
configured and sold as a closed-box NAS.  The X4540 server is fitted 
with 48 disk drives and is verified to be able to deliver 2.0GB/second 
to a network.


By MB do you mean mega*byte*? If so, 550 MB/sec is more than enough for 
uncompressed 1080p. If you mean mega*bit*, then that's not enough. 
But as you said, you're using a mirrored setup, and RAID-Z should be 
faster.


Yes.  I mean megabyte.  This is a 12-drive StorageTek 2540 with two 
4gbit FC links.  I am getting a peak of more than one FC link 
(550MB/second with a huge file).


A JBOD SAS array would be a much better choice now but these products 
had not yet come to market when I ordered my hardware.


This might work for Final Cut editing using QuickTime files. But FX 
and color grading using TIFF frames at 130 MB/s would slow your 
setup to a crawl. Do you think RAID-Z would help here?


There is no reason why RAID-Z is necessarily faster at sequential 
reads than mirrors and in fact mirrors can be faster due to fewer disk 
seeks.  With mirrors, it is theoretically possible to schedule reads 
from all 12 of my disks at once.  It is just a matter of the 
tunings/options that the ZFS implementors decide to provide.


Here are some iozone measurements (taken June 28th) with different 
record sizes running up to a 64GB file size:


  KB  reclen   write rewritereadreread
 8388608  64  482097  595557  1851378  1879145
 8388608 128  429126  621319  1937128  1944177
 8388608 256  428197  646922  1954065  1965570
 8388608 512  489692  585971  1593610  1584573
16777216  64  439880   41304   822968   841246
16777216 128  443119  435886   815705   844789
16777216 256  446006  475347   814529   687915
16777216 512  436627  462599   787369   803182
33554432  64  401110   41096   547065   553262
33554432 128  404420  394838   549944   552664
33554432 256  406367  400859   544950   553516
33554432 512  401254  410153   554100   558650
67108864  64  378158   40794   552623   555655
67108864 128  379809  385453   549364   553948
67108864 256  380286  377397   551060   550414
67108864 512  378225  385588   550131   557150

It seems like every time I run the benchmark, the numbers have 
improved.
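
(For anyone who wants to generate the same kind of table, an iozone run 
of roughly this form per record-size/file-size combination does it - a 
sketch based on the iozone options, not my exact command line:

  iozone -i 0 -i 1 -r 128k -s 64g -f /pool/iozone.tmp

where -i 0 is write/rewrite, -i 1 is read/reread, -r is the record size 
and -s the file size.)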


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Quickest way to find files with cksum errors without doing scrub

2009-09-28 Thread David Magda

On Sep 28, 2009, at 19:39, Richard Elling wrote:

Finally, there are two basic types of scrubs: read-only and rewrite.
ZFS does read-only. Other scrubbers can do rewrite. There is evidence
that rewrites are better for attacking superparamagnetic decay issues.


Something that may be possible when *bp rewrite is eventually committed.

Educating post. Thanks.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Would ZFS work for a high-bandwidth video SAN?

2009-09-28 Thread Richard Connamacher
> For me, aggressive prefetch is most important in order to schedule 
> reads from enough disks in advance to produce a high data rate. This 
> is because I am using mirrors. When using raidz or raidz2 the 
> situation should be a bit different because raidz is striped. The 
> prefetch bug which is specifically fixed is when using thousands of 
> files in the 5MB-8MB range which is typical for film postproduction. 
> The bug is that prefetch becomes disabled if the file had been 
> accessed before but its data is no longer in cache.

I'm planning on using RAIDZ2 if it can keep up with my bandwidth requirements. 
So maybe ZFS could be an option after all?

> That is not clear to me yet.  With my setup, I can read up to 
> 550MB/second from a large file.  That is likely the hardware limit for 
> me.  But when reading one-at-a-time from individual 5 or 8MB files, 
> the data rate is much less (around 130MB/second).

By MB do you mean mega*byte*? If so, 550 MB/sec is more than enough for 
uncompressed 1080p. If you mean mega*bit*, then that's not enough. But as you 
said, you're using a mirrored setup, and RAID-Z should be faster.

This might work for Final Cut editing using QuickTime files. But FX and color 
grading using TIFF frames at 130 MB/s would slow your setup to a crawl. Do you 
think RAID-Z would help here?

Thanks!
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Would ZFS work for a high-bandwidth video SAN?

2009-09-28 Thread Bob Friesenhahn

On Mon, 28 Sep 2009, Richard Connamacher wrote:

Thanks for the detailed information. When you get the patch, I'd 
love to hear if it fixes the problems you're having. From my 
understanding, a working prefetch would keep video playback from 
stuttering whenever the drive head moves — is this right?


For me, aggressive prefetch is most important in order to schedule 
reads from enough disks in advance to produce a high data rate.  This 
is because I am using mirrors.  When using raidz or raidz2 the 
situation should be a bit different because raidz is striped.  The 
prefetch bug which is specifically fixed is when using thousands of 
files in the 5MB-8MB range which is typical for film postproduction. 
The bug is that prefetch becomes disabled if the file had been 
accessed before but its data is no longer in cache.
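
(As an aside: you can at least check whether file-level prefetch has 
been turned off globally - a sketch, assuming the zfs_prefetch_disable 
kernel variable present in Solaris 10 / OpenSolaris of this vintage:

  echo zfs_prefetch_disable/D | mdb -k    # 0 = prefetch enabled, 1 = disabled

The bug above is different: prefetch silently stops being used for a 
particular file even though that global switch is still 0.)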


When doing video playback, it is typical to be reading from several 
files at once in order to avoid the potential for read "stutter".


The inability to read and write simultaneously (within reason) would 
be frustrating for a shared video editing server. I wonder if ZFS 
needs more parallelism? If any software RAID ends up having a


ZFS has a lot of parallelism since it is optimized for large data 
servers.


similar problem, then we might have to go with the hardware RAID 
setups I'm trying to avoid. I wonder if there's any way to work 
around that. Would a bigger write cache help? Or adding an SSD for 
the cache (ZFS Intent Log)? would Linux software RAID be any better?


The problem seems to be that ZFS uses a huge write cache by default 
and it delays flushing it (up to 30 seconds) so that when the write 
cache is flushed, it maximally engages the write channel for up to 5 
seconds.  Decreasing the size of the write cache diminishes the size 
of the problem.
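
(If you want to experiment with that, the flush interval is tunable - a 
sketch, assuming the zfs_txg_timeout tunable in OpenSolaris/Solaris 10 
of this vintage; the value is only illustrative:

  * /etc/system: sync the write cache every 5 seconds instead of every 30
  set zfs:zfs_txg_timeout = 5

followed by a reboot. Whether that cures the read stalls or just makes 
them smaller and more frequent is something to benchmark.)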


Assuming they fix the prefetch performance issues you talked about, 
do you think ZFS would be able to keep up with uncompressed 1080p HD 
or 2K?


That is not clear to me yet.  With my setup, I can read up to 
550MB/second from a large file.  That is likely the hardware limit for 
me.  But when reading one-at-a-time from individual 5 or 8MB files, 
the data rate is much less (around 130MB/second).


I am using Solaris 10.  OpenSolaris performance seems to be better 
than Solaris 10.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Would ZFS work for a high-bandwidth video SAN?

2009-09-28 Thread Richard Connamacher
Thanks for the detailed information. When you get the patch, I'd love to hear 
if it fixes the problems you're having. From my understanding, a working 
prefetch would keep video playback from stuttering whenever the drive head 
moves — is this right?

The inability to read and write simultaneously (within reason) would be 
frustrating for a shared video editing server. I wonder if ZFS needs more 
parallelism? If any software RAID ends up having a similar problem, then we 
might have to go with the hardware RAID setups I'm trying to avoid. I wonder if 
there's any way to work around that. Would a bigger write cache help? Or adding 
an SSD for the cache (ZFS Intent Log)? would Linux software RAID be any better?

Assuming they fix the prefetch performance issues you talked about, do you 
think ZFS would be able to keep up with uncompressed 1080p HD or 2K?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Quickest way to find files with cksum errors without doing scrub

2009-09-28 Thread Richard Elling

On Sep 28, 2009, at 11:41 AM, Bob Friesenhahn wrote:


On Mon, 28 Sep 2009, Richard Elling wrote:

   In other words, I am concerned that people replace good data protection
   practices with scrubs and expecting scrub to deliver better data protection
   (it won't).


Many people here would profoundly disagree with the above.  There is 
no substitute for good backups, but a periodic scrub helps validate 
that a later resilver would succeed.  A periodic scrub also helps 
find system problems early when they are less likely to crater your 
business.  It is much better to find an issue during a scrub rather 
than during resilver of a mirror or raidz.


As I said, I am concerned that people would mistakenly expect that 
scrubbing offers data protection. It doesn't.  I think you proved my 
point? ;-)

Scrubs are also useful for detecting broken hardware. However,  
normal activity will also detect broken hardware, so it is better  
to think of scrubs as finding degradation of old data rather than  
being a hardware checking service.


Do you have a scientific reference for this notion that "old data" 
is more likely to be corrupt than "new data" or is it just a 
gut-feeling? This hypothesis does not sound very supportable to me. 
Magnetic hysteresis lasts quite a lot longer than the recommended 
service life for a hard drive.  Studio audio tapes from the '60s are 
still being used to produce modern "remasters" of old audio 
recordings which sound better than they ever did before (other than 
the master tape).


Those are analog tapes... they just fade away...
For data, it depends on the ECC methods, quality of the media, 
environment, etc. You will find considerable attention spent on 
verification of data on tapes in archiving products. In the tape 
world, there are slightly different conditions than the magnetic 
disk world, but I can't think of a single study which shows that 
magnetic disks get more reliable over time, while there are dozens 
which show that they get less reliable and that latent sector errors 
dominate, as much as 5x, over full disk failures.  My studies of Sun 
disk failure rates have shown similar results.

 Some forms of magnetic hysteresis are known to last millions of  
years. Media failure is more often than not mechanical or chemical  
and not related to loss of magnetic hysteresis.  Head failures may  
be construed to be media failures.


Here is a good study from the University of Wisconsin-Madison which 
clearly shows the relationship between disk age and latent sector 
errors. It also shows how the increase in areal density also increases 
the latent sector error (LSE) rate. Additionally, this gets back to 
the ECC method, which we observe to be different on consumer-grade 
and enterprise-class disks. The study shows a clear win for 
enterprise-class drives wrt latent errors.  The paper suggests a 
2-week scrub cycle and recognizes that many RAID arrays have such 
policies.  There are indeed many studies which show latent sector 
errors are a bigger problem as the disk ages.

An Analysis of Latent Sector Errors in Disk Drives
www.cs.wisc.edu/adsl/Publications/latent-sigmetrics07.ps



See http://en.wikipedia.org/wiki/Ferromagnetic for information on  
ferromagnetic materials.


For disks we worry about the superparamagnetic effect.
http://en.wikipedia.org/wiki/Superparamagnetism

Quoting US Patent 6987630,

... the superparamagnetic effect is a thermal relaxation of information
stored on the disk surface. Because the superparamagnetic effect may
occur at room temperature, over time, information stored on the disk
surface will begin to decay. Once the stored information decays beyond
a threshold level, it will be unable to be properly read by the read
head and the information will be lost.

The superparamagnetic effect manifests itself by a loss in amplitude in
the readback signal over time or an increase in the mean square error
(MSE) of the read back signal over time. In other words, the readback
signal quality metrics are mean square error and amplitude as measured
by the read channel integrated circuit. Decreases in the quality of the
readback signal cause bit error rate (BER) increases. As is well known,
the BER is the ultimate measure of drive performance in a disk drive.

This effect is based on the time since written. Hence, older data can 
have higher MSE and subsequent BER leading to a UER.

To be fair, newer disk technology is constantly improving. But what is
consistent with the physics is that increase in bit densities leads to
more space and rebalancing the BER. IMHO, this is why we see densities
increase, but UER does not increase (hint: marketing always wins these
sorts of battles).

FWIW, flash memories are not affected by superparamagnetic decay.

It would be most useful if zfs incorporated a slow-scan scrub which 
validates data at a low rate of speed which does not hinder active 
I/O.

Re: [zfs-discuss] Comments on home OpenSolaris/ZFS server

2009-09-28 Thread Michael Shadle
well when i start looking into rack configurations i will consider it. :)

here's my configuration - enjoy!
http://michaelshadle.com/2009/09/28/my-recipe-for-zfs-at-home/

On Mon, Sep 28, 2009 at 3:10 PM, Thomas Burgess  wrote:
>  i own this case, it's really not that bad.  It's got 4 fans but they are
> really big and don't make nearly as much noise as you'd think.  honestly,
> it's not bad at all.  I know someone who sits it vertically as well,
> honestly, it's a good case for the money
>
>
> On Mon, Sep 28, 2009 at 6:06 PM, Michael Shadle  wrote:
>>
>> rackmount chassis aren't usually designed with acoustics in mind :)
>>
>> however i might be getting my closet fitted so i can put half a rack
>> in. might switch up my configuration to rack stuff soon.
>>
>> On Mon, Sep 28, 2009 at 3:04 PM, Thomas Burgess 
>> wrote:
>> > personally i like this case:
>> >
>> >
>> > http://www.newegg.com/Product/Product.aspx?Item=N82E16811219021
>> >
>> > it's got 20 hot swap bays, and it's surprisingly well built.  For the
>> > money,
>> > it's an amazing deal.
>> >
>> >
>> >
>> > ___
>> > zfs-discuss mailing list
>> > zfs-discuss@opensolaris.org
>> > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>> >
>> >
>
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Would ZFS work for a high-bandwidth video SAN?

2009-09-28 Thread Bob Friesenhahn

On Mon, 28 Sep 2009, Richard Connamacher wrote:

I'm looking at building a high bandwidth file server to store video 
for editing, as an alternative to buying a $30,000 hardware RAID and 
spending $2000 per seat on fibrechannel and specialized SAN drive 
software.


Uncompressed HD runs around 1.2 to 4 gigabits per second, putting it 
in 10 gigabit Ethernet or FibreChannel territory. Any file server 
would have to be able to move that many bits in sustained read and 
sustained write, and doing both simultaneously would be a plus.


Please see a white paper I wrote entitled "ZFS and Digital 
Intermediate" at 
http://www.simplesystems.org/users/bfriesen/zfs-discuss/zfs-and-di.pdf 
which expounds on this topic and makes it sound like zfs is the 
perfect answer for this.


Unfortunately, I have since learned that zfs file prefetch ramps up 
too slowly or becomes disabled for certain workloads.  I reported a 
bug.  Many here are eagerly awaiting the next OpenSolaris development 
release which is supposed to have fixes for the prefetch problem I 
encountered.


I am told that a Solaris 10 IDR (customer-specific patch) will be 
provided to me within the next few days to resolve the performance 
issue.


There is another performance issue in which writes to the server cause 
reads to briefly stop periodically.  This means that the server could 
not be used simultaneously for video playback while files are being 
updated.  To date there is no proposed solution for this problem.


Linux XFS seems like the top contender for video playback and editing. 
From a description of XFS design and behavior, it would not surprise 
me if it stuttered during playback when files are updated as well. 
Linux XFS also buffers written data and writes it out in large batches 
at a time.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs receive should allow to keep received system

2009-09-28 Thread Albert Chin
On Mon, Sep 28, 2009 at 03:16:17PM -0700, Igor Velkov wrote:
> Not so good as I hope.
> zfs send -R xxx/x...@daily_2009-09-26_23:51:00 |ssh -c blowfish r...@xxx.xx 
> zfs recv -vuFd xxx/xxx
> 
> invalid option 'u'
> usage:
> receive [-vnF] <filesystem|volume|snapshot>
> receive [-vnF] -d <filesystem>
> 
> For the property list, run: zfs set|get
> 
> For the delegated permission list, run: zfs allow|unallow
> r...@xxx:~# uname -a
> SunOS xxx 5.10 Generic_13-03 sun4u sparc SUNW,Sun-Fire-V890
> 
> What's wrong?

Looks like -u was a recent addition.

-- 
albert chin (ch...@thewrittenword.com)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs receive should allow to keep received system

2009-09-28 Thread Lori Alt

On 09/28/09 16:16, Igor Velkov wrote:

Not so good as I hope.
zfs send -R xxx/x...@daily_2009-09-26_23:51:00 |ssh -c blowfish r...@xxx.xx zfs 
recv -vuFd xxx/xxx

invalid option 'u'
usage:
receive [-vnF] <filesystem|volume|snapshot>
receive [-vnF] -d <filesystem>

For the property list, run: zfs set|get

For the delegated permission list, run: zfs allow|unallow
r...@xxx:~# uname -a
SunOS xxx 5.10 Generic_13-03 sun4u sparc SUNW,Sun-Fire-V890

What's wrong?
  
The -u option was added in S10 Update 7.  I'm not sure whether the 
patch-level shown above includes the U7 changes or not.


Lori
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs receive should allow to keep received system

2009-09-28 Thread Igor Velkov
Not so good as I hope.
zfs send -R xxx/x...@daily_2009-09-26_23:51:00 |ssh -c blowfish r...@xxx.xx zfs 
recv -vuFd xxx/xxx

invalid option 'u'
usage:
receive [-vnF] <filesystem|volume|snapshot>
receive [-vnF] -d <filesystem>

For the property list, run: zfs set|get

For the delegated permission list, run: zfs allow|unallow
r...@xxx:~# uname -a
SunOS xxx 5.10 Generic_13-03 sun4u sparc SUNW,Sun-Fire-V890

What's wrong?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Comments on home OpenSolaris/ZFS server

2009-09-28 Thread Thomas Burgess
 i own this case, it's really not that bad.  It's got 4 fans but they are
really big and don't make nearly as much noise as you'd think.  honestly,
it's not bad at all.  I know someone who sits it vertically as well,
honestly, it's a good case for the money


On Mon, Sep 28, 2009 at 6:06 PM, Michael Shadle  wrote:

> rackmount chassis aren't usually designed with acoustics in mind :)
>
> however i might be getting my closet fitted so i can put half a rack
> in. might switch up my configuration to rack stuff soon.
>
> On Mon, Sep 28, 2009 at 3:04 PM, Thomas Burgess 
> wrote:
> > personally i like this case:
> >
> >
> > http://www.newegg.com/Product/Product.aspx?Item=N82E16811219021
> >
> > it's got 20 hot swap bays, and it's surprisingly well built.  For the
> money,
> > it's an amazing deal.
> >
> >
> >
> > ___
> > zfs-discuss mailing list
> > zfs-discuss@opensolaris.org
> > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
> >
> >
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs receive should allow to keep received system

2009-09-28 Thread Igor Velkov
Wah!

Thank you, lalt!
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Would ZFS work for a high-bandwidth video SAN?

2009-09-28 Thread Richard Connamacher
I'm looking at building a high bandwidth file server to store video for 
editing, as an alternative to buying a $30,000 hardware RAID and spending $2000 
per seat on fibrechannel and specialized SAN drive software.

Uncompressed HD runs around 1.2 to 4 gigabits per second, putting it in 10 
gigabit Ethernet or FibreChannel territory. Any file server would have to be 
able to move that many bits in sustained read and sustained write, and doing 
both simultaneously would be a plus.
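
To put rough numbers on that (back-of-the-envelope, assuming 10-bit 
4:2:2 1080p, i.e. about 20 bits per pixel on average):

  1920 x 1080 x 20 bits x 24 fps ~= 1.0 Gbit/s ~= 125 MByte/s
  1920 x 1080 x 20 bits x 30 fps ~= 1.2 Gbit/s ~= 155 MByte/s

and 10-bit 4:4:4, 2K frames or higher frame rates push that toward the 
top of the range.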

If the drives were plentiful enough and fast enough, could a RAID-Z (on 
currently available off-the-shelf hardware) keep up with that?

Thanks!
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Comments on home OpenSolaris/ZFS server

2009-09-28 Thread Michael Shadle
rackmount chassis aren't usually designed with acoustics in mind :)

however i might be getting my closet fitted so i can put half a rack
in. might switch up my configuration to rack stuff soon.

On Mon, Sep 28, 2009 at 3:04 PM, Thomas Burgess  wrote:
> personally i like this case:
>
>
> http://www.newegg.com/Product/Product.aspx?Item=N82E16811219021
>
> it's got 20 hot swap bays, and it's surprisingly well built.  For the money,
> it's an amazing deal.
>
>
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Comments on home OpenSolaris/ZFS server

2009-09-28 Thread Thomas Burgess
personally i like this case:


http://www.newegg.com/Product/Product.aspx?Item=N82E16811219021

it's got 20 hot swap bays, and it's surprisingly well built.  For the money,
it's an amazing deal.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs receive should allow to keep received system unmounted

2009-09-28 Thread Lori Alt

On 09/28/09 15:54, Igor Velkov wrote:
zfs receive should allow an option to disable the immediate mount of the received filesystem. 

If the original filesystem's mountpoints have changed, it's hard to make a clone fs with send/receive, because the received filesystem immediately tries to mount at the old mountpoint, which is locked by the source fs. 
On a different host, the mountpoint can be locked by an unrelated filesystem.


Can anybody recommend a way to avoid mountpoint conflicts in such cases?
  

The -u option to zfs receive suppresses all mounts.
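
On a build that has it, the command from this thread becomes something 
like (a sketch with placeholder pool and host names):

  zfs send -R pool/fs@snap | ssh -c blowfish root@desthost zfs receive -vuF -d destpool

and the mountpoints of the received filesystems can then be adjusted 
(zfs set mountpoint=...) before mounting anything.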

lori
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Comments on home OpenSolaris/ZFS server

2009-09-28 Thread Michael Shadle
Yeah - give me a bit to rope together the parts list and double check
it, and I will post it on my blog.


On Mon, Sep 28, 2009 at 2:34 PM, Ware Adams  wrote:
> On Sep 28, 2009, at 4:20 PM, Michael Shadle wrote:
>
>> I agree - SOHO usage of ZFS is still a scary "will this work?" deal. I
>> found a working setup and I cloned it. It gives me 16x SATA + 2x SATA
>> for mirrored boot, 4GB ECC RAM and a quad core processor - total cost
>> without disks was ~ $1k I believe. Not too shabby. Emphasis was also
>> for acoustics - rack dense would be great but my current living
>> situation doesn't warrant that
>
> This sounds interesting.  Do you have any info on it (case you started with,
> etc...).
>
> I'm concerned about noise too as this will be in a closet close to the room
> where our television is.  Currently there is a MacPro in there which isn't
> terribly quiet, but the SuperMicro case is reported to be fairly quiet.
>
> Thanks,
> Ware
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs receive should allow to keep received system unmounted

2009-09-28 Thread Igor Velkov
zfs receive should allow an option to disable the immediate mount of the 
received filesystem. 

If the original filesystem's mountpoints have changed, it's hard to make a 
clone fs with send/receive, because the received filesystem immediately tries 
to mount at the old mountpoint, which is locked by the source fs. 
On a different host, the mountpoint can be locked by an unrelated filesystem.

Can anybody recommend a way to avoid mountpoint conflicts in such cases?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Comments on home OpenSolaris/ZFS server

2009-09-28 Thread Ware Adams

On Sep 28, 2009, at 4:20 PM, Michael Shadle wrote:


I agree - SOHO usage of ZFS is still a scary "will this work?" deal. I
found a working setup and I cloned it. It gives me 16x SATA + 2x SATA
for mirrored boot, 4GB ECC RAM and a quad core processor - total cost
without disks was ~ $1k I believe. Not too shabby. Emphasis was also
for acoustics - rack dense would be great but my current living
situation doesn't warrant that


This sounds interesting.  Do you have any info on it (case you started  
with, etc...).


I'm concerned about noise too as this will be in a closet close to the  
room where our television is.  Currently there is a MacPro in there  
which isn't terribly quiet, but the SuperMicro case is reported to be  
fairly quiet.


Thanks,
Ware
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] extremely slow writes (with good reads)

2009-09-28 Thread Paul Archer
In light of all the trouble I've been having with this zpool, I bought a 
2TB drive, and I'm going to move all my data over to it, then destroy the 
pool and start over.


Before I do that, what is the best way on an x86 system to format/label 
the disks?


Thanks,

Paul


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Comments on home OpenSolaris/ZFS server

2009-09-28 Thread Michael Shadle
This seems like you're doing an awful lot of planning for only 8 SATA
+ 4 SAS bays?

I agree - SOHO usage of ZFS is still a scary "will this work?" deal. I
found a working setup and I cloned it. It gives me 16x SATA + 2x SATA
for mirrored boot, 4GB ECC RAM and a quad core processor - total cost
without disks was ~ $1k I believe. Not too shabby. Emphasis was also
for acoustics - rack dense would be great but my current living
situation doesn't warrant that. The noisiest components are the 5-in-3
chassis used in the front of the case. I have to keep the fans on high
(I tried to swap out for larger, quieter fans, but could not get the
fan alarm to shut up) or they go over Seagate's recommended <= 50
degrees.

I really should post my parts list up on my blog. I had to choose
everything to the best of my research online and hope for the best.


On Mon, Sep 28, 2009 at 1:12 PM, Ware Adams  wrote:
> Hello,
>
> I have been researching building a home storage server based on OpenSolaris
> and ZFS, and I would appreciate any time people could take to comment on my
> current leanings.
>
> I've tried to gather old information from this list as well as the HCL, but
> I would welcome anyone's experience on both compatibility and
> appropriateness for my goals.  I'd love if that white box server wiki page
> were set up now, but for now I'll have to just ask here.
>
> My priorities:
>
> 1)  Data security.  I'm hoping I can get this via ECC RAM and enterprise
> drives that hopefully don't lie to ZFS about flushing to disk?  I'll run
> mirrored pools for redundancy (which leads me to want a case w/a lot of
> bays).
> 2)  Compatibility.  For me this translates into low upkeep cost (time).  I'm
> not looking to be the first person to get OpenSolaris running on some
> particular piece of hardware.
> 3)  Scaleable.  I'd like to not have to upgrade every year.  I can always
> use something like an external JBOD array, but there's some appeal to having
> enough space in the case for reasonable growth.  I'd also like to have
> enough performance to keep up with scaling data volume and ZFS features.
> 4)  Ability to run some other (lightweight) services on the box.  I'll be
> using NFS (iTunes libraries for OS X clients) and iSCSI (Time Machine
> backups) primarily, but my current home server also runs a few small
> services (MySQL etc...) that are very lightweight but nevertheless might be
> difficult to do on a ZFS (or "ZFS like") appliance
> 5)  Cost.  All things being equal cheaper is better, but I'm willing to pay
> more to accomplish particularly 1-3 above.
>
> My current thinking:
>
> SuperMicro 7046A-3 Workstation
> http://supermicro.com/products/system/4U/7046/SYS-7046A-3.cfm
> 8 hot swappable drive bays (SAS or SATA, I'd use SATA)
> Network/Main board/SAS/SATA controllers seem well supported by OpenSolaris
> Will take IPMI card for remote admin (with video and iso redirection)
> 12 RAM slots so I can buy less dense chips
> 2x 5.25" drive bays.  I'd use a SuperMicro Mobile Rack M14T
> (http://www.supermicro.com/products/accessories/mobilerack/CSE-M14.cfm) to
> get 4 2.5" SAS drives in one of these.  2 would be used for a mirrored boot
> pool leaving 2 for potential future use (like a ZIL on SSD).
>
> Nehalem E5520 CPU
> These are clearly more than enough now, but I'm hoping to have decent CPU
> performance for say 5 years (and I'm willing to pay for it up front vs.
> upgrading every 2 years...I don't want this to be too time consuming of a
> hobby).  I'd like to have processor capacity for compression and (hopefully
> reasonably soon) de-duplication as well as obviously support ECC RAM.
>
> Crucial RAM in 4 GB density (price scales linearly up through this point and
> I've had good support from Crucial)
>
> Seagate Barracuda ES.2 1TB SATA (Model ST31000340NS) for storage pool.  I
> would like to use a larger drive, but I can't find anything rated to run
> 24x7 larger than 1TB from Seagate.  I'd like to have drives rated for 24x7
> use, and I've had good experience w/Seagate.  Again, a larger case gives me
> some flexibility here.
>
> Misc (mainly interested in compatibility b/c it will hardly be used):
> Sun XVR-100 video card from eBay
> Syba SY-PCI45004
> (http://www.newegg.com/Product/Product.aspx?Item=N82E16816124025) IDE card
> for CD-ROM
> Sony DDU1678A
> (http://www.newegg.com/Product/Product.aspx?Item=N82E16827131061) CD-ROM
>
> Thanks a lot for any thoughts you might have.
>
> --Ware
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Comments on home OpenSolaris/ZFS server

2009-09-28 Thread Ware Adams

Hello,

I have been researching building a home storage server based on  
OpenSolaris and ZFS, and I would appreciate any time people could take  
to comment on my current leanings.


I've tried to gather old information from this list as well as the  
HCL, but I would welcome anyone's experience on both compatibility and  
appropriateness for my goals.  I'd love if that white box server wiki  
page were set up now, but for now I'll have to just ask here.


My priorities:

1)  Data security.  I'm hoping I can get this via ECC RAM and  
enterprise drives that hopefully don't lie to ZFS about flushing to  
disk?  I'll run mirrored pools for redundancy (which leads me to want  
a case w/a lot of bays).
2)  Compatibility.  For me this translates into low upkeep cost  
(time).  I'm not looking to be the first person to get OpenSolaris  
running on some particular piece of hardware.
3)  Scaleable.  I'd like to not have to upgrade every year.  I can  
always use something like an external JBOD array, but there's some  
appeal to having enough space in the case for reasonable growth.  I'd  
also like to have enough performance to keep up with scaling data  
volume and ZFS features.
4)  Ability to run some other (lightweight) services on the box.  I'll  
be using NFS (iTunes libraries for OS X clients) and iSCSI (Time  
Machine backups) primarily, but my current home server also runs a few  
small services (MySQL etc...) that are very lightweight but  
nevertheless might be difficult to do on a ZFS (or "ZFS like") appliance
5)  Cost.  All things being equal cheaper is better, but I'm willing  
to pay more to accomplish particularly 1-3 above.


My current thinking:

SuperMicro 7046A-3 Workstation
http://supermicro.com/products/system/4U/7046/SYS-7046A-3.cfm
8 hot swappable drive bays (SAS or SATA, I'd use SATA)
Network/Main board/SAS/SATA controllers seem well supported by  
OpenSolaris

Will take IPMI card for remote admin (with video and iso redirection)
12 RAM slots so I can buy less dense chips
2x 5.25" drive bays.  I'd use a SuperMicro Mobile Rack M14T 
(http://www.supermicro.com/products/accessories/mobilerack/CSE-M14.cfm) 
to get 4 2.5" SAS drives in one of these.  2 would be used for a 
mirrored boot pool leaving 2 for potential future use (like a ZIL on 
SSD).


Nehalem E5520 CPU
These are clearly more than enough now, but I'm hoping to have decent  
CPU performance for say 5 years (and I'm willing to pay for it up  
front vs. upgrading every 2 years...I don't want this to be too time  
consuming of a hobby).  I'd like to have processor capacity for  
compression and (hopefully reasonably soon) de-duplication as well as  
obviously support ECC RAM.


Crucial RAM in 4 GB density (price scales linearly up through this  
point and I've had good support from Crucial)


Seagate Barracuda ES.2 1TB SATA (Model ST31000340NS) for storage  
pool.  I would like to use a larger drive, but I can't find anything  
rated to run 24x7 larger than 1TB from Seagate.  I'd like to have  
drives rated for 24x7 use, and I've had good experience w/Seagate.   
Again, a larger case gives me some flexibility here.


Misc (mainly interested in compatibility b/c it will hardly be used):
Sun XVR-100 video card from eBay
Syba SY-PCI45004 
(http://www.newegg.com/Product/Product.aspx?Item=N82E16816124025) 
IDE card for CD-ROM
Sony DDU1678A 
(http://www.newegg.com/Product/Product.aspx?Item=N82E16827131061) 
CD-ROM


Thanks a lot for any thoughts you might have.

--Ware
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Fixing Wikipedia tmpfs article (was Re: Which directories must be part of rpool?)

2009-09-28 Thread Joerg Schilling
Frank Middleton  wrote:

> On 09/28/09 03:00 AM, Joerg Schilling wrote:
>
> > I am not sure whether my changes will be kept as wikipedia prefers to
> > keep badly quoted wrong information before correct information supplied by
> > people who have first hand information.
>
> They actually disallow "first hand information". Everything on Wikipedia
> is supposed to be confirmed by secondary or tertiary sources. That's why I

This is why wikipedia is wrong in many cases :-(

> asked if there was any supporting documentation - papers, manuals,
> proceedings, whatever, that describe the introduction of tmpfs before
> 1990. If you were to write a personal page (in Wikipedia if you like)

IIRC, there was a talk about tmpfs at the Sun User Group meeting around 
December 6th in San Jose in 1987. Maybe someone can find the proceedings.


> http://en.wikipedia.org/wiki/Wikipedia:Reliable_sources
>
> Wikipedia also has a lofi page (http://en.wikipedia.org/wiki/Lofi) that
> redirects to "loop mount". It has no historical section at all... There
> is no fbk (file system) page.

It is bad practice to advertise one's own projects on wikipedia.

Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Should usedbydataset be the same after zfs send/recv for a volume?

2009-09-28 Thread Albert Chin
On Mon, Sep 28, 2009 at 07:33:56PM -0500, Albert Chin wrote:
> When transferring a volume between servers, is it expected that the
> usedbydataset property should be the same on both? If not, is it cause
> for concern?
> 
> snv114# zfs list tww/opt/vms/images/vios/near.img
> NAME   USED  AVAIL  REFER  MOUNTPOINT
> tww/opt/vms/images/vios/near.img  70.5G   939G  15.5G  -
> snv114# zfs get usedbydataset tww/opt/vms/images/vios/near.img
> NAME  PROPERTY   VALUE   SOURCE
> tww/opt/vms/images/vios/near.img  usedbydataset  15.5G   -
> 
> snv119# zfs list t/opt/vms/images/vios/near.img 
> NAME USED  AVAIL  REFER  MOUNTPOINT
> t/opt/vms/images/vios/near.img  14.5G  2.42T  14.5G  -
> snv119# zfs get usedbydataset t/opt/vms/images/vios/near.img 
> NAMEPROPERTY   VALUE   SOURCE
> t/opt/vms/images/vios/near.img  usedbydataset  14.5G   -

Don't know if it matters but disks on both send/recv server are
different, 300GB FCAL on the send, 750GB SATA on the recv.

-- 
albert chin (ch...@thewrittenword.com)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Should usedbydataset be the same after zfs send/recv for a volume?

2009-09-28 Thread Albert Chin
When transferring a volume between servers, is it expected that the
usedbydataset property should be the same on both? If not, is it cause
for concern?

snv114# zfs list tww/opt/vms/images/vios/near.img
NAME   USED  AVAIL  REFER  MOUNTPOINT
tww/opt/vms/images/vios/near.img  70.5G   939G  15.5G  -
snv114# zfs get usedbydataset tww/opt/vms/images/vios/near.img
NAME  PROPERTY   VALUE   SOURCE
tww/opt/vms/images/vios/near.img  usedbydataset  15.5G   -

snv119# zfs list t/opt/vms/images/vios/near.img 
NAME USED  AVAIL  REFER  MOUNTPOINT
t/opt/vms/images/vios/near.img  14.5G  2.42T  14.5G  -
snv119# zfs get usedbydataset t/opt/vms/images/vios/near.img 
NAMEPROPERTY   VALUE   SOURCE
t/opt/vms/images/vios/near.img  usedbydataset  14.5G   -

-- 
albert chin (ch...@thewrittenword.com)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Which directories must be part of rpool?

2009-09-28 Thread Joerg Schilling
Chris Gerhard  wrote:

> TMPFS was not in the first release of 4.0. It was introduced to boost the 
> performance of diskless clients which no longer had the old network disk for 
> their root file systems and hence /tmp was now over NFS.

I did receive the SunOS-4.0 sources for my master thesis (a copy 
on write WORM filesystem) and this source did contain tmpfs.


Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Fixing Wikipedia tmpfs article (was Re: Which directories must be part of rpool?)

2009-09-28 Thread Joerg Schilling
Darren J Moffat  wrote:

> Joerg Schilling wrote:
> > 
> > Just to prove my information: I invented "fbk" (which Sun now calls "lofi")
>
> Sun does NOT call your fbk by the name lofi.  Lofi is a completely 
> different implementation of the same concept.

With this kind of driver the implementation coding is trivial; it is the idea that 
is important. So it does not matter that lofi is a reimplementation.


Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Quickest way to find files with cksum errors without doing scrub

2009-09-28 Thread Bob Friesenhahn

On Mon, 28 Sep 2009, Richard Elling wrote:

   In other words, I am concerned that people replace good data protection
   practices with scrubs and expecting scrub to deliver better data protection
   (it won't).


Many people here would profoundly disagree with the above.  There is 
no substitute for good backups, but a periodic scrub helps validate 
that a later resilver would succeed.  A periodic scrub also helps find 
system problems early when they are less likely to crater your 
business.  It is much better to find an issue during a scrub rather 
than during resilver of a mirror or raidz.


Scrubs are also useful for detecting broken hardware. However, 
normal activity will also detect broken hardware, so it is better to 
think of scrubs as finding degradation of old data rather than being 
a hardware checking service.


Do you have a scientific reference for this notion that "old data" is 
more likely to be corrupt than "new data" or is it just a gut-feeling? 
This hypothesis does not sound very supportable to me.  Magnetic 
hysteresis lasts quite a lot longer than the recommended service life 
for a hard drive.  Studio audio tapes from the '60s are still being 
used to produce modern "remasters" of old audio recordings which sound 
better than they ever did before (other than the master tape).  Some 
forms of magnetic hysteresis are known to last millions of years. 
Media failure is more often than not mechanical or chemical and not 
related to loss of magnetic hysteresis.  Head failures may be 
construed to be media failures.


See http://en.wikipedia.org/wiki/Ferromagnetic for information on 
ferromagnetic materials.


It would be most useful if zfs incorporated a slow-scan scrub which 
validates data at a low rate of speed which does not hinder active 
I/O.  Of course this is not a "green" energy efficient solution.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] refreservation not transferred by zfs send when sending a volume?

2009-09-28 Thread Victor Latushkin

On 29.09.09 03:58, Albert Chin wrote:

snv114# zfs get 
used,reservation,volsize,refreservation,usedbydataset,usedbyrefreservation 
tww/opt/vms/images/vios/mello-0.img
NAME PROPERTY  VALUE  SOURCE
tww/opt/vms/images/vios/mello-0.img  used  30.6G  -
tww/opt/vms/images/vios/mello-0.img  reservation   none   
default
tww/opt/vms/images/vios/mello-0.img  volsize   25G-
tww/opt/vms/images/vios/mello-0.img  refreservation25Glocal
tww/opt/vms/images/vios/mello-0.img  usedbydataset 5.62G  -
tww/opt/vms/images/vios/mello-0.img  usedbyrefreservation  25G-

Sent tww/opt/vms/images/vios/mello-0.img from snv_114 server
to snv_119 server.

On snv_119 server:
snv119# zfs get used,reservation,volsize,refreservation,usedbydataset,usedbyrefreservation t/opt/vms/images/vios/mello-0.img 
NAME   PROPERTY  VALUE  SOURCE

t/opt/vms/images/vios/mello-0.img  used  5.32G  -
t/opt/vms/images/vios/mello-0.img  reservation   none   default
t/opt/vms/images/vios/mello-0.img  volsize   25G-
t/opt/vms/images/vios/mello-0.img  refreservationnone   default
t/opt/vms/images/vios/mello-0.img  usedbydataset 5.32G  -
t/opt/vms/images/vios/mello-0.img  usedbyrefreservation  0  -

Any reason the refreservation and usedbyrefreservation properties are
not sent?


6853862 refquota property not send over with zfs send -R

also affects refreservation, fixed in snv_121
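
Until you are on snv_121 or later, one workaround is to re-create the 
refreservation by hand on the receiving side, e.g. (using the volsize 
shown above):

  zfs set refreservation=25G t/opt/vms/images/vios/mello-0.img
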
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] refreservation not transferred by zfs send when sending a volume?

2009-09-28 Thread Chris Kirby

On Sep 28, 2009, at 6:58 PM, Albert Chin wrote:


Any reason the refreservation and usedbyrefreservation properties are
not sent?


I believe this was CR 6853862, fixed in snv_121.

-Chris

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Quickest way to find files with cksum errors without doing scrub

2009-09-28 Thread Victor Latushkin

On 28.09.09 22:01, Richard Elling wrote:

On Sep 28, 2009, at 10:31 AM, Victor Latushkin wrote:


Richard Elling wrote:

On Sep 28, 2009, at 3:42 PM, Albert Chin wrote:

On Mon, Sep 28, 2009 at 12:09:03PM -0500, Bob Friesenhahn wrote:

On Mon, 28 Sep 2009, Richard Elling wrote:


Scrub could be faster, but you can try
   tar cf - . > /dev/null

If you think about it, validating checksums requires reading the 
data.

So you simply need to read the data.


This should work but it does not verify the redundant metadata.  For
example, the duplicate metadata copy might be corrupt but the problem
is not detected since it did not happen to be used.


Too bad we cannot scrub a dataset/object.

Can you provide a use case? I don't see why scrub couldn't start and
stop at specific txgs for instance. That won't necessarily get you to a
specific file, though.


With ever increasing disk and pool sizes it takes more and more time 
for a scrub to complete its job. Let's imagine that you have a 100TB pool 
with 90TB of data in it, and there's a 10TB dataset that is critical and 
another 80TB dataset that is not that critical, where you can afford 
losing some blocks/files.


Personally, I have three concerns here.
1. Gratuitous complexity, especially inside a pool -- aka creeping featurism


There's the idea of priority-based resilvering (though not implemented yet, see 
http://blogs.sun.com/bonwick/en_US/entry/smokin_mirrors) that can be simply 
extended to scrubs as well.


2. Wouldn't a better practice be to use two pools with different protection
   policies? The only protection policy differences inside a pool are copies.
   In other words, I am concerned that people replace good data protection
   practices with scrubs and expecting scrub to deliver better data protection
   (it won't).


It may be better, it may not... With two pools you split your bandwidth, 
IOPS and space, and have more entities to care about...


3. Since the pool contains the set of blocks, shared by datasets, it is not clear
   to me that scrubbing a dataset will detect all of the data corruption failures
   which can affect the dataset.  I'm thinking along the lines of phantom writes,
   for example.


That is why it may be useful to always scrub pool-wide metadata or have a way to 
specifically request it.



4. the time it takes to scrub lots of stuff
...there are four concerns... :-)

For magnetic media, a yearly scrub interval should suffice for most folks.
I know some folks who scrub monthly. More frequent scrubs won't buy much.


It won't buy you much in terms of magnetic media decay discovery. Unfortunately, 
there are other sources of corruption as well (including the phantom writes you are 
thinking about), and being able to discover corruption and recover it from backup 
as quickly as possible is a good thing.



Scrubs are also useful for detecting broken hardware. However, normal
activity will also detect broken hardware, so it is better to think of 
scrubs as
finding degradation of old data rather than being a hardware checking 
service.



So being able to scrub individual dataset would help to run scrubs of 
critical data more frequently and faster and schedule scrubs for less 
frequently used and/or less important data to happen much less 
frequently.


It may be useful to have a way to tell ZFS to scrub pool-wide metadata 
only (space maps etc), so that you can build your own schedule of scrubs.


Another interesting idea is to be able to scrub only blocks modified 
since last snapshot.


This can be relatively easy to implement. But remember that scrubs are most
useful for finding data which has degraded from the media. In other 
words, old
data. New data is not likely to have degraded yet, and since ZFS is COW, 
all of

the new data is, well, new.




This is why having the ability to bound the start and end of a scrub by txg
can be easy and perhaps useful.


This requires exporting the concept of transaction group numbers to the user, and 
I do not see how it is less complex from the user interface perspective than 
being able to request a scrub of an individual dataset, pool-wide metadata or 
newly-written data.


regards,
victor
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] refreservation not transferred by zfs send when sending a volume?

2009-09-28 Thread Albert Chin
snv114# zfs get 
used,reservation,volsize,refreservation,usedbydataset,usedbyrefreservation 
tww/opt/vms/images/vios/mello-0.img
NAME PROPERTY  VALUE  SOURCE
tww/opt/vms/images/vios/mello-0.img  used  30.6G  -
tww/opt/vms/images/vios/mello-0.img  reservation   none   default
tww/opt/vms/images/vios/mello-0.img  volsize   25G-
tww/opt/vms/images/vios/mello-0.img  refreservation25Glocal
tww/opt/vms/images/vios/mello-0.img  usedbydataset 5.62G  -
tww/opt/vms/images/vios/mello-0.img  usedbyrefreservation  25G-

Sent tww/opt/vms/images/vios/mello-0.img from snv_114 server
to snv_119 server.

On snv_119 server:
snv119# zfs get 
used,reservation,volsize,refreservation,usedbydataset,usedbyrefreservation 
t/opt/vms/images/vios/mello-0.img 
NAME   PROPERTY  VALUE  SOURCE
t/opt/vms/images/vios/mello-0.img  used  5.32G  -
t/opt/vms/images/vios/mello-0.img  reservation   none   default
t/opt/vms/images/vios/mello-0.img  volsize   25G-
t/opt/vms/images/vios/mello-0.img  refreservationnone   default
t/opt/vms/images/vios/mello-0.img  usedbydataset 5.32G  -
t/opt/vms/images/vios/mello-0.img  usedbyrefreservation  0  -

Any reason the refreservation and usedbyrefreservation properties are
not sent?

-- 
albert chin (ch...@thewrittenword.com)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] OS install question

2009-09-28 Thread Frank Middleton

On 09/28/09 01:22 PM, David Dyer-Bennet wrote:


That seems truly bizarre.  Virtualbox recommends 16GB, and after doing an
install there's about 12GB free.


There's no way Solaris will install in 4GB if I understand what
you are saying. Maybe fresh off a CD when it doesn't have to
download a copy first, but the reality is 16GB is not possible
unless you don't ever want to do an image update. What version
are you running? Have you ever tried pkg image-update?

# uname -a
SunOS host8 5.11 snv_111b i86pc i386 i86pc Solaris
# df -h
Filesystem   Size  Used Avail Use% Mounted on
rpool/ROOT/opensolaris-2  34G   13G   22G  37% /


# du -sh /var/pkg/download/
762M/var/pkg/download/

this after deleting all old BEs and all snapshots but not emptying
/var/pkg/download; swap/boot are on different slices.

SPARC is similar; snv122 takes 11Gb after deleting old BEs, all
snapshots, *and* /var/pkg/downloads; *without* /opt, swap,
/var/crash, /var/dump, /var/tmp, /var/run and /export...

AFAIK It is absolutely impossible to do a pkg image-update (say)
from snv111b to snv122 without at least 9GB free (it says 8GB
in the documentation). If the baseline is 11GB, you need 20GB
for an install, and that leaves you zip to spare.

Obvious reasons include before and after snaps, download before
install, and total rollback capability. This is all going to cost
some space. I believe there is a CR about this, but IMO when
you can get 2TB of disk for $200 it's hard to complain. 32GB
of SSD is not unreasonable and 16GB simply won't hack it.
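
For reference, a rough sketch of reclaiming space before an image-update
(the BE name here is illustrative; check beadm list for your own):

# beadm list                        # see which boot environments still exist
# beadm destroy opensolaris-1       # remove an old, unneeded BE
# zfs list -t snapshot -r rpool     # find stale snapshots worth destroying
# rm -rf /var/pkg/download/*        # clear the pkg download cache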

All the above is based on actual and sometimes painful experience.
You *really* don't want to run out of space during an update. You'll
almost certainly end up restoring your boot disk if you do and if
you don't, you'll never get back all the space. Been there, done
that...

Cheers -- Frank



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Quickest way to find files with cksum errors without doing scrub

2009-09-28 Thread Richard Elling

On Sep 28, 2009, at 10:31 AM, Victor Latushkin wrote:


Richard Elling wrote:

On Sep 28, 2009, at 3:42 PM, Albert Chin wrote:

On Mon, Sep 28, 2009 at 12:09:03PM -0500, Bob Friesenhahn wrote:

On Mon, 28 Sep 2009, Richard Elling wrote:


Scrub could be faster, but you can try
   tar cf - . > /dev/null

If you think about it, validating checksums requires reading the  
data.

So you simply need to read the data.


This should work but it does not verify the redundant metadata.   
For
example, the duplicate metadata copy might be corrupt but the  
problem

is not detected since it did not happen to be used.


Too bad we cannot scrub a dataset/object.

Can you provide a use case? I don't see why scrub couldn't start and
stop at specific txgs for instance. That won't necessarily get you  
to a

specific file, though.


With ever increasing disk and pool sizes it takes more and more time  
for a scrub to complete its job. Let's imagine that you have a 100TB  
pool with 90TB of data in it, and there's a 10TB dataset that is  
critical and another 80TB dataset that is not that critical, where  
you can afford losing some blocks/files.


Personally, I have three concerns here.
	1. Gratuitous complexity, especially inside a pool -- aka creeping featurism
	2. Wouldn't a better practice be to use two pools with different protection
	   policies? The only protection policy differences inside a pool are copies.
	   In other words, I am concerned that people replace good data protection
	   practices with scrubs and expecting scrub to deliver better data protection
	   (it won't).
	3. Since the pool contains the set of blocks, shared by datasets, it is not clear
	   to me that scrubbing a dataset will detect all of the data corruption failures
	   which can affect the dataset.  I'm thinking along the lines of phantom writes,
	   for example.
	4. the time it takes to scrub lots of stuff
...there are four concerns... :-)

For magnetic media, a yearly scrub interval should suffice for most folks.
I know some folks who scrub monthly. More frequent scrubs won't buy much.
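
If you want such an interval without thinking about it, a minimal sketch is
a root cron entry (the pool name is illustrative):

# crontab -e   # e.g. scrub "tank" at 02:00 on the 1st of every month:
0 2 1 * * /usr/sbin/zpool scrub tank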

Scrubs are also useful for detecting broken hardware. However, normal
activity will also detect broken hardware, so it is better to think of  
scrubs as
finding degradation of old data rather than being a hardware checking  
service.



So being able to scrub individual dataset would help to run scrubs  
of critical data more frequently and faster and schedule scrubs for  
less frequently used and/or less important data to happen much less  
frequently.


It may be useful to have a way to tell ZFS to scrub pool-wide  
metadata only (space maps etc), so that you can build your own  
schedule of scrubs.


Another interesting idea is to be able to scrub only blocks modified  
since last snapshot.


This can be relatively easy to implement. But remember that scrubs are  
most
useful for finding data which has degraded from the media. In other  
words, old
data. New data is not likely to have degraded yet, and since ZFS is  
COW, all of
the new data is, well, new.  This is why having the ability to bound  
the start and

end of a scrub by txg can be easy and perhaps useful.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] raidz failure, trying to recover

2009-09-28 Thread Victor Latushkin

Liam Slusser wrote:

Long story short, my cat jumped on my server at my house, crashing two drives at 
the same time.  It was a 7 drive raidz (next time I'll do raidz2).


Long story short - we've been able to get access to the data in the pool. 
This involved finding a better old state with the help of 'zdb -t', then 
verifying metadata checksums with 'zdb -eubbcsL', then extracting the 
configuration from the pool, making a cache file from the extracted 
configuration, and finally importing the pool (read-only at the moment) to 
back up the data.
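
For the archives, the zdb invocations involved in this kind of recovery look
roughly like the following (pool name and txg are placeholders; the exact
steps depend on the damage):

# zdb -e -t <txg> <pool>            # try opening the exported pool at an older txg
# zdb -e -ubbcsL -t <txg> <pool>    # traverse and checksum-verify metadata at that txg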


As soon as it is backed up, we'll try to do read-write import...

victor



The server crashed complaining about a drive failure, so I rebooted into single 
user mode not realizing that two drives had failed.  I put in a new 500g 
replacement and had zfs start a replace operation, which failed at about 2% 
because there were two broken drives.  From that point I turned off the computer 
and sent both drives to a data recovery place.  They were able to recover the 
data on one of the two drives (the one that I started the replace operation on) 
- great - that should be enough to get my data back.

I popped the newly recovered drive back in; it had an older txg number than the 
other drives, so I made a backup of each drive and then modified the txg number 
to an earlier txg number so they all match.

However I am still unable to mount the array - I'm getting the following error 
(doesn't matter if I use -f or -F):

bash-3.2# zpool import data
  pool: data
id: 6962146434836213226
 state: UNAVAIL
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
devices and try again.
   see: http://www.sun.com/msg/ZFS-8000-6X
config:

data   UNAVAIL  missing device
  raidz1   DEGRADED
c0t0d0 ONLINE
c0t1d0 ONLINE
replacing  ONLINE
  c0t2d0   ONLINE
  c0t7d0   ONLINE
c0t3d0 UNAVAIL  cannot open
c0t4d0 ONLINE
c0t5d0 ONLINE
c0t6d0 ONLINE

Additional devices are known to be part of this pool, though their
exact configuration cannot be determined.

Now I should have enough online devices to mount and get my data off; however, no 
luck.  I'm not really sure where to go at this point.

Do I have to fake a c0t3d0 drive so it thinks all drives are there?  Can 
somebody point me in the right direction?

thanks,
liam



p.s.  To help me find which uberblocks to modify to reset the txg, I wrote a 
little perl program which finds and prints out information in order to revert 
to an earlier txg value.

It's a little messy since I wrote it quickly, super late at night - but maybe it 
will help somebody else out.

http://liam821.com/findUberBlock.txt (it's just a perl script)

It's easy to run.  It pulls in 256k of data and sorts it (or skips X kbytes if 
you use the -s ### option) and then searches for uberblocks.  (Remember there are 4 
labels - at 0 and 256K, and then two at the end of the disk.  You need to manually 
figure out the end skip value...)  Calculating the GUID seems to always fail 
because the number is too large for perl, so it returns a negative number.  Meh, 
it wasn't important enough to try to figure out.

(the info below has NOTHING to do with my disk problem above; it's a happy and 
healthy server that I wrote the tool on)

- find newest txg number
bash-3.00# /tmp/findUberBlock /dev/dsk/c0t1d0 -n
block=148 (0025000) transaction=15980419

- print verbose output
bash-3.00# /tmp/findUberBlock /dev/dsk/c0t1d0 -n -v
block=148 (0025000)
zfs_ver=3   (0003   )
transaction=15980419(d783 00f3  )
guid_sum=-14861410676147539 (7aad 2fc9 33a0 ffcb)
timestamp=1253958103(e1d7 4abd  )
(Sat Sep 26 02:41:43 2009)

raw =   0025000 b10c 00ba   0003   
0025010 d783 00f3   7aad 2fc9 33a0 ffcb
0025020 e1d7 4abd   0001   

- list all uberblocks
bash-3.00# /tmp/findUberBlock /dev/dsk/c0t1d0 -l
block=145 (0024400) transaction=15980288
block=146 (0024800) transaction=15980289
block=147 (0024c00) transaction=15980290
block=148 (0025000) transaction=15980291
block=149 (0025400) transaction=15980292
block=150 (0025800) transaction=15980293
block=151 (0025c00) transaction=15980294
block=152 (0026000) transaction=15980295
block=153 (0026400) transaction=15980296
block=154 (0026800) transaction=15980297
block=155 (0026c00) transaction=15980298
block=156 (0027000) transaction=15980299
block=157 (0027400) transaction=15980300
block=158 (0027800) transaction=15980301
.
.
.

- skip 256K into the disk and find the newest uberblock
bash-3.00# /tmp/findUberBlock /dev/dsk/c0t1d0 -n -s 256
block=507 (7ec00) transaction=15980522

Now let's say I want to go back in time on this; using the program can help me 
do that.  If I wanted 

Re: [zfs-discuss] Quickest way to find files with cksum errors without doing scrub

2009-09-28 Thread Victor Latushkin

Richard Elling wrote:

On Sep 28, 2009, at 3:42 PM, Albert Chin wrote:


On Mon, Sep 28, 2009 at 12:09:03PM -0500, Bob Friesenhahn wrote:

On Mon, 28 Sep 2009, Richard Elling wrote:


Scrub could be faster, but you can try
tar cf - . > /dev/null

If you think about it, validating checksums requires reading the data.
So you simply need to read the data.


This should work but it does not verify the redundant metadata.  For
example, the duplicate metadata copy might be corrupt but the problem
is not detected since it did not happen to be used.


Too bad we cannot scrub a dataset/object.


Can you provide a use case? I don't see why scrub couldn't start and
stop at specific txgs for instance. That won't necessarily get you to a
specific file, though.


With ever increasing disk and pool sizes it takes more and more time for 
a scrub to complete its job. Let's imagine that you have a 100TB pool with 
90TB of data in it, and there's a 10TB dataset that is critical and 
another 80TB dataset that is not that critical, where you can afford 
losing some blocks/files.


So being able to scrub an individual dataset would help to run scrubs of 
critical data more frequently and faster, and to schedule scrubs for less 
frequently used and/or less important data to happen much less frequently.


It may be useful to have a way to tell ZFS to scrub pool-wide metadata 
only (space maps etc.), so that you can build your own schedule of scrubs.


Another interesting idea is to be able to scrub only blocks modified 
since the last snapshot.


victor
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Quickest way to find files with cksum errors without doing scrub

2009-09-28 Thread Albert Chin
On Mon, Sep 28, 2009 at 10:16:20AM -0700, Richard Elling wrote:
> On Sep 28, 2009, at 3:42 PM, Albert Chin wrote:
>
>> On Mon, Sep 28, 2009 at 12:09:03PM -0500, Bob Friesenhahn wrote:
>>> On Mon, 28 Sep 2009, Richard Elling wrote:

 Scrub could be faster, but you can try
tar cf - . > /dev/null

 If you think about it, validating checksums requires reading the  
 data.
 So you simply need to read the data.
>>>
>>> This should work but it does not verify the redundant metadata.  For
>>> example, the duplicate metadata copy might be corrupt but the problem
>>> is not detected since it did not happen to be used.
>>
>> Too bad we cannot scrub a dataset/object.
>
> Can you provide a use case? I don't see why scrub couldn't start and
> stop at specific txgs for instance. That won't necessarily get you to a
> specific file, though.

If your pool is borked but mostly readable, yet some file systems have
cksum errors, you cannot "zfs send" that file system (err, snapshot of
filesystem). So, you need to manually fix the file system by traversing
it to read all files to determine which must be fixed. Once this is
done, you can snapshot and "zfs send". If you have many file systems,
this is time consuming.

Of course, you could just rsync and be happy with what you were able to
recover, but if you have clones branched from the same parent, which a
few differences inbetween shapshots, having to rsync *everything* rather
than just the differences is painful. Hence the reason to try to get
"zfs send" to work.

But, this is an extreme example and I doubt pools are often in this
state so the engineering time isn't worth it. In such cases though, a
"zfs scrub" would be useful.

-- 
albert chin (ch...@thewrittenword.com)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Quickest way to find files with cksum errors without doing scrub

2009-09-28 Thread Bob Friesenhahn

On Mon, 28 Sep 2009, Bob Friesenhahn wrote:


This should work but it does not verify the redundant metadata.  For example, 
the duplicate metadata copy might be corrupt but the problem is not detected 
since it did not happen to be used.


I am finding that your tar incantation is reading hardly any data from 
disk when testing my home directory and the 'tar' happens to be GNU 
tar:


# time tar cf - . > /dev/null
tar cf - . > /dev/null  2.72s user 12.43s system 96% cpu 15.721 total
# du -sh .
82G

Looks like the GNU folks slipped in a small performance "enhancement" 
if the output is to /dev/null.


Make sure to use /bin/tar, which does appear to actually read the data.
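
Alternatively, a sketch that forces reads without relying on any particular
tar (GNU tar skips reading when its output is /dev/null):

# find . -type f -exec cat {} + > /dev/null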

When actually reading the data via tar, read performance is very poor. 
Hopefully I will have a ZFS IDR to test with in the next few days 
which fixes the prefetch bug.


Zpool scrub reads the data at 360MB/second but this tar method is only 
reading at an average of 6MB/second to 42MB/second (according to zpool 
iostat).  Wups, I just saw a one-minute average of 105MB and then 
131MB.  Quite variable.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] OS install question

2009-09-28 Thread David Dyer-Bennet

On Mon, September 28, 2009 07:56, Frank Middleton wrote:
> On 09/28/09 12:40 AM, Ron Watkins wrote:
>>
>> Thus, im at a loss as to how to get the root pool setup as a 20Gb
>> slice
>
> 20GB is too small. You'll be fighting for space every time
> you use pkg. From my considerable experience installing to a
> 20GB mirrored rpool, I would go for 32GB if you can.

That seems truly bizarre.  Virtualbox recommends 16GB, and after doing an
install there's about 12GB free.
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Quickest way to find files with cksum errors without doing scrub

2009-09-28 Thread Tim Cook
On Mon, Sep 28, 2009 at 12:16 PM, Richard Elling
wrote:

> On Sep 28, 2009, at 3:42 PM, Albert Chin wrote:
>
>  On Mon, Sep 28, 2009 at 12:09:03PM -0500, Bob Friesenhahn wrote:
>>
>>> On Mon, 28 Sep 2009, Richard Elling wrote:
>>>

 Scrub could be faster, but you can try
tar cf - . > /dev/null

 If you think about it, validating checksums requires reading the data.
 So you simply need to read the data.

>>>
>>> This should work but it does not verify the redundant metadata.  For
>>> example, the duplicate metadata copy might be corrupt but the problem
>>> is not detected since it did not happen to be used.
>>>
>>
>> Too bad we cannot scrub a dataset/object.
>>
>
> Can you provide a use case? I don't see why scrub couldn't start and
> stop at specific txgs for instance. That won't necessarily get you to a
> specific file, though.
>  -- richard
>

I get the impression he just wants to check a single file in a pool without
waiting for it to check the entire pool.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Quickest way to find files with cksum errors without doing scrub

2009-09-28 Thread Richard Elling

On Sep 28, 2009, at 3:42 PM, Albert Chin wrote:


On Mon, Sep 28, 2009 at 12:09:03PM -0500, Bob Friesenhahn wrote:

On Mon, 28 Sep 2009, Richard Elling wrote:


Scrub could be faster, but you can try
tar cf - . > /dev/null

If you think about it, validating checksums requires reading the  
data.

So you simply need to read the data.


This should work but it does not verify the redundant metadata.  For
example, the duplicate metadata copy might be corrupt but the problem
is not detected since it did not happen to be used.


Too bad we cannot scrub a dataset/object.


Can you provide a use case? I don't see why scrub couldn't start and
stop at specific txgs for instance. That won't necessarily get you to a
specific file, though.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Quickest way to find files with cksum errors without doing scrub

2009-09-28 Thread Albert Chin
On Mon, Sep 28, 2009 at 12:09:03PM -0500, Bob Friesenhahn wrote:
> On Mon, 28 Sep 2009, Richard Elling wrote:
>>
>> Scrub could be faster, but you can try
>>  tar cf - . > /dev/null
>>
>> If you think about it, validating checksums requires reading the data.
>> So you simply need to read the data.
>
> This should work but it does not verify the redundant metadata.  For
> example, the duplicate metadata copy might be corrupt but the problem
> is not detected since it did not happen to be used.

Too bad we cannot scrub a dataset/object.

-- 
albert chin (ch...@thewrittenword.com)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] extremely slow writes (with good reads)

2009-09-28 Thread Victor Latushkin

Paul,

Thanks for additional data, please see comments inline.

Paul Archer wrote:

7:56pm, Victor Latushkin wrote:


While 'zdb -l /dev/dsk/c7d0s0' shows normal labels. So the new 
question is: how do I tell ZFS to use c7d0s0 instead of c7d0? I can't 
do a 'zpool replace' because the zpool isn't online.


ZFS actually uses c7d0s0 and not c7d0 - it shortens the output to c7d0 when 
it controls the entire disk. Before the upgrade it looked like this:


    NAME        STATE     READ WRITE CKSUM
    datapool    ONLINE       0     0     0
      raidz1    ONLINE       0     0     0
        c2d0s0  ONLINE       0     0     0
        c3d0s0  ONLINE       0     0     0
        c4d0s0  ONLINE       0     0     0
        c6d0s0  ONLINE       0     0     0
        c5d0s0  ONLINE       0     0     0

I guess something happened to the labeling of disk c7d0 (used to be 
c2d0) before, during or after upgrade.


It would be nice to show what zdb -l shows for this disk and some 
other disk too. output of 'prtvtoc /dev/rdsk/cXdYs0' can be helpful too.




This is from c7d0:


LABEL 0

version=13
name='datapool'
state=0
txg=233478
pool_guid=3410059226836265661
hostid=519305
hostname='shebop'
top_guid=7679950824008134671
guid=17458733222130700355
vdev_tree
type='raidz'
id=0
guid=7679950824008134671
nparity=1
metaslab_array=23
metaslab_shift=32
ashift=9
asize=7501485178880
is_log=0
children[0]
type='disk'
id=0
guid=17458733222130700355
path='/dev/dsk/c7d0s0'
devid='id1,c...@asamsung_hd154ui=s1y6j1ks742049/a'

phys_path='/p...@0,0/pci10de,3...@4/pci8086,3...@7/pci-...@0/i...@1/c...@0,0:a'
whole_disk=1


This is why ZFS does not show s0 in the zpool output for c7d0 - it 
controls the entire disk. I guess initially it was the other way around - it is 
unlikely that you specified the disks differently at creation time, and the 
earlier output suggests that it was the other way. So something happened 
before the last system reboot that most likely relabeled your c7d0 disk, and 
the configuration in the labels was updated.



DTL=588
children[1]
type='disk'
id=1
guid=4735756507338772729
path='/dev/dsk/c8d0s0'
devid='id1,c...@asamsung_hd154ui=s1y6j1ks742050/a'

phys_path='/p...@0,0/pci10de,3...@4/pci8086,3...@7/pci-...@1/i...@0/c...@0,0:a'
whole_disk=0


All the other disks have whole_disk=0, so there's s0 in the zpool output 
for those disks.




DTL=467
children[2]
type='disk'
id=2
guid=10113358996255761229
path='/dev/dsk/c9d0s0'
devid='id1,c...@asamsung_hd154ui=s1y6j1ks742059/a'

phys_path='/p...@0,0/pci10de,3...@4/pci8086,3...@7/pci-...@1/i...@1/c...@0,0:a'
whole_disk=0
DTL=573
children[3]
type='disk'
id=3
guid=11460855531791764612
path='/dev/dsk/c11d0s0'
devid='id1,c...@asamsung_hd154ui=s1y6j1ks742048/a'

phys_path='/p...@0,0/pci10de,3...@4/pci8086,3...@7/pci-...@2/i...@1/c...@0,0:a'
whole_disk=0
DTL=571
children[4]
type='disk'
id=4
guid=14986691153111294171
path='/dev/dsk/c10d0s0'
devid='id1,c...@ast31500341as=9vs0ttwf/a'

phys_path='/p...@0,0/pci10de,3...@4/pci8086,3...@7/pci-...@2/i...@0/c...@0,0:a'
whole_disk=0
DTL=473


Labels 1-3 are identical

The other disks in the pool give identical results (except for the 
guid's, which match with what's above).


Ok, then let's look at the vtoc - probably we can find something 
interesting there.



c8d0 - c11d0 are identical, so I didn't include that output below:


This is expected. So let's look for the differences:



r...@shebop:/tmp# prtvtoc /dev/rdsk/c7d0s0
* /dev/rdsk/c7d0s0 partition map
*
* Unallocated space:
*   First SectorLast
*   Sector CountSector
*  34   222   255
*
*  First SectorLast
* Partition  Tag  FlagsSector CountSector  Mount Directory
   0  400256 2930247391 2930247646
   8 1100  2930247647 16384 2930264030
r...@shebop:/tmp#
r...@shebop:/tmp# prtvtoc /dev/rdsk/c8d0s0
* /dev/rdsk/c8d0s0 partition map
*
*  First SectorLast
* Partition  Tag  FlagsSector CountSector  Mount Directory
   0 1700 34 2930277101 2930277134




Now you can clearly see the difference between the two: 4 disks have 
only one 

Re: [zfs-discuss] Quickest way to find files with cksum errors without doing scrub

2009-09-28 Thread Bob Friesenhahn

On Mon, 28 Sep 2009, Richard Elling wrote:


Scrub could be faster, but you can try
tar cf - . > /dev/null

If you think about it, validating checksums requires reading the data.
So you simply need to read the data.


This should work but it does not verify the redundant metadata.  For 
example, the duplicate metadata copy might be corrupt but the problem 
is not detected since it did not happen to be used.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Quickest way to find files with cksum errors without doing scrub

2009-09-28 Thread Richard Elling

On Sep 28, 2009, at 2:41 PM, Albert Chin wrote:

Without doing a zpool scrub, what's the quickest way to find files  
in a
filesystem with cksum errors? Iterating over all files with "find"  
takes

quite a bit of time. Maybe there's some zdb fu that will perform the
check for me?


Scrub could be faster, but you can try
tar cf - . > /dev/null

If you think about it, validating checksums requires reading the data.
So you simply need to read the data.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS ARC vs Oracle cache

2009-09-28 Thread Glenn Fawcett
Been there, done that, got the tee shirt.  A larger SGA will *always* 
be more efficient at servicing Oracle requests for blocks.  You avoid 
going through all the IO code of Oracle and it simply reduces to a hash lookup.


http://blogs.sun.com/glennf/entry/where_do_you_cache_oracle

al...@sun wrote:

Hi all,

There is no generic answer to:
Is it better to have a "small SGA + big ZFS ARC" or a "large SGA + small 
ZFS ARC"?


We can answer:
Have a large enough SGA to get a good cache hit ratio (higher than 90% 
for OLTP).
Have a few GB of ZFS ARC (not less than 500M; usually more than 16GB is 
not useful).


Then you have to tune. We know that the ZFS cache helps database reads.
The cache strategies of ZFS and Oracle are different, and usually they 
help each other.


There is no reason to avoid caching the same data twice.
Example:
An Oracle query asks for a range scan on an index. ZFS detects sequential 
reads and starts to prefetch the data. ZFS tries to cache the data that 
Oracle will probably ask for next.

When Oracle asks, the data is cached twice.
All the caches are dynamic.


The best known recordsizes for an OLTP environment are:
Dataset      Recordsize
Table Data   8K (db_block_size)
Redo Logs    128K
Index        8K (db_block_size)
Undo         128K
Temp         128K


We still recommend a distinct zpool for redo logs.

Regards.

Alain Chéreau


Enda O'Connor a écrit :

Richard Elling wrote:

On Sep 24, 2009, at 10:30 AM, Javier Conde wrote:


Hello,

Given the following configuration:

* Server with 12 SPARCVII CPUs  and 96 GB of RAM
* ZFS used as file system for Oracle data
* Oracle 10.2.0.4 with 1.7TB of data and indexes
* 1800 concurrents users with PeopleSoft Financial
* 2 PeopleSoft transactions per day
* HDS USP1100 with LUNs stripped on 6 parity groups (450xRAID7+1), 
total 48 disks

* 2x 4Gbps FC with MPxIO

Which is the best Oracle SGA size to avoid cache duplication 
between Oracle and ZFS?


Is it better to have a "small SGA + big ZFS ARC" or "large SGA + 
small ZFS ARC"?


Who does a better cache for overall performance?


In general, it is better to cache closer to the consumer (application).

You don't mention what version of Solaris or ZFS you are using.
For later versions, the primarycache property allows you to control the
ARC usage on a per-dataset basis.
 -- richard
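
For illustration, a sketch of setting that property (the dataset name is
made up):

# zfs set primarycache=metadata orapool/data   # keep only metadata for this dataset in the ARC
# zfs get primarycache orapool/data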

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Hi,
adding oracle-interest.
I would suggest some testing, but standard recommendations to start with 
are: keep the zfs recordsize equal to the db block size; keep the Oracle 
log writer on its own pool (128k recordsize is recommended for this one, 
I believe), since the log writer is an I/O limiting factor as such; and 
use the latest KUs for Solaris, as they contain some critical fixes for 
zfs/oracle, e.g. 6775697.  A small SGA is not usually recommended, but of 
course a lot depends on the application layer as well. I can only say 
test with the recommendations above and then deviate from there; perhaps 
keeping the ZIL on a separate low-latency device might help (again, only 
analysis can determine all that). Then remember that even after that, 
with a large SGA etc., sometimes perf can degrade, i.e. you might need to 
instruct Oracle to actually cache, via the alter table cache command etc.


Getting familiar with statspack/AWR will be a must here :-) as only 
an analysis of Oracle from an Oracle point of view can really tell 
what is working as such.


Enda



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS ARC vs Oracle cache

2009-09-28 Thread al...@sun

Hi all,

There is no generic answer to:
Is it better to have a "small SGA + big ZFS ARC" or a "large SGA + small 
ZFS ARC"?


We can answer:
Have a large enough SGA to get a good cache hit ratio (higher than 90% 
for OLTP).
Have a few GB of ZFS ARC (not less than 500M; usually more than 16GB is not 
useful).


Then you have to tune. We know that the ZFS cache helps database reads.
The cache strategies of ZFS and Oracle are different, and usually they 
help each other.


There is no reason to avoid caching the same data twice.
Example:
An Oracle query asks for a range scan on an index. ZFS detects sequential reads and
starts to prefetch the data. ZFS tries to cache the data that Oracle will 
probably ask for next.

When Oracle asks, the data is cached twice.
All the caches are dynamic.


The best known recordsizes for an OLTP environment are:
Dataset      Recordsize
Table Data   8K (db_block_size)
Redo Logs    128K
Index        8K (db_block_size)
Undo         128K
Temp         128K


We still recommend a distinct zpool for redo logs.
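
A minimal sketch of these recommendations as commands (pool, disk and dataset
names are made up for illustration):

# zpool create orapool mirror c1t2d0 c1t3d0
# zpool create redopool mirror c1t4d0 c1t5d0          # distinct pool for redo logs
# zfs create -o recordsize=8k   orapool/data          # matches db_block_size
# zfs create -o recordsize=8k   orapool/index
# zfs create -o recordsize=128k orapool/undo
# zfs create -o recordsize=128k orapool/temp
# zfs create -o recordsize=128k redopool/redo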

Regards.

Alain Chéreau


Enda O'Connor a écrit :

Richard Elling wrote:

On Sep 24, 2009, at 10:30 AM, Javier Conde wrote:


Hello,

Given the following configuration:

* Server with 12 SPARCVII CPUs  and 96 GB of RAM
* ZFS used as file system for Oracle data
* Oracle 10.2.0.4 with 1.7TB of data and indexes
* 1800 concurrents users with PeopleSoft Financial
* 2 PeopleSoft transactions per day
* HDS USP1100 with LUNs stripped on 6 parity groups (450xRAID7+1), 
total 48 disks

* 2x 4Gbps FC with MPxIO

Which is the best Oracle SGA size to avoid cache duplication between 
Oracle and ZFS?


Is it better to have a "small SGA + big ZFS ARC" or "large SGA + 
small ZFS ARC"?


Who does a better cache for overall performance?


In general, it is better to cache closer to the consumer (application).

You don't mention what version of Solaris or ZFS you are using.
For later versions, the primarycache property allows you to control the
ARC usage on a per-dataset basis.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Hi,
adding oracle-interest.
I would suggest some testing, but standard recommendations to start with 
are: keep the zfs recordsize equal to the db block size; keep the Oracle 
log writer on its own pool (128k recordsize is recommended for this one, 
I believe), since the log writer is an I/O limiting factor as such; and 
use the latest KUs for Solaris, as they contain some critical fixes for 
zfs/oracle, e.g. 6775697.  A small SGA is not usually recommended, but of 
course a lot depends on the application layer as well. I can only say test 
with the recommendations above and then deviate from there; perhaps 
keeping the ZIL on a separate low-latency device might help (again, only 
analysis can determine all that). Then remember that even after that, 
with a large SGA etc., sometimes perf can degrade, i.e. you might need to 
instruct Oracle to actually cache, via the alter table cache command etc.


Getting familiar with statspack/AWR will be a must here :-) as only an 
analysis of Oracle from an Oracle point of view can really tell what 
is working as such.


Enda



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] extremely slow writes (with good reads)

2009-09-28 Thread Paul Archer

7:56pm, Victor Latushkin wrote:


While 'zdb -l /dev/dsk/c7d0s0' shows normal labels. So the new question is: 
how do I tell ZFS to use c7d0s0 instead of c7d0? I can't do a 'zpool 
replace' because the zpool isn't online.


ZFS actually uses c7d0s0 and not c7d0 - it shortens the output to c7d0 when it 
controls the entire disk. Before the upgrade it looked like this:


    NAME        STATE     READ WRITE CKSUM
    datapool    ONLINE       0     0     0
      raidz1    ONLINE       0     0     0
        c2d0s0  ONLINE       0     0     0
        c3d0s0  ONLINE       0     0     0
        c4d0s0  ONLINE       0     0     0
        c6d0s0  ONLINE       0     0     0
        c5d0s0  ONLINE       0     0     0

I guess something happened to the labeling of disk c7d0 (used to be c2d0) 
before, during or after upgrade.


It would be nice to show what zdb -l shows for this disk and some other disk 
too. output of 'prtvtoc /dev/rdsk/cXdYs0' can be helpful too.




This is from c7d0:


LABEL 0

version=13
name='datapool'
state=0
txg=233478
pool_guid=3410059226836265661
hostid=519305
hostname='shebop'
top_guid=7679950824008134671
guid=17458733222130700355
vdev_tree
type='raidz'
id=0
guid=7679950824008134671
nparity=1
metaslab_array=23
metaslab_shift=32
ashift=9
asize=7501485178880
is_log=0
children[0]
type='disk'
id=0
guid=17458733222130700355
path='/dev/dsk/c7d0s0'
devid='id1,c...@asamsung_hd154ui=s1y6j1ks742049/a'

phys_path='/p...@0,0/pci10de,3...@4/pci8086,3...@7/pci-...@0/i...@1/c...@0,0:a'
whole_disk=1
DTL=588
children[1]
type='disk'
id=1
guid=4735756507338772729
path='/dev/dsk/c8d0s0'
devid='id1,c...@asamsung_hd154ui=s1y6j1ks742050/a'

phys_path='/p...@0,0/pci10de,3...@4/pci8086,3...@7/pci-...@1/i...@0/c...@0,0:a'
whole_disk=0
DTL=467
children[2]
type='disk'
id=2
guid=10113358996255761229
path='/dev/dsk/c9d0s0'
devid='id1,c...@asamsung_hd154ui=s1y6j1ks742059/a'

phys_path='/p...@0,0/pci10de,3...@4/pci8086,3...@7/pci-...@1/i...@1/c...@0,0:a'
whole_disk=0
DTL=573
children[3]
type='disk'
id=3
guid=11460855531791764612
path='/dev/dsk/c11d0s0'
devid='id1,c...@asamsung_hd154ui=s1y6j1ks742048/a'

phys_path='/p...@0,0/pci10de,3...@4/pci8086,3...@7/pci-...@2/i...@1/c...@0,0:a'
whole_disk=0
DTL=571
children[4]
type='disk'
id=4
guid=14986691153111294171
path='/dev/dsk/c10d0s0'
devid='id1,c...@ast31500341as=9vs0ttwf/a'

phys_path='/p...@0,0/pci10de,3...@4/pci8086,3...@7/pci-...@2/i...@0/c...@0,0:a'
whole_disk=0
DTL=473


Labels 1-3 are identical

The other disks in the pool give identical results (except for the guid's, 
which match with what's above).




c8d0 - c11d0 are identical, so I didn't include that output below:

r...@shebop:/tmp# prtvtoc /dev/rdsk/c7d0s0
* /dev/rdsk/c7d0s0 partition map
*
* Dimensions:
* 512 bytes/sector
* 2930264064 sectors
* 2930263997 accessible sectors
*
* Flags:
*   1: unmountable
*  10: read-only
*
* Unallocated space:
*   First SectorLast
*   Sector CountSector
*  34   222   255
*
*  First SectorLast
* Partition  Tag  FlagsSector CountSector  Mount Directory
   0  400256 2930247391 2930247646
   8 1100  2930247647 16384 2930264030
r...@shebop:/tmp#
r...@shebop:/tmp# prtvtoc /dev/rdsk/c8d0s0
* /dev/rdsk/c8d0s0 partition map
*
* Dimensions:
* 512 bytes/sector
* 2930264064 sectors
* 2930277101 accessible sectors
*
* Flags:
*   1: unmountable
*  10: read-only
*
*  First SectorLast
* Partition  Tag  FlagsSector CountSector  Mount Directory
   0 1700 34 2930277101 2930277134


 Thanks for the help!


Paul Archer
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Quickest way to find files with cksum errors without doing scrub

2009-09-28 Thread Albert Chin
Without doing a zpool scrub, what's the quickest way to find files in a
filesystem with cksum errors? Iterating over all files with "find" takes
quite a bit of time. Maybe there's some zdb fu that will perform the
check for me?

-- 
albert chin (ch...@thewrittenword.com)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] extremely slow writes (with good reads)

2009-09-28 Thread Victor Latushkin

On 28.09.09 18:09, Paul Archer wrote:

8:30am, Paul Archer wrote:


And the hits just keep coming...
The resilver finished last night, so rebooted the box as I had just 
upgraded to the latest Dev build. Not only did the upgrade fail (love 
that instant rollback!), but now the zpool won't come online:


r...@shebop:~# zpool import
  pool: datapool
    id: 3410059226836265661
 state: UNAVAIL
status: The pool is formatted using an older on-disk version.
action: The pool cannot be imported due to damaged devices or data.
config:

        datapool     UNAVAIL  insufficient replicas
          raidz1     UNAVAIL  corrupted data
            c7d0     ONLINE
            c8d0s0   ONLINE
            c9d0s0   ONLINE
            c11d0s0  ONLINE
            c10d0s0  ONLINE


I've tried renaming /etc/zfs/zpool.cache and rebooting, but no joy.
Is it OK to scream and tear my hair out now?



A little more research came up with this:

r...@shebop:~# zdb -l /dev/dsk/c7d0

LABEL 0

failed to unpack label 0

LABEL 1

failed to unpack label 1

LABEL 2

failed to unpack label 2

LABEL 3

failed to unpack label 3


While 'zdb -l /dev/dsk/c7d0s0' shows normal labels. So the new question 
is: how do I tell ZFS to use c7d0s0 instead of c7d0? I can't do a 'zpool 
replace' because the zpool isn't online.


ZFS actually uses c7d0s0 and not c7d0 - it shortens the output to c7d0 when it 
controls the entire disk. Before the upgrade it looked like this:


    NAME        STATE     READ WRITE CKSUM
    datapool    ONLINE       0     0     0
      raidz1    ONLINE       0     0     0
        c2d0s0  ONLINE       0     0     0
        c3d0s0  ONLINE       0     0     0
        c4d0s0  ONLINE       0     0     0
        c6d0s0  ONLINE       0     0     0
        c5d0s0  ONLINE       0     0     0

I guess something happened to the labeling of disk c7d0 (used to be c2d0) 
before, during or after upgrade.


It would be nice to show what zdb -l shows for this disk and some other disk 
too. output of 'prtvtoc /dev/rdsk/cXdYs0' can be helpful too.


victor
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] OS install question

2009-09-28 Thread Cindy Swearingen

Hi Ron,

Any reason why you want to use slices except for the root pool?

I would recommend a 4-disk configuration like this:

mirrored root pool on c1t0d0s0 and c2t0d0s0
mirrored app pool on c1t1d0 and c2t1d0

Let the install use one big slice for each disk in the mirrored root
pool, which is required for booting and whole disks for the app pool.
Other than for the root pool, slices are not required.

In the future, you can attach/add more disks to the app pool and/or
replace with larger disks in either pool.

Any additional administration, such as trying to expand a slice (you
can't expand an existing slice under a live pool) or reconfiguration
is much easier without having to muck with slices.
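
A rough sketch of the commands for that layout, using the device names above;
installgrub is what makes the second root disk bootable (which is also the fix
for the grub-prompt symptom mentioned below):

# zpool attach rpool c1t0d0s0 c2t0d0s0                  # mirror the root pool
# installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c2t0d0s0
# zpool create apppool mirror c1t1d0 c2t1d0             # whole-disk app pool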

Cindy
On 09/27/09 18:41, Ron Watkins wrote:

I have a box with 4 disks. It was my intent to place a mirrored root partition 
on 2 disks on different controllers, then use the remaining space and the other 
2 disks to create a raid-5 configuration from which to export iscsi luns for 
use by other hosts.
The problem I'm having is that when I try to install the OS, it either takes the 
entire disk or a partition the same size as the entire disk. I tried creating 2 
slices, but the install won't allow it, and if I make the solaris partition 
smaller, then the OS no longer sees the rest of the disk, only the small piece.
I found references on how to mirror the root disk pool, but the grub piece 
doesn't seem to work: when I disconnect the first disk, all I get at reboot is 
a grub prompt.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] extremely slow writes (with good reads)

2009-09-28 Thread Paul Archer

8:30am, Paul Archer wrote:


And the hits just keep coming...
The resilver finished last night, so rebooted the box as I had just upgraded 
to the latest Dev build. Not only did the upgrade fail (love that instant 
rollback!), but now the zpool won't come online:


r...@shebop:~# zpool import
  pool: datapool
    id: 3410059226836265661
 state: UNAVAIL
status: The pool is formatted using an older on-disk version.
action: The pool cannot be imported due to damaged devices or data.
config:

        datapool     UNAVAIL  insufficient replicas
          raidz1     UNAVAIL  corrupted data
            c7d0     ONLINE
            c8d0s0   ONLINE
            c9d0s0   ONLINE
            c11d0s0  ONLINE
            c10d0s0  ONLINE


I've tried renaming /etc/zfs/zpool.cache and rebooting, but no joy.
Is it OK to scream and tear my hair out now?



A little more research came up with this:

r...@shebop:~# zdb -l /dev/dsk/c7d0

LABEL 0

failed to unpack label 0

LABEL 1

failed to unpack label 1

LABEL 2

failed to unpack label 2

LABEL 3

failed to unpack label 3


While 'zdb -l /dev/dsk/c7d0s0' shows normal labels. So the new question 
is: how do I tell ZFS to use c7d0s0 instead of c7d0? I can't do a 'zpool 
replace' because the zpool isn't online.


Paul
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris License with ZFS USER quotas?

2009-09-28 Thread Masafumi Ohta


On 2009/09/28, at 22:09, Jim Grisanzio wrote:


Jorgen Lundman wrote:
When I approach Sun-Japan directly I just get told that they don't  
speak
English.  When my Japanese colleagues approach Sun-Japan directly,  
it is

suggested to us that we stay with our current Vendor.


hey ...

I work at Sun Japan in the Yoga office. I can connect you with  
English speakers here. Contact me off list if you are interested.


Also, there are some Japan lists for OpenSolaris you may want to  
subscribe to: The Japan OpenSolaris User Group and The Tokyo  
OpenSolaris User Group. The Japan group is mostly Japanese, but we  
are trying to build an international group in English for the Tokyo  
OSUG. There are bi-lingual westerners and Japanese on both lists,  
and we have events in Yoga as well.


http://mail.opensolaris.org/mailman/listinfo/ug-tsug (English )
http://mail.opensolaris.org/mailman/listinfo/ug-jposug (Japanese)

Jim
--
http://blogs.sun.com/jimgris/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Hi,

I am a leader of the Tokyo OpenSolaris User Group. As Jimgris says, we are  
now trying to build an international group like Tokyo2pointO and the Tokyo  
Linux User Group (TLUG).
You can ask us in English on our tsug mailing list, especially about issues in  
Japan, such as the OpenSolaris support program in Japan.


We hope we can be of help to you.

Thanks,

Masafumi Ohta
a Leader of Tokyo OpenSolaris User Group
mailto:masafumi.o...@gmail.com
http://www.twitter.com/masafumi_ohta



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] extremely slow writes (with good reads)

2009-09-28 Thread Paul Archer

Yesterday, Paul Archer wrote:



I estimate another 10-15 hours before this disk is finished resilvering and 
the zpool is OK again. At that time, I'm going to switch some hardware out 
(I've got a newer and higher-end LSI card that I hadn't used before because 
it's PCI-X, and won't fit on my current motherboard.)
I'll report back what I get with it tomorrow or the next day, depending on 
the timing on the resilver.


Paul Archer


And the hits just keep coming...
The resilver finished last night, so rebooted the box as I had just 
upgraded to the latest Dev build. Not only did the upgrade fail (love that 
instant rollback!), but now the zpool won't come online:


r...@shebop:~# zpool import
  pool: datapool
    id: 3410059226836265661
 state: UNAVAIL
status: The pool is formatted using an older on-disk version.
action: The pool cannot be imported due to damaged devices or data.
config:

        datapool     UNAVAIL  insufficient replicas
          raidz1     UNAVAIL  corrupted data
            c7d0     ONLINE
            c8d0s0   ONLINE
            c9d0s0   ONLINE
            c11d0s0  ONLINE
            c10d0s0  ONLINE


I've tried renaming /etc/zfs/zpool.cache and rebooting, but no joy.
Is it OK to scream and tear my hair out now?

Paul

PS I don't suppose there's an RFE out there for "give useful data when a 
pool is unavailable." Or even better, "allow a pool to be imported (but no 
filesystems mounted) so it *can be fixed*."

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool import hungs up forever...

2009-09-28 Thread Victor Latushkin

On 29.07.09 15:18, Markus Kovero wrote:

I recently noticed that importing larger pools that are occupied by large
amounts of data can leave zpool import running for several hours, with zpool iostat
only showing some random reads now and then and iostat -xen showing quite busy disk
usage. It's almost as if it goes through every bit in the pool before it completes.

Somebody said that zpool import got faster on snv118, but I don't have real
information on that yet.


This had nothing to do with the speed of 'zpool import'. There was a corrupted 
pool-wide metadata block that prevented the pool from importing successfully.


Fortunately enough, we found a better previous state a few txgs back at txg 
2682802 (the last synced txg was 2682808):


#zdb -e -ubbcsL -t 2682802 data1
...
4.25K  19.9M   8.62M   25.8M   6.08K    2.31     0.00  SPA space map
    1   128K    128K    128K    128K    1.00     0.00  ZIL intent log
1.77K  28.4M   8.48M   17.3M    9.8K    3.35     0.00  DMU dnode
    2     2K      1K   2.50K   1.25K    2.00     0.00  DMU objset
    -      -       -       -       -       -        -  DSL directory
    2     1K      1K   3.00K   1.50K    1.00     0.00  DSL directory child map
    1    512     512   1.50K   1.50K    1.00     0.00  DSL dataset snap map
    2     1K      1K   3.00K   1.50K    1.00     0.00  DSL props
    -      -       -       -       -       -        -  DSL dataset
    -      -       -       -       -       -        -  ZFS znode
    -      -       -       -       -       -        -  ZFS V0 ACL
46.3M  5.74T   5.74T   5.74T    127K    1.00   100.00  ZFS plain file
1.87K  9.04M   2.75M   5.50M   2.94K    3.29     0.00  ZFS directory
    1    512     512      1K      1K    1.00     0.00  ZFS master node
    1    512     512      1K      1K    1.00     0.00  ZFS delete queue
    -      -       -       -       -       -        -  zvol object
    -      -       -       -       -       -        -  zvol prop
    -      -       -       -       -       -        -  other uint8[]
    -      -       -       -       -       -        -  other uint64[]
    -      -       -       -       -       -        -  other ZAP
    -      -       -       -       -       -        -  persistent error log
    1   128K   4.50K   13.5K   13.5K   28.44     0.00  SPA history
    -      -       -       -       -       -        -  SPA history offsets
    -      -       -       -       -       -        -  Pool properties
    -      -       -       -       -       -        -  DSL permissions
    -      -       -       -       -       -        -  ZFS ACL
    -      -       -       -       -       -        -  ZFS SYSACL
    -      -       -       -       -       -        -  FUID table
    -      -       -       -       -       -        -  FUID table size
    -      -       -       -       -       -        -  DSL dataset next clones
    -      -       -       -       -       -        -  scrub work queue
46.3M  5.74T   5.74T   5.74T    127K    1.00   100.00  Total

                      capacity     operations    bandwidth        errors
description          used avail   read  write   read  write  read write cksum
data1               5.74T 6.99T    523      0  65.1M      0     0     0     1
  /dev/dsk/c14t0d0  5.74T 6.99T    523      0  65.1M      0     0     0    17

So we reactivated it and were able to import the pool just fine. A subsequent scrub 
did find a couple of errors in metadata. There were no user data errors at all:


# zpool status -v data1
  pool: data1
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 12h43m with 0 errors on Thu Aug  6 06:00:11 2009
config:

        NAME       STATE     READ WRITE CKSUM
        data1      ONLINE       0     0     0
          c14t0d0  ONLINE       0     0     2  12K repaired

errors: No known data errors

Upcoming zpool recovery support is going to help perform this kind of recovery 
in a more user-friendly and automated way.


Btw, the pool was originally created on FreeBSD, but we performed the recovery on 
Solaris. Pavel said that he was going to stay on OpenSolaris as he learned a lot 
about it along the way ;-)


Cheers,
Victor



Yours
Markus Kovero

-Original Message-
From: zfs-discuss-boun...@opensolaris.org 
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Victor Latushkin
Sent: 29. heinäkuuta 2009 14:05
To: Pavel Kovalenko
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] zpool import hungs up forever...

On 29.07.09 14:42, Pavel Kovalenko wrote:

fortunately, after several hours terminal went back -->
# zdb -e data1
Uberblock

magic = 00bab10c
version = 6
txg = 2682808
guid_sum = 14250651627001887594
 

Re: [zfs-discuss] Fixing Wikipedia tmpfs article (was Re: Which directories must be part of rpool?)

2009-09-28 Thread Frank Middleton

Trying to move this to a new thread, although I don't think it
has anything to do with ZFS :-)

On 09/28/09 08:54 AM, Chris Gerhard wrote:


TMPFS was not in the first release of 4.0. It was introduced to boost
the performance of diskless clients which no longer had the old
network disk for their root file systems and hence /tmp was now over
NFS.

Whether there was a patch that brought it back into 4.0 I don't
recall but I don't think so. 4.0.1 would have been the first release
that actually had it.
--chris


On 09/28/09 03:00 AM, Joerg Schilling wrote:


I am not sure whether my changes will be kept as wikipedia prefers to
keep badly quoted wrong information before correct information supplied by
people who have first hand information.


They actually disallow "first hand information". Everything on Wikipedia
is supposed to be confirmed by secondary or tertiary sources. That's why I
asked if there was any supporting documentation - papers, manuals,
proceedings, whatever, that describe the introduction of tmpfs before
1990. If you were to write a personal page (in Wikipedia if you like)
that describes the history of tmpfs, then you could refer to it in
the tmpfs page as a secondary source. Actually, I suppose if it was
in the source code itself, that would be pretty irrefutable!

http://en.wikipedia.org/wiki/Wikipedia:Reliable_sources

Wikipedia also has a lofi page (http://en.wikipedia.org/wiki/Lofi) that
redirects to "loop mount". It has no historical section at all... There
is no fbk (file system) page.

Cheers -- Frank
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris License with ZFS USER quotas?

2009-09-28 Thread Jim Grisanzio

Jorgen Lundman wrote:

When I approach Sun-Japan directly I just get told that they don't speak
English.  When my Japanese colleagues approach Sun-Japan directly, it is
suggested to us that we stay with our current Vendor.


hey ...

I work at Sun Japan in the Yoga office. I can connect you with English 
speakers here. Contact me off list if you are interested.


Also, there are some Japan lists for OpenSolaris you may want to 
subscribe to: The Japan OpenSolaris User Group and The Tokyo OpenSolaris 
User Group. The Japan group is mostly Japanese, but we are trying to 
build an international group in English for the Tokyo OSUG. There are 
bi-lingual westerners and Japanese on both lists, and we have events in 
Yoga as well.


http://mail.opensolaris.org/mailman/listinfo/ug-tsug (English )
http://mail.opensolaris.org/mailman/listinfo/ug-jposug (Japanese)

Jim
--
http://blogs.sun.com/jimgris/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] OS install question

2009-09-28 Thread Frank Middleton

On 09/28/09 12:40 AM, Ron Watkins wrote:


Thus, I'm at a loss as to how to get the root pool set up as a 20GB
slice


20GB is too small. You'll be fighting for space every time
you use pkg. From my considerable experience installing to a
20GB mirrored rpool, I would go for 32GB if you can.

Assuming this is x86, couldn't you simply use fdisk to
create whatever partitions you want and then install to
one of them? Then you should be able to create the data
pool using another partition. You might need to use a
weird partition type temporarily. On SPARC there doesn't
seem to be a problem using slices for different zpools;
in fact it insists on using a slice for the root pool.
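
Roughly, on x86, something along these lines (purely a sketch; the device
names are placeholders and will differ on your box):

fdisk /dev/rdsk/c0d0p0     # carve the disk into two fdisk partitions interactively
# install to the first partition as usual, then hand the second one to ZFS:
zpool create data c0d0p2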

Cheers -- Frank
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Which directories must be part of rpool?

2009-09-28 Thread Chris Gerhard
TMPFS was not in the first release of 4.0. It was introduced to boost the 
performance of diskless clients which no longer had the old network disk for 
their root file systems and hence /tmp was now over NFS.

Whether there was a patch that brought it back into 4.0 I don't recall but I 
don't think so. 4.0.1 would have been the first release that actually had it.

--chris
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Borked zpool, missing slog/zil

2009-09-28 Thread Victor Latushkin

On 27.09.09 02:28, Ross wrote:

Do you have a backup copy of your zpool.cache file?

If you have that file, ZFS will happily mount a pool on boot without its slog
device - it'll just flag the slog as faulted and you can do your normal
replace.  I used that for a long while on a test server with a ramdisk slog -
and I never needed to swap it to a file based slog.

However without a backup of that file to make zfs load the pool on boot I
don't believe there is any way to import that pool.


If there's no backup of that file, its contents can be reconstructed by 
extracting the contents of the config object from the pool and using it to build the 
cachefile (basically creating an nvlist with a single name-value pair, where the name is 
the name of the pool and the value is the nvlist from the config object).
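
(If you want a feel for what such an nvlist looks like, dumping a device label 
gives a rough idea of the format -- that is only the per-device view, not the full 
cachefile contents, and the device path below is just a guess for this pool:)

zdb -l /dev/dsk/c8d0s0     # prints the labels, including a pool config nvlist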




--
--
Victor Latushkin   phone: x11467 / +74959370467
TSC-Kernel EMEAmobile: +78957693012
Sun Services, Moscow   blog: http://blogs.sun.com/vlatushkin
Sun Microsystems
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Borked zpool, missing slog/zil

2009-09-28 Thread Victor Latushkin

On 27.09.09 14:34, Erik Ableson wrote:

Hmmm - I've got a fairly old copy of the zpool cache file (circa July), but
nothing structural has changed in the pool since that date. What other data is held
in that file? There have been some filesystem changes, but nothing critical is
in the newer filesystems.


The cache file keeps the pool configuration so ZFS can open the pool quickly upon 
reboot. If there were no changes to the configuration of the pool's vdevs, then it 
should still describe a good config.
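
(And, going forward, the precaution Ross describes is trivially cheap -- keep a 
copy of the cache file somewhere you can reach after a reboot; default path shown, 
destination is just an example:)

cp /etc/zfs/zpool.cache /root/zpool.cache.backup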


victor



Any particular procedure required for swapping out the zpool.cache file?

Erik

On Sunday, 27 September, 2009, at 12:28AM, "Ross"  
wrote:

Do you have a backup copy of your zpool.cache file?

If you have that file, ZFS will happily mount a pool on boot without its slog 
device - it'll just flag the slog as faulted and you can do your normal 
replace.  I used that for a long while on a test server with a ramdisk slog - 
and I never needed to swap it to a file based slog.

However without a backup of that file to make zfs load the pool on boot I don't 
believe there is any way to import that pool.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



--
--
Victor Latushkin   phone: x11467 / +74959370467
TSC-Kernel EMEAmobile: +78957693012
Sun Services, Moscow   blog: http://blogs.sun.com/vlatushkin
Sun Microsystems
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Borked zpool, missing slog/zil

2009-09-28 Thread Victor Latushkin

On 27.09.09 19:35, Erik Ableson wrote:

Good link - thanks. I'm looking at the details for that one and learning a
little zdb at the same time. I've got a situation perhaps a little different in
that I _do_ have a current copy of the slog in a file with what appears to be
current data.

However, I don't see how to attach the slog file to an offline zpool - I have
both a dd backup of the ramdisk slog from midnight as well as the current file
based slog :


Have you tried making a symbolic link from e.g. /dev/dsk/slog to /root/slog.tmp 
and checking what 'zpool import' says?
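
For example (paths as used earlier in this thread):

ln -s /root/slog.tmp /dev/dsk/slog   # expose the file-based slog where a device node is expected
zpool import                         # then see whether the pool shows up as importable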



zdb -l /root/slog.tmp

version=14
name='siovale'
state=1
txg=4499446
pool_guid=13808783103733022257
hostid=4834000
hostname='shemhazai'
top_guid=6374488381605474740
guid=6374488381605474740
is_log=1
vdev_tree
type='file'
id=1
guid=6374488381605474740
path='/root/slog.tmp'
metaslab_array=230
metaslab_shift=21
ashift=9
asize=938999808
is_log=1
DTL=51

Is there any way that I can attach this slog to the zpool while it's offline?

Erik

On 27 sept. 2009, at 02:23, David Turnbull  wrote:


I believe this is relevant: http://github.com/pjjw/logfix
Saved my array last year, looks maintained.

On 27/09/2009, at 4:49 AM, Erik Ableson wrote:


Hmmm - this is an annoying one.

I'm currently running an OpenSolaris install (2008.11 upgraded to  
2009.06) :

SunOS shemhazai 5.11 snv_111b i86pc i386 i86pc Solaris

with a zpool made up of one raidz vdev and a small ramdisk-based
zil.  I usually swap out the zil for a file-based copy when I need
to reboot (zpool replace /dev/ramdisk/slog /root/slog.tmp) but this
time I had a brain fart and forgot to.
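
Spelled out with the pool name, which my shorthand above omits, that swap is
roughly:

zpool replace siovale /dev/ramdisk/slog /root/slog.tmp
zpool status siovale     # wait for the resilver to finish before rebooting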


The server came back up and I could sort of work on the zpool, but
it was complaining, so I did my replace command and it happily
resilvered.  Then I restarted one more time in order to test
bringing everything up cleanly, and this time it can't find the
file-based zil.


I try importing and it comes back with:
zpool import
pool: siovale
  id: 13808783103733022257
state: UNAVAIL
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
  devices and try again.
 see: http://www.sun.com/msg/ZFS-8000-6X
config:

  siovale UNAVAIL  missing device
raidz1ONLINE
  c8d0ONLINE
  c9d0ONLINE
  c10d0   ONLINE
  c11d0   ONLINE

        Additional devices are known to be part of this pool, though their
        exact configuration cannot be determined.

Now the file still exists, so I don't know why it can't seem to find
it, and I thought the missing-zil issue was corrected in this
version (or did I miss something?).


I've looked around for solutions to bring it back online and ran  
across this method: 

Re: [zfs-discuss] Solaris License with ZFS USER quotas?

2009-09-28 Thread Jorgen Lundman



Tomas Ögren wrote:


http://sparcv9.blogspot.com/2009/08/solaris-10-update-8-1009-is-comming.html
which is in no way official, says it'll be in 10u8 which should be
coming within a month.

/Tomas


That would be perfect. I wonder why I have so much trouble finding information 
about "future releases" of Solaris.


Thanks

Lund


--
Jorgen Lundman   | 
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs vbox and shared folders

2009-09-28 Thread Chris Gerhard
Not that I have seen. I use them, they work.

--chris
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris License with ZFS USER quotas?

2009-09-28 Thread Enda O'Connor

Hi
So the ship date is 19 October for Solaris 10 10/09 (update 8).

Enda

Enda O'Connor wrote:

Hi
Yes Solaris 10/09 ( update 8 ) will contain
6501037 want user/group quotas on zfs

it should be out within a few weeks.

So if they have zpools already installed they can apply 
141444-09/141445-09 (the 10/09 kernel patch) and, post reboot, run zpool 
upgrade to go to zpool version 15 (the process is non-reversible, by the 
way), which contains 6501037. The patches mentioned will be released 
shortly after 10/09 itself ships (within a few days of 10/09 shipping); 
if applying patches, make sure to apply the latest rev of 119254/119255 
first (the patch utilities patch), and read the README as well for any 
further instructions.


Enda

Tomas Ögren wrote:

On 28 September, 2009 - Jorgen Lundman sent me these 1,7K bytes:


Hello list,

We are unfortunately still experiencing some issues regarding our 
support license with Sun, or rather our Sun Vendor.


We need ZFS user quotas (that's not the ZFS file-system quota), which 
first appeared in snv_114.


We would like to run something like snv_117 (we don't really care which 
version per se; that is just the version we have done the most 
testing with).


But our Vendor will only support Solaris 10. After weeks of 
wrangling, they have reluctantly agreed to let us run OpenSolaris 
2009.06. (Which does not have ZFS User quotas).


When I approach Sun-Japan directly I just get told that they don't 
speak  English.  When my Japanese colleagues approach Sun-Japan 
directly, it is  suggested to us that we stay with our current Vendor.


* Will there be official Solaris 10, or OpenSolaris releases with ZFS 
User quotas? (Will 2010.02 contain ZFS User quotas?)


http://sparcv9.blogspot.com/2009/08/solaris-10-update-8-1009-is-comming.html 


which is in no way official, says it'll be in 10u8 which should be
coming within a month.

/Tomas




--
Enda O'Connor x19781  Software Product Engineering
Patch System Test : Ireland : x19781/353-1-8199718
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris License with ZFS USER quotas?

2009-09-28 Thread Enda O'Connor

Hi
Yes Solaris 10/09 ( update 8 ) will contain
6501037 want user/group quotas on zfs

it should be out within a few weeks.

So if they have zpools already installed they can apply 
141444-09/141445-09 (the 10/09 kernel patch) and, post reboot, run zpool 
upgrade to go to zpool version 15 (the process is non-reversible, by the 
way), which contains 6501037. The patches mentioned will be released 
shortly after 10/09 itself ships (within a few days of 10/09 shipping); 
if applying patches, make sure to apply the latest rev of 119254/119255 
first (the patch utilities patch), and read the README as well for any 
further instructions.
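
Roughly, that sequence looks like the following sketch (the patch revision 
placeholder and pool name are just examples; always follow the patch READMEs, and 
run patchadd from the directory where the patches were unpacked):

patchadd 119254-xx       # patch utilities patch first, at its latest revision
patchadd 141444-09       # the 10/09 kernel patch (or 141445-09, depending on platform)
init 6                   # reboot onto the patched kernel
zpool upgrade -v         # check that zpool version 15 is now offered
zpool upgrade tank       # non-reversible: upgrades the pool named 'tank'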


Enda

Tomas Ögren wrote:

On 28 September, 2009 - Jorgen Lundman sent me these 1,7K bytes:


Hello list,

We are unfortunately still experiencing some issues regarding our support 
license with Sun, or rather our Sun Vendor.


We need ZFS user quotas (that's not the ZFS file-system quota), which 
first appeared in snv_114.


We would like to run something like snv_117 (we don't really care which 
version per se; that is just the version we have done the most 
testing with).


But our Vendor will only support Solaris 10. After weeks of wrangling, 
they have reluctantly agreed to let us run OpenSolaris 2009.06. (Which 
does not have ZFS User quotas).


When I approach Sun-Japan directly I just get told that they don't speak  
English.  When my Japanese colleagues approach Sun-Japan directly, it is  
suggested to us that we stay with our current Vendor.


* Will there be official Solaris 10, or OpenSolaris releases with ZFS 
User quotas? (Will 2010.02 contain ZFS User quotas?)


http://sparcv9.blogspot.com/2009/08/solaris-10-update-8-1009-is-comming.html
which is in no way official, says it'll be in 10u8 which should be
coming within a month.

/Tomas


--
Enda O'Connor x19781  Software Product Engineering
Patch System Test : Ireland : x19781/353-1-8199718
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Fixing Wikipedia tmpfs article (was Re: Which directories must be part of rpool?)

2009-09-28 Thread Darren J Moffat

Joerg Schilling wrote:


Just to prove my information: I invented "fbk" (which Sun now calls "lofi")


Sun does NOT call your fbk by the name lofi.  Lofi is a completely 
different implementation of the same concept.


--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Unusual latency issues

2009-09-28 Thread Andrew Gabriel




Markus Kovero wrote:

Hi, this may not be the correct mailing list for this, but I'd like to share it
with you: I noticed weird network behavior with osol snv_123.
icmp to the host lags randomly between 500ms-5000ms and ssh sessions seem to
stall; I guess this could affect iscsi/nfs as well.

What was most interesting is that the workaround I found was running snoop, with
promiscuous mode disabled, on the interfaces suffering lag; this made the
interruptions go away. Is this some kind of cpu/irq scheduling issue?

Behaviour was noticed on two different platforms and with two different nics
(bge and e1000).

Unless you have some specific reason for thinking this is a zfs issue,
you probably want to ask on the crossbow-discuss mailing list.

-- 
Andrew


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris License with ZFS USER quotas?

2009-09-28 Thread Tomas Ögren
On 28 September, 2009 - Jorgen Lundman sent me these 1,7K bytes:

>
> Hello list,
>
> We are unfortunately still experiencing some issues regarding our support 
> license with Sun, or rather our Sun Vendor.
>
> We need ZFS user quotas (that's not the ZFS file-system quota), which 
> first appeared in snv_114.
>
> We would like to run something like snv_117 (we don't really care which 
> version per se; that is just the version we have done the most 
> testing with).
>
> But our Vendor will only support Solaris 10. After weeks of wrangling, 
> they have reluctantly agreed to let us run OpenSolaris 2009.06. (Which 
> does not have ZFS User quotas).
>
> When I approach Sun-Japan directly I just get told that they don't speak  
> English.  When my Japanese colleagues approach Sun-Japan directly, it is  
> suggested to us that we stay with our current Vendor.
>
> * Will there be official Solaris 10, or OpenSolaris releases with ZFS 
> User quotas? (Will 2010.02 contain ZFS User quotas?)

http://sparcv9.blogspot.com/2009/08/solaris-10-update-8-1009-is-comming.html
which is in no way official, says it'll be in 10u8 which should be
coming within a month.

/Tomas
-- 
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Unusual latency issues

2009-09-28 Thread Markus Kovero
Hi, this may not be the correct mailing list for this, but I'd like to share it 
with you: I noticed weird network behavior with osol snv_123.
icmp to the host lags randomly between 500ms-5000ms and ssh sessions seem to 
stall; I guess this could affect iscsi/nfs as well.

What was most interesting is that the workaround I found was running snoop, with 
promiscuous mode disabled, on the interfaces suffering lag; this made the 
interruptions go away. Is this some kind of cpu/irq scheduling issue?

Behaviour was noticed on two different platforms and with two different nics 
(bge and e1000).
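
(The workaround, spelled out as a rough sketch -- the interface name is just an 
example, and -P keeps snoop out of promiscuous mode:)

snoop -P -d bge0 > /dev/null &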

Yours
Markus Kovero
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris License with ZFS USER quotas?

2009-09-28 Thread Fajar A. Nugraha
On Mon, Sep 28, 2009 at 2:20 PM, Jorgen Lundman  wrote:
> We would like to run something like svn_117 (don't really care which version
> per-se, that is just the one version we have done the most testing with).
>
> But our Vendor will only support Solaris 10. After weeks of wrangling, they
> have reluctantly agreed to let us run OpenSolaris 2009.06. (Which does not
> have ZFS User quotas).

I thought http://www.sun.com/service/opensolaris/ was supposed to be
made for people with your needs?

-- 
Fajar
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Solaris License with ZFS USER quotas?

2009-09-28 Thread Jorgen Lundman


Hello list,

We are unfortunately still experiencing some issues regarding our support 
license with Sun, or rather our Sun Vendor.


We need ZFS user quotas (that's not the ZFS file-system quota), which first 
appeared in snv_114.
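
(To make it concrete, that is the per-user property set added in that build -- 
something along these lines, with pool/filesystem and user names as placeholders:)

zfs set userquota@alice=10G tank/home     # cap what user 'alice' may consume
zfs get userused@alice tank/home          # see how much she currently uses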


We would like to run something like snv_117 (we don't really care which version 
per se; that is just the version we have done the most testing with).


But our Vendor will only support Solaris 10. After weeks of wrangling, they have 
reluctantly agreed to let us run OpenSolaris 2009.06. (Which does not have ZFS 
User quotas).


When I approach Sun-Japan directly I just get told that they don't speak 
English.  When my Japanese colleagues approach Sun-Japan directly, it is 
suggested to us that we stay with our current Vendor.


* Will there be official Solaris 10, or OpenSolaris releases with ZFS User 
quotas? (Will 2010.02 contain ZFS User quotas?)


* Can we get support overseas, perhaps, that will let us run a version of Solaris 
with ZFS user quotas? Support generally includes the ability to replace 
hardware when it dies and/or to send panic dumps, if they happen, for future patches.


Internally, we are now discussing returning our 12x x4540, and calling NetApp. I 
would rather not (more work for me).


I understand Sun is probably experiencing some internal turmoil at the moment, 
but it has been rather frustrating for us.


Lund

--
Jorgen Lundman   | 
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Fixing Wikipedia tmpfs article (was Re: Which directories must be part of rpool?)

2009-09-28 Thread Joerg Schilling
Frank Middleton  wrote:

> On 09/27/09 11:25 AM, Joerg Schilling wrote:
> > Frank Middleton  wrote:
>
> >> Could you fix the Wikipedia article? http://en.wikipedia.org/wiki/TMPFS
> >>
> >> "it first appeared in SunOS 4.1, released in March 1990"
> >
> > It appeared with SunOS-4.0. The official release was probably February 1987,
> > but there have been betas before IIRC.
>
> Do you have any references one could quote so that the Wikipedia
> article can be corrected? The section on Solaris is rather skimpy
> and could do with some work...

I am not sure whether my changes will be kept, as Wikipedia prefers badly 
sourced but wrong information over correct information supplied by 
people who have first-hand knowledge. 


Just to prove my information: I invented "fbk" (which Sun now calls "lofi")
in summer 1988, after I received the sources for SunOS-4.0. "fbk" was my 
playground for the new vnode interface before I wrote "wofs", probably the first 
copy-on-write filesystem. I definitely know that tmpfs was in 4.0.




Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss