[zfs-discuss] Howto reclaim space under legacy mountpoint?

2010-09-19 Thread Gary Gendel
I moved my home directories to a new disk and then mounted the disk using a 
legacy mount point over /export/home.  Here is the output of the zfs list:

NAME USED  AVAIL  REFER  MOUNTPOINT
rpool   55.8G  11.1G83K  /rpool
rpool/ROOT  21.1G  11.1G19K  legacy
rpool/ROOT/snv-134  21.1G  11.1G  14.3G  /
rpool/dump  1.97G  11.1G  1.97G  -
rpool/export30.8G  11.1G23K  /export
rpool/export/home   30.8G  11.1G  29.3G  legacy
rpool/swap  1.97G  12.9G   144M  -
users   32.8G   881G  31.1G  /export/home

The question is how to remove the files from the original rpool/export/home (no longer 
mounted) so the space goes back to rpool.  I'm a bit nervous about doing a:

zfs destroy rpool/export/home

Is this the correct and safe methodology?
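
Not from the thread, but a sketch of one cautious way to proceed (mount points are 
examples; rpool/export/home is the legacy dataset shown in the listing above):

  # Confirm which dataset is actually live before destroying anything.
  zfs get mountpoint,mounted rpool/export/home users

  # Optionally mount the old copy somewhere temporary and eyeball it.
  mkdir /mnt/oldhome
  mount -F zfs rpool/export/home /mnt/oldhome
  ls /mnt/oldhome
  umount /mnt/oldhome

  # Once satisfied the users pool holds everything, reclaim the space
  # (add -r only if the old dataset still has snapshots or children).
  zfs destroy rpool/export/home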

Thanks,
Gary


Re: [zfs-discuss] Newbie question

2010-09-05 Thread Gary Gendel
Norm,

Thank you.  I just wanted to double-check to make sure I didn't mess things up.  
There were steps that had me head-scratching after reading the man page.  I'll 
spend a bit more time re-reading it against the steps outlined so I understand 
them fully.

Gary


[zfs-discuss] Newbie question

2010-09-05 Thread Gary Gendel
I would like to migrate my home directories to a new mirror.  Currently, I have 
them in rpool:

rpool/export
rpool/export/home

I've created a mirror pool, users.

I figure the steps are:
1) snapshot rpool/export/home
2) send the snapshot to users.
3) unmount rpool/export/home
4) mount pool users to /export/home

So, what are the appropriate commands for these steps?
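
For reference, a minimal sketch of those four steps (the snapshot name is an example; 
adjust the mountpoint handling to taste):

  # 1) snapshot the source
  zfs snapshot rpool/export/home@migrate

  # 2) copy it to the new pool (-F lets the receive overwrite the empty top-level dataset)
  zfs send rpool/export/home@migrate | zfs receive -F users

  # 3) get the old filesystem out of the way
  zfs unmount rpool/export/home
  zfs set mountpoint=legacy rpool/export/home

  # 4) mount the new pool where the home directories used to live
  zfs set mountpoint=/export/home users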

Thanks,
Gary


[zfs-discuss] root pool expansion

2010-07-28 Thread Gary Gendel
Right now I have a machine with a mirrored boot setup.  The SAS drives are 43GB each 
and the root pool is getting full.

I do a backup of the pool nightly, so I feel confident that I don't need to 
mirror the drive and can break the mirror and expand the pool with the detached 
drive.

I understand how to do this on a normal pool, but are there any restrictions on doing 
it for the root pool?  Are there any GRUB issues?

Thanks,
Gary


Re: [zfs-discuss] Proposition of a new zpool property.

2010-03-20 Thread Gary Gendel
I'm not sure I like this at all.  Some of my pools take hours to scrub.  I have 
a cron job that runs scrubs in sequence: start one pool's scrub, poll until it's 
finished, start the next and wait, and so on, so I don't create too much load and 
bring all I/O to a crawl.
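
For what it's worth, a sketch of that kind of sequential-scrub job (the pool names 
and polling interval are just examples):

  #!/bin/sh
  # Scrub each pool in turn, waiting for one to finish before starting the next.
  for pool in tank archive backup; do
      zpool scrub "$pool"
      # Poll zpool status until it no longer reports a scrub in progress.
      while zpool status "$pool" | grep -q "in progress"; do
          sleep 300
      done
  done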

The job is launched once a week, so the scrubs have plenty of time to finish. :)

Scrubs every hour?  Some of my pools would be in continuous scrub.


Re: [zfs-discuss] What Happend to my OpenSolaris X86 Install?

2010-02-11 Thread Gary Gendel
My guess is that the grub bootloader wasn't upgraded on the actual boot disk.  
Search for directions on how to mirror ZFS boot drives and you'll see how to 
copy the correct grub loader onto the boot disk.
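
On x86 the usual incantation is installgrub against the slice that should become 
bootable; a minimal sketch, assuming c0t1d0s0 is the disk that currently fails to 
boot (the device name is an example):

  installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t1d0s0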

If you want to do this more simply, swap the disks.  I did this when I was moving 
from SXCE to OSOL so I could make sure that things worked before making one of 
the drives a mirror.


Re: [zfs-discuss] Repeating scrub does random fixes

2010-01-12 Thread Gary Gendel
Thanks for all the suggestions.  Now for a strange tale...

I tried upgrading to dev build 130 and, as expected, things did not go well.  All 
sorts of permission errors flew by during the upgrade stage, and it would not 
start X.  I've heard that things installed from the contrib and extras 
repositories might cause issues, but I didn't want to spend the time with my 
server offline while I tried to figure it out.

So, I booted back to 111b and scrubs still showed errors.  Late in the evening, 
the pool faulted, preventing any backups from the other servers to this pool.  
Being greeted this morning with the "recover files from backup" status message 
sent shivers up my spine.  This IS my backup.

I exported the pool and then imported it, which completed successfully.  Now the 
scrubs run cleanly (at least for a few repeated scrubs spanning several hours).  
So, was it hardware?  What the heck could have been fixed just by exporting and 
importing the pool?


Re: [zfs-discuss] Repeating scrub does random fixes

2010-01-11 Thread Gary Gendel
I've just run a couple of consecutive scrubs; each time it found a couple of 
checksum errors, but on different drives.  No indication of any other errors.  
That a disk scrubs cleanly on a quiescent pool in one run but fails in the next 
is puzzling.  It reminds me of the snv_120 odd-number-of-disks raidz bug I 
reported.

Looks like I've got to bite the bullet and upgrade to the dev tree and hope for 
the best.

Gary


Re: [zfs-discuss] Repeating scrub does random fixes

2010-01-10 Thread Gary Gendel

Mattias Pantzare wrote:

On Sun, Jan 10, 2010 at 16:40, Gary Gendel  wrote:
  

I've been using a 5-disk raidZ for years on SXCE machine which I converted to 
OSOL.  The only time I ever had zfs problems in SXCE was with snv_120, which 
was fixed.

So, now I'm at OSOL snv_111b and I'm finding that scrub repairs errors on 
random disks.  If I repeat the scrub, it will fix errors on other disks.  
Occasionally it runs cleanly.  That it doesn't happen in a consistent manner 
makes me believe it's not hardware related.




That is a good indication for hardware related errors. Software will
do the same thing every time but hardware errors are often random.

But you are running an older version now, I would recommend an upgrade.
  


I would have thought that too if it hadn't started right after the switch 
from SXCE to OSOL.  As for an upgrade, I use the dev repository on my 
laptop, and I find that OSOL updates aren't nearly as stable as SXCE 
was.  I tried for a bit, but always had to go back to 111b because 
something crucial broke.  I was hoping to wait until the official 
release in March to let things stabilize.  This is my main 
web/mail/file/etc. server and I don't really want to muck with it too much.


That said, I may take a gamble on upgrading as we're getting closer to 
the 2010.x release.



Gary



[zfs-discuss] Repeating scrub does random fixes

2010-01-10 Thread Gary Gendel
I've been using a 5-disk raidz for years on an SXCE machine, which I converted to 
OSOL.  The only time I ever had zfs problems in SXCE was with snv_120, and that 
was fixed.

So, now I'm at OSOL snv_111b and I'm finding that scrub repairs errors on 
random disks.  If I repeat the scrub, it will fix errors on other disks.  
Occasionally it runs cleanly.  That it doesn't happen in a consistent manner 
makes me believe it's not hardware related.

fmdump reports only three types of errors:

ereport.fs.zfs.checksum
ereport.io.scsi.cmd.disk.tran
ereport.io.scsi.cmd.disk.recovered

The middle one seems to be the issue; I'd like to track down its source.  Any 
docs on how to do this?
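
Not a full answer, but a hedged starting point: the verbose ereport dump usually 
includes the device path behind each ereport.io.scsi.* event, and fmadm summarizes 
anything the diagnosis engine has actually concluded.

  fmdump -eV | less    # verbose error reports, including the device paths
  fmadm faulty         # any faults FMA has diagnosed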

Thanks,
Gary


Re: [zfs-discuss] Accidentally added disk instead of attaching

2009-12-07 Thread Gary Gendel
+1

I support a replacement for an SCM system that used "open" as an alias for 
"edit" and a separate command, "opened", to see what was opened for edit, 
delete, etc.  Our customers accidentally used "open" when they meant "opened" 
so many times that we blocked it as a command.  It saved us a lot of support 
calls.

Gary


[zfs-discuss] freeNAS moves to Linux from FreeBSD

2009-12-06 Thread Gary Gendel
The only reason I thought this news would be of interest is that the 
discussions had some interesting comments.  Basically, there was a significant 
outcry because zfs would be going away.  I saw NexentaOS and EON mentioned several 
times as the path to take.

It seems there is some opportunity for OpenSolaris advocacy in this arena 
while the topic is hot.

Gary


Re: [zfs-discuss] Apple shuts down open source ZFS project

2009-10-24 Thread Gary Gendel
Apple is known to strong-arm in licensing negotiations.  I'd really like to 
hear the straight talk about what transpired.

That's OK; it just means that I won't be using a Mac as a server.


Re: [zfs-discuss] Problem with snv_122 Zpool issue

2009-09-12 Thread Gary Gendel
You shouldn't hit the Raid-Z issue because it only happens with an odd number 
of disks.


Re: [zfs-discuss] Problem with RAID-Z in builds snv_120 - snv_123

2009-09-03 Thread Gary Gendel
Alan,

Thanks for the detailed explanation.  The rollback successfully fixed my 5-disk 
RAID-Z errors.  I'll hold off on another upgrade attempt until 124 rolls out.  
Fortunately, I didn't do a zfs upgrade right away after installing 121.  For 
those who did, this could be very painful.

Gary


Re: [zfs-discuss] snv_110 -> snv_121 produces checksum errors on Raid-Z pool

2009-08-28 Thread Gary Gendel
Alan,

Super find.  Thanks, I thought I was just going crazy until I rolled back to 
110 and the errors disappeared.  When you do work out a fix, please ping me to 
let me know when I can try an upgrade again.

Gary


Re: [zfs-discuss] snv_110 -> snv_121 produces checksum errors on Raid-Z pool

2009-08-27 Thread Gary Gendel
It looks like it's definitely related to the snv_121 upgrade.  I decided to 
roll back to snv_110 and the checksum errors have disappeared.  I'd like to 
file a bug report, but I don't have any information that might help track this 
down, just lots of checksum errors.

Looks like I'm stuck at snv_110 until someone figures out what is broken.  If 
it helps, here is my property list for this pool.

g...@phoenix[~]101>zfs get all archive
NAME PROPERTY  VALUE  SOURCE
archive  type  filesystem -
archive  creation  Mon Jun 18 20:40 2007  -
archive  used  787G   -
archive  available 1.01T  -
archive  referenced125G   -
archive  compressratio 1.13x  -
archive  mounted   yes-
archive  quota none   default
archive  reservation   none   default
archive  recordsize128K   default
archive  mountpoint/archive   default
archive  sharenfs  offdefault
archive  checksum  on default
archive  compression   on local
archive  atime offlocal
archive  devices   on default
archive  exec  on default
archive  setuidon default
archive  readonly  offdefault
archive  zoned offdefault
archive  snapdir   hidden default
archive  aclmode   groupmask  default
archive  aclinheritrestricted default
archive  canmount  on default
archive  shareiscsioffdefault
archive  xattr on default
archive  copies1  default
archive  version   3  -
archive  utf8only  off-
archive  normalization none   -
archive  casesensitivity   sensitive  -
archive  vscan offdefault
archive  nbmandoffdefault
archive  sharesmb  offlocal
archive  refquota  none   default
archive  refreservationnone   default
archive  primarycache  alldefault
archive  secondarycachealldefault

And each of the child filesystems looks like this:

g...@phoenix[~]101>zfs get all archive/gary
archive/gary  type  filesystem -
archive/gary  creation  Mon Jun 18 20:56 2007  -
archive/gary  used  141G   -
archive/gary  available 1.01T  -
archive/gary  referenced141G   -
archive/gary  compressratio 1.22x  -
archive/gary  mounted   yes-
archive/gary  quota none   default
archive/gary  reservation   none   default
archive/gary  recordsize128K   default
archive/gary  mountpoint/archive/gary  default
archive/gary  sharenfs  offdefault
archive/gary  checksum  on default
archive/gary  compression   on   inherited from archive
archive/gary  atime  off  inherited from archive
archive/gary  devices   on default
archive/gary  exec  on default
archive/gary  setuidon default
archive/gary  readonly  offdefault
archive/gary  zoned offdefault
archive/gary  snapdir   hidden default
archive/gary  aclmode   groupmask  default
archive/gary  aclinheritpassthroughlocal
archive/gary  canmount  on default
archive/gary  shareiscsioffdefault
archive/gary  xattr on default
archive/gary  copies1  default
archive/gary  version   3  -
archive/gary  utf8only  off-
archive/gary  normalization none   -
archive/gary  casesensitivity   sensitive  -
archive/gary  vscan offdefault
archive/gary  nbm

[zfs-discuss] snv_110 -> snv_121 produces checksum errors on Raid-Z pool

2009-08-25 Thread Gary Gendel
I have a 5-disk (500GB each) RAID-Z pool that has been producing checksum errors 
right after upgrading SXCE to build 121.  They seem to occur randomly across all 
5 disks, so it doesn't look like a disk-failure situation.

Repeatedly running a scrub on the pool repairs between 20 and a few hundred 
checksum errors each time.

Since I hadn't physically touched the machine, it seems a very strong 
coincidence that it started right after I upgraded to 121.

This machine is a SunFire v20z with a Marvell 8-port SATA controller (the same 
one as in the original Thumper).  I've seen this kind of problem way back 
around builds 40-50 or so, but hadn't seen it again until now.

Is anyone else experiencing this problem, or does anyone know how to isolate it 
definitively?
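
A few per-device checks that sometimes point at a single culprit (the pool name is 
an example; none of this is specific to snv_121):

  zpool status -v archive    # which vdevs are accumulating CKSUM counts
  iostat -En                 # per-device soft/hard/transport error counters
  fmdump -e                  # timeline of ereports since the upgrade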

Thanks,
Gary


Re: [zfs-discuss] Yager on ZFS

2007-11-09 Thread Gary Gendel
> Most video formats are designed to handle
> errors--they'll drop a frame
> or two, but they'll resync quickly.  So, depending on
> the size of the
> error, there may be a visible glitch, but it'll keep
> working.

Actually, let's take MPEG as an example.  There are two basic frame types: 
anchor frames and predictive frames.  Of the predictive frames, there are 
one-way predictive and multi-way predictive.  The predictive frames offer 
significantly more compression than anchor frames, and thus are favored in 
highly compressed streams.  However, if an error occurs in a frame, that error 
will propagate until it either moves off the frame or an anchor frame is 
reached.

In broadcast, they typically space the anchor frames every half second to 
bound the time it takes to start a new stream when changing channels.  However, 
this also means that an error may take up to half a second to clear.  
Depending upon the type of error, it could be confined to a single block, a 
stripe, or even a whole frame.

On more bandwidth-constrained systems, like teleconferencing, I've seen anchor 
frames spaced as much as 30 seconds apart.  These usually include some minimal 
error-concealment techniques, but they aren't really robust.

So I guess it depends upon what you mean by "recover fast". It could be as 
short as a fraction of a second, but could be several seconds.
 
 


Re: [zfs-discuss] ZFS 60 second pause times to read 1K

2007-10-10 Thread Gary Gendel
I'm not sure.  But when I re-ran a scrub, I got the errors at the same 
block numbers, which indicated that the disk was really bad.  It wouldn't hurt 
to make the entry in the /etc/system file, reboot, and then try the scrub 
again.  If the problem disappears, then it's likely a driver bug.
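
The /etc/system entry being referred to is presumably the NCQ workaround quoted in 
the scrub-halts thread further down in this digest; roughly:

  # Disable NCQ for the sata module, then reboot for it to take effect.
  echo 'set sata:sata_func_enable = 0x5' >> /etc/system
  init 6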

Gary
 
 


Re: [zfs-discuss] Norco's new storage appliance

2007-10-09 Thread Gary Gendel
Norco usually uses Silicon Image-based SATA controllers.  The OpenSolaris driver 
for these has caused me enough headaches that I replaced them with a Marvell-based 
board.  I would also imagine that they use a 5-to-1 SATA port multiplier, which is 
not supported by any OpenSolaris driver that I've tested.

Gary
 
 


Re: [zfs-discuss] ZFS 60 second pause times to read 1K

2007-10-09 Thread Gary Gendel
Are there any clues in the logs?  I have had a similar problem when a disk bad 
block was uncovered by zfs.  I've also seen this when using the Silicon Image 
driver without the recommended patch.

The former became evident when I ran a scrub: SCSI timeout errors popped up in 
the "kern" syslogs.  I solved it by replacing the disk.

Gary
 
 


Re: [zfs-discuss] Best option for my home file server?

2007-09-28 Thread Gary Gendel
Just keep in mind that I tried the patched driver and occasionally had kernel 
panics because of recursive mutex calls.  I believe it isn't 
multiprocessor-safe.  I switched to the Marvell chipset and have been much 
happier.
 
 


Re: [zfs-discuss] Best option for my home file server?

2007-09-26 Thread Gary Gendel
> I'm about to build a fileserver and I think I'm gonna
> use OpenSolaris and ZFS.
> 
> I've got a 40GB PATA disk which will be the OS disk,
> and then I've got 4x250GB SATA + 2x500GB SATA disks.
> From what you are writing I would think my best
> option would be to slice the 500GB disks in two 250GB
> and then make two RAIDz with two 250 disks and one
> partition from each 500 disk, giving me two RAIDz of
> 4 slices of 250, equaling to 2 x 750GB RAIDz.

Why not do it this way...

Pair the 250GB drives into 500GB RAID-1 mirrors and then use those in the RAID-Z 
configuration?  I would think that this setup would have no less performance 
than a RAID-Z using only 500GB drives.  However, you'll have to ask someone for 
the zfs command magic to do this; off the top of my head, I'm not sure.

> How would the performance be with this? I mean, it
> would probably drop since I would have two raidz
> slices on one disk.
> 
> From what I gather, I would still be able to lose one
> of the 500 disks (or 250) and still be able to
> recover, right?

Right, but then if a 500GB drive fails you degrade both pools.

> 
> Perhaps I should just get another 500GB disk and run
> a RAIDz on the 500s and one RAIDz on the 250s?

That sounds even better.
 
> I'm also a bit of a noob when it comes to ZFS (but it
> looks like it's not that hard to admin) - Would I be
> able to join the two RAIDz together for one BIG
> volume altogether? And it will survive one disk
> failure?

No, these would be two separate pools of storage.
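
A rough sketch of that last layout, i.e. one raidz over the 500GB drives and another 
over the 250GB drives (device names are examples):

  zpool create big   raidz c1t0d0 c1t1d0 c1t2d0           # three 500GB drives
  zpool create small raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0    # four 250GB drives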

Gary
 
 


Re: [zfs-discuss] (politics) Sharks in the waters

2007-09-08 Thread Gary Gendel
Thanks, Jim, for the entertainment.  I was party to a similar mess.  My father 
owned and operated a small electrical supply business that I worked at from the 
age of 8.  I was recently pulled into a large class-action asbestos suit against 
the business, since I was the only one still alive from the period of 
interest (the 1960s).

I was subjected to a deposition in a room with 50+ lawyers and myself.  The whole 
situation was so surreal.  The questioning went something like this:
Q: Did you sell lighting fixtures with wire?
A: Yes
Q: Did it have asbestos?
A: No
Q: How can you be so sure?
A: I hung over 20,000 fixtures in the showroom over the years. In order to save 
time, I would strip the wire with my teeth. I think I'd know if they contained 
asbestos by now.
Q: You'd do what?
A: I'd strip the wire with my teeth. I can demonstrate, if you want.

I was told that I set the record for a deposition in this case, lasting almost 
4 hours.  A month later I got the transcript and they had actually changed my words! 
I sent back 50 pages of corrections to a 200-page document.

A year later they dropped the suit. What a total waste of human effort and 
money.

Gary

> About 2 years ago I was able to get a little closer
> to the patent 
> litigation process,
> by way of giving a deposition in litigation that was
> filed against Sun 
> and Apple
> (and has been settled).
> 
> Apparently, there's an entire sub-economy built on
> patent litigation 
> among the
> technology players. Suits, counter-suits,
> counter-counter-suits, etc, 
> are just
> part of every day business. And the money that gets
> poured down the drain!
> 
> Here's an example. During my deposition, the lawyer
> questioning me opened
> a large box, and removed 3 sets of a 500+ slide deck
> created by myself and
> Richard McDougall for seminars and tutorials on
> Solaris. Each set was
> color print on heavy, glossy paper. That represented
> color printing of about
> 1600 pages total. All so the attorney could question
> me about 2 of the 
> slides.
> 
> I almost fell off my chair
> 
> /jim
> 
> 
> 
> Rob Windsor wrote:
> >
> > http://news.com.com/NetApp+files+patent+suit+against+Sun/2100-1014_3-6206194.html
> >
> > I'm curious how many of those patent filings cover
> technologies that 
> > they carried over from Auspex.
> >
> > While it is legal for them to do so, it is a bit
> shady to inherit 
> > technology (two paths; employees departing Auspex
> and the Auspex 
> > bankruptcy asset buyout), file patents against that
> technology, and then 
> > open suits against other companies based on
> (patents covering) that 
> > technology.
> >
> > (No, I'm not defending Sun in it's apparent
> patent-growling, either, it 
> > all sucks IMO.)
> >
> > Rob++
> >   
 
 


Re: [zfs-discuss] scrub halts

2007-08-27 Thread Gary Gendel
> I can confirm that the marvell88sx driver (or kernel
> 64a) regularly hangs the SATA card (SuperMicro
> 8-port) with the message about a port being reset.
> The hang is temporary but troublesome.
> It can be relieved by turning off NCQ in /etc/system
> with "set sata:sata_func_enable = 0x5"

Thanks for the info. I'll have to give this a try.

BTW, I've verified that this happens on build 70 as well.

Gary
 
 


Re: [zfs-discuss] scrub halts

2007-08-15 Thread Gary Gendel
Al,

That makes so much sense that I can't believe I missed it.  One bay was the one 
giving me the problems.  Switching drives didn't affect that.  Switching cabling 
didn't affect that.  Changing SATA controllers didn't affect that.  However, 
reorienting the case on its side did!

I'll be putting a larger fan into the disk-stack case.

Gary

> On Tue, 14 Aug 2007, Richard Elling wrote:
> 
> > Rick Wager wrote:
> >> We see similar problems on a SuperMicro with 5 500
> GB Seagate sata drives. This is using the AHCI
> driver. We do not, however, see problems with the
> same hardware/drivers if we use 250GB drives.
> >
> > Duh.  The error is from the disk :-)
> 
> A likely possiblity is that the disk drives are
> simply not getting 
> enough (cool) airflow and are over-heating during
> periods of high 
> system activity that generates a lot of disk head
> movement; for 
> example, during a zpool scrub.  And the extra
> platters present in the 
> larger disk drives would require even more cooling
> capacity - which 
> would validate your observations.
> 
> Best to actually *measure* the effectiveness of the
> disk cooling 
> design/installation.  Recommendation: investigate the
> Fluke mini 
> infrared thermometers - for example - the Fluke 62
> at: 
> http://www.testequipmentdepot.com/fluke/thermometers/62.htm
> 
> In some disk drive installations, its possible for
> the infrared probe 
> to "see" the disk HDA (Head Disk Assembly) without
> disturbing the 
> drive.
> 
> PS: I use a much older Fluke 80T-IR in combination
> with a digital 
> multimeter with millivolt resolution (a Fluke meter
> of course!).
> 
> >> We sometimes see bad blocks reported (are these
> automatically remapped somehow so they are not used
> again?) and sometimes sata port resets.
> >
> > Depending on how the errors are reported, the
> driver may attempt a reset
> > to clear.  The drive may also automaticaly spare
> bad blocks.
> >
> >> Here is a sample of the log output. Any help
> understanding and/or resolving this issue greatly
> appreciated. I very much don't wont to have freezes
> in production.
> >>
> >> Aug 14 11:20:28 chazz1  port 2: device reset
> >> Aug 14 11:20:28 chazz1 scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],0/pci15d9,[EMAIL PROTECTED],2/[EMAIL PROTECTED],0 (sd3):
> >> Aug 14 11:20:28 chazz1  Error for Command: write   Error Level: Retryable
> >> Aug 14 11:20:28 chazz1 scsi: [ID 107833 kern.notice]  Requested Block: 530   Error Block: 530
> >> Aug 14 11:20:28 chazz1 scsi: [ID 107833 kern.notice]  Vendor: ATA   Serial Number:
> >> Aug 14 11:20:28 chazz1 scsi: [ID 107833 kern.notice]  Sense Key: No_Additional_Sense
> >> Aug 14 11:20:28 chazz1 scsi: [ID 107833 kern.notice]  ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0
> >
> > This error was transient and retried.  If it was a
> fatal error (still
> > failed after retries) then you'll have another,
> different message
> > describing the failed condition.
> >  -- richard
> >
> 
> Regards,
> 
> Al Hopper  Logical Approach Inc, Plano, TX.
>  [EMAIL PROTECTED]
> Voice: 972.379.2133 Fax: 972.379.2134
>   Timezone: US CDT
> OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
> http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
 
 


Re: [zfs-discuss] scrub halts

2007-08-06 Thread Gary Gendel
Thanks for the information.  I am using the marvell88sx driver on a vanilla 
SunFire v20z server.  This project has gone through many frustrating phases...

Originally I tried a Si3124 board with the box running a 5-to-1 SiI SATA 
port multiplier.  The controller didn't understand the multiplier, so I put in a 
second board and drove the drives directly.

However, this didn't work well either and would lock up periodically.  I added 
some published driver patches, which made things better, but I would still get 
periodic kernel panics because of a recursive mutex call.

So, I bought the SuperMicro 8-channel SATA Marvell card.  I tried the 
multiplier again, but no luck, so I'm driving each drive separately again.  
Occasionally I would get a system lockup and could only force the system to 
power down.  I believe that this may have been due to a flaky SATA connection 
internal to the box.  Now I'm left with the situation that I described.

Gary
 
 


[zfs-discuss] scrub halts

2007-08-05 Thread Gary Gendel
I've got a 5 x 500GB SATA RAID-Z stack running under build 64a.  I have two 
problems that may or may not be interrelated.

1) zpool scrub stops.  If I do a "zpool status" it merrily continues for a while. 
I can't see any pattern in this behavior with repeated scrubs.

2) Bad blocks on one disk.  This is repeatable, so I'm sending the disk back for 
replacement.  (1) doesn't seem to correlate with the time I hit the bad blocks, so 
I don't think the two are related.  However... when it does hit those blocks, I not 
only get media-sense read errors, but the SATA port is dropped and reconnected. 
I think the driver probably does a port reset, but I figured I'd note it for 
discussion.  Is there a way to remap the bad blocks for zfs?  There were only a 
small number (19) that it hit during the scrub.

I'd like to hear some general comments about these issues I'm having with zfs.

Thanks,
Gary
 
 


[zfs-discuss] Re: ZFS + ISCSI + LINUX QUESTIONS

2007-06-22 Thread Gary Gendel
Al,

Has there been any resolution to this problem? I get it repeatedly on my 
5-500GB Raidz configuration. I sometimes get port drop/reconnect errors when 
this occurs.

Gary
 
 


[zfs-discuss] Proper way to detach attach

2007-06-21 Thread Gary Gendel
Hi,

I've got some issues with my 5-disk SATA stack using two controllers. Some of 
the ports are acting strangely, so I'd like to play around and change which 
ports the disks are connected to. This means that I need to bring down the 
pool, swap some connections and then bring the pool back up. I may have to 
repeat this several times.

I just wanted to clarify the steps needed to do this so I don't lose everything.
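
Not gospel, but the usual sequence is just an export/import, since ZFS identifies 
disks by their on-disk labels rather than by controller port (the pool name is an 
example):

  zpool export tank      # quiesces the pool and releases all of its devices
  # ...power down, move the cables around, power back up...
  zpool import tank      # finds the disks wherever they landed
  zpool import           # if the name doesn't import cleanly, list what's visible first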

Thanks,
Gary
 
 


[zfs-discuss] Re: update on zfs boot support

2007-03-12 Thread Gary Gendel
This is great news.  A question crossed my mind.  I'm sure it's a dumb one, but I 
thought I'd ask anyway...

How will Live Upgrade work when the boot partition is in the pool?

Gary
 
 


[zfs-discuss] Re: File System Filter Driver??

2007-03-02 Thread Gary Gendel
Rayson,

Filter drivers in NTFS are very clever.  I was once toying with using them to put 
unix-style symbolic links in Windows.

In this case, I think such a clever idea wasn't thought through.  Anyone 
and everyone can add such a layer to the file-operation stack.  The worst part 
is that you can't be sure where in the filter stack you will be put, so you're 
at the mercy of the other filter drivers.  For example, any filter driver can 
trap for something and not pass it down the stack.

This one feature is usually how virus scanners hook into the file-operation 
stack, and it is one of the reasons the whole machine slows to a crawl 
when one of these animals is installed.

In the Linux/BSD world the alternative is the user-space file system (FUSE).  There 
is already ongoing work to bring this to OpenSolaris, and I can't wait.  It can do 
almost everything a filter driver can with much less overhead, and it brings a 
huge array of other filesystems into the hands of Solaris users.

My 2cents.

Gary
 
 


[zfs-discuss] Re: Re: Perforce on ZFS

2007-02-21 Thread Gary Gendel
Perforce is based upon Berkeley DB (some early version), so standard "database 
XXX on ZFS" techniques are relevant; for example, putting the journal file on a 
different disk than the table files.  There are several threads about optimizing 
databases under ZFS.
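
As a hedged illustration of that kind of layout (the pool, paths, and recordsize 
values are only examples, not Perforce recommendations): separate datasets make it 
easy to tune and snapshot the metadata and the depot independently.

  zfs create -o recordsize=8k tank/p4db    # db.* table files (small random I/O)
  zfs create tank/p4journal                # journal; ideally on a different disk/pool
  zfs create tank/p4depot                  # versioned file archive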

If you need a screaming Perforce server, talk to IC Manage, Inc., which is a VAR 
of Perforce.  They have also added the ability to do remote replication, etc., so 
you can have servers local to the end users in an enterprise environment.

It seems to me that the network is usually the limiting factor in Perforce 
transactions, though operations like "fstat" and "have" shouldn't be overused 
because they are very taxing on the tables.  Later Perforce versions have reduced 
the amount of table and record locking that goes on, so you might find improvement 
just by upgrading both servers and clients (the server operations downgrade to 
match the version of the client).

All this said, I'd love to see experiments done with perforce on ZFS. It would 
help us all tune ZFS for these kinds of applications.

Gary
 
 


[zfs-discuss] zfs on removable drive

2006-10-26 Thread Gary Gendel
Here is the problem I'm trying to solve...

I've been using a SPARC machine as my primary home server for years.  A few years 
back the motherboard died.  I did a nightly backup to an external USB drive 
formatted with UFS.  I use an rsync-based backup tool called dirvish, so I 
thought I had all the bases covered.  I basically mount the USB drive, do the 
backup, and then unmount the drive.  This guarantees that (except while it's 
mounted) the backup filesystem is relatively safe from glitches.

I quickly brought up an x86 machine but found it couldn't read the UFS drive 
(an endian issue).  I tried Linux, which claims it can read UFS of either 
endianness, without success.

So, I picked up another SPARC machine to get up and running again.  However, 
I'm now in the same boat if it dies.

Now I need to downgrade the SPARC machine to Solaris 8 (for work-related reasons), 
so I migrated all my services to an Opteron SunFire server running SXCR b49 
with a zfs mirrored pool.  However, root is on a non-mirrored UFS partition, so I 
would still like to start the full backups onto the USB drive again.

I want to avoid the UFS endian issue, so I figure zfs is the right format 
to use on the drive.  It's not clear whether this is possible.  I thought that 
exporting/importing the drive as its own zfs pool would work, but I wanted to run 
it through here for comments first.  Then if the system dies, I can restore files 
quickly from a server of any architecture.
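
A hedged sketch of that import/export cycle (the device name is an example); since 
the ZFS on-disk format is endian-adaptive, the same pool should import on either 
SPARC or x86:

  zpool create backup c5t0d0    # one-time setup, using the whole USB disk
  zpool export backup           # leave it exported whenever it isn't in use

  # nightly backup window:
  zpool import backup
  # ...run the dirvish/rsync backup into /backup here...
  zpool export backup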

BTW, zfs rocks!

Gary
 
 