[zfs-discuss] ZFS on Linux vs FreeBSD
This may fall into the realm of a religious war (I hope not!), but recently several people on this list have said or implied that ZFS is only acceptable for production use on FreeBSD (or Solaris, of course), rather than on Linux with ZoL. I'm working on a project at work involving a large(-ish) amount of data, about 5TB, working its way up to 12-15TB eventually, spread among a dozen or so nodes. There may or may not be a clustered filesystem involved (probably gluster if we use anything). I've been looking at ZoL as the primary filesystem for this data. We're a Linux shop, so I'd rather not switch to FreeBSD or any of the Solaris-derived distros--although I have no problem with them, I just don't want to introduce another OS into the mix if I can avoid it.

So, the actual questions are: Is ZoL really not ready for production use? If it isn't, what is holding it back? Features? Performance? Stability? And what kind of timeframe are we looking at to get past whatever is holding it back?

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS on Linux vs FreeBSD
9:59am, Richard Elling wrote: On Apr 25, 2012, at 5:48 AM, Paul Archer wrote: This may fall into the realm of a religious war (I hope not!), but recently several people on this list have said/implied that ZFS was only acceptable for production use on FreeBSD (or Solaris, of course) rather than Linux with ZoL. I'm working on a project at work involving a large(-ish) amount of data, about 5TB, working its way up to 12-15TB

This is pretty small by today's standards. With 4TB disks, that is only 3-4 disks + redundancy.

True. At my last job, we were used to researchers asking for individual 4-5TB filesystems, and 1-2TB increases in size. When I left, there was over 100TB online (in '07).

eventually, spread among a dozen or so nodes. There may or may not be a clustered filesystem involved (probably gluster if we use anything).

I wouldn't dream of building a clustered file system that small. Maybe when you get into the multiple-PB range, then it might make sense.

The point of a clustered filesystem was to be able to spread our data out among all nodes and still have access from any node without having to run NFS. The size of the data set (once you get past the point where you can replicate it on each node) is irrelevant.

I've been looking at ZoL as the primary filesystem for this data. We're a Linux shop, so I'd rather not switch to FreeBSD, or any of the Solaris-derived distros--although I have no problem with them, I just don't want to introduce another OS into the mix if I can avoid it. So, the actual questions are: Is ZoL really not ready for production use? If not, what is holding it back? Features? Performance? Stability?

The computer science behind ZFS is sound. But it was also developed for Solaris, which is quite different from Linux under the covers. So the Linux and other OS ports have issues around virtual memory system differences and fault management differences. This is the classic "getting it to work is 20% of the effort; getting it to work when all else is failing is the other 80%" case. -- richard

I understand the 80/20 rule. But this doesn't really answer the question(s). If there weren't any major differences among operating systems, the project probably would have been done long ago. To put it slightly differently, if I used ZoL in production, would I be likely to experience performance or stability problems? Or would it be lacking in features that I would likely need?

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)
11:26am, Richard Elling wrote: On Apr 25, 2012, at 10:59 AM, Paul Archer wrote: The point of a clustered filesystem was to be able to spread our data out among all nodes and still have access from any node without having to run NFS. Size of the data set (once you get past the point where you can replicate it on each node) is irrelevant. Interesting, something more complex than NFS to avoid the complexities of NFS? ;-) We have data coming in on multiple nodes (with local storage) that is needed on other multiple nodes. The only way to do that with NFS would be with a matrix of cross mounts that would be truly scary. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS on Linux vs FreeBSD
To put it slightly differently, if I used ZoL in production, would I be likely to experience performance or stability problems?

I saw one team revert from ZoL (CentOS 6) back to ext on some backup servers for an application project. The killer was stat times (find running slow, etc.); perhaps more layer-2 cache (L2ARC) could have solved the problem, but it was easier to deploy ext/lvm2.

Hmm... I've got 1.4TB in about 70K files in 2K directories, and a simple find on a cold FS took me about 6 seconds:

root@hoard22:/hpool/12/db# time find . -type d | wc
    2082    2082   32912

real    0m5.923s
user    0m0.052s
sys     0m1.012s

So I'd say I'm doing OK there. But I've got 10K RPM disks and a fast SSD for caching.

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS on Linux vs FreeBSD
9:08pm, Stefan Ring wrote: Sorry for not being able to contribute any ZoL experience. I've been pondering whether it's worth trying for a few months myself already. Last time I checked, it didn't support the .zfs directory (for snapshot access), which you really don't want to miss after getting used to it. Actually, rc8 (or was it rc7?) introduced/implemented the .zfs directory. If you're upgrading, you need to reboot, but other than that, it works perfectly. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
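For anyone who hasn't used it, this is roughly what snapshot access through the .zfs directory looks like; the filesystem and snapshot names below are made up for illustration, not from this thread:

    zfs snapshot datapool/home@before-cleanup
    ls /datapool/home/.zfs/snapshot/
    before-cleanup
    # pull a single file back out of the snapshot without rolling back
    cp /datapool/home/.zfs/snapshot/before-cleanup/notes.txt /datapool/home/

The .zfs directory is hidden from directory listings by default; 'zfs set snapdir=visible datapool/home' makes it show up if you prefer that.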
Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)
2:20pm, Richard Elling wrote: On Apr 25, 2012, at 12:04 PM, Paul Archer wrote: Interesting, something more complex than NFS to avoid the complexities of NFS? ;-) We have data coming in on multiple nodes (with local storage) that is needed on other multiple nodes. The only way to do that with NFS would be with a matrix of cross mounts that would be truly scary. Ignoring lame NFS clients, how is that architecture different than what you would have with any other distributed file system? If all nodes share data to all other nodes, then...? -- richard Simple. With a distributed FS, all nodes mount from a single DFS. With NFS, each node would have to mount from each other node. With 16 nodes, that's what, 240 mounts? Not to mention your data is in 16 different mounts/directory structures, instead of being in a unified filespace.___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] cluster vs nfs (was: Re: ZFS on Linux vs FreeBSD)
2:34pm, Rich Teer wrote: On Wed, 25 Apr 2012, Paul Archer wrote: Simple. With a distributed FS, all nodes mount from a single DFS. With NFS, each node would have to mount from each other node. With 16 nodes, that's what, 240 mounts? Not to mention your data is in 16 different mounts/directory structures, instead of being in a unified filespace.

Perhaps I'm being overly simplistic, but in this scenario, what would prevent one from having, on a single file server, /exports/nodes/node[0-15], and then having each node NFS-mount /exports/nodes from the server? Much simpler than your example, and all data is available on all machines/nodes.

That assumes the data set will fit on one machine, and that machine won't be a performance bottleneck.

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] cluster vs nfs
Tomorrow, Ian Collins wrote: On 04/26/12 10:34 AM, Paul Archer wrote: That assumes the data set will fit on one machine, and that machine won't be a performance bottleneck. Aren't those general considerations when specifying a file server? I suppose. But I meant specifically that our data will not fit on one single machine, and we are relying on spreading it across more nodes to get it on more spindles as well. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] dedup causing problems with NFS?(was Re: snapshots taking too much space)
3:08pm, Daniel Carosone wrote: On Wed, Apr 14, 2010 at 08:48:42AM -0500, Paul Archer wrote: So I turned deduplication on on my staging FS (the one that gets mounted on the database servers) yesterday, and since then I've been seeing the mount hang for short periods of time off and on. (It lights nagios up like a Christmas tree 'cause the disk checks hang and timeout.)

Does it have enough (really, lots) of memory? Do you have an l2arc cache device attached (as well)? Dedup has a significant memory requirement, or it has to go to disk for lots of DDT entries. While it's doing that, NFS requests can time out. Lengthening the timeouts on the client (for the fs mounted as a backup destination) might help you around the edges of the problem.

As a related issue, are your staging (export) and backup filesystems in the same pool? If they are, moving from staging to final will involve another round of updating lots of DDT entries. What might be worthwhile trying:
- turning dedup *off* on the staging filesystem, so NFS isn't waiting for it, and then deduping later as you move to the backup area at leisure (effectively, asynchronously to the NFS writes).
- or, perhaps eliminating this double work by writing directly to the main backup fs.

Thanks for the info. FWIW, I have turned off dedup on the staging filesystem, but the dedup'ed data is still there, so it's a bit late now. The reason I can't write directly to the main backup FS is that the backup process (RMAN run by my Oracle DBA) writes new files in place, and so my snapshots were taking up 500GB each, vs the 50GB I get if I use rsync instead. I had dedup turned on on the staging FS so that I could take snapshots of it with dedup and the final FS without dedup (but populated via rsync), to compare which works best. I guess I'll have to wait until I can get some more RAM on the box. Paul

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
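For reference, Daniel's first suggestion is just a property change, roughly along these lines (the pool/filesystem names here are placeholders, not the actual layout from this thread):

    zfs set dedup=off bpool/staging
    zfs get dedup bpool/staging bpool/backups/oracle_backup

Note that turning dedup off only affects new writes; blocks that were already written deduplicated stay referenced in the DDT until they are freed, which matches what Paul observed ("the dedup'ed data is still there").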
Re: [zfs-discuss] dedup causing problems with NFS?(was Re: snapshots taking too much space)
Yesterday, Erik Trimble wrote: Daniel Carosone wrote: On Wed, Apr 14, 2010 at 08:48:42AM -0500, Paul Archer wrote: So I turned deduplication on on my staging FS (the one that gets mounted on the database servers) yesterday, and since then I've been seeing the mount hang for short periods of time off and on. (It lights nagios up like a Christmas tree 'cause the disk checks hang and timeout.) Does it have enough (really, lots) of memory? Do you have an l2arc cache device attached (as well)? The OP said he had 8GB of RAM, and I suspect that a cheap SSD in the 40-60GB range for L2ARC would actually be the best choice to speed things up in the future, rather than add another 8GB of RAM. I think I'm going to try both. Easier to get one request for upgrades approved than get a second one approved if the first one doesn't cut it. Paul ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] dedup screwing up snapshot deletion
3:26pm, Daniel Carosone wrote: On Wed, Apr 14, 2010 at 09:04:50PM -0500, Paul Archer wrote: I realize that I did things in the wrong order. I should have removed the oldest snapshot first, on to the newest, and then removed the data in the FS itself.

For the problem in question, this is irrelevant. As discussed in the rest of the thread, you'll hit this when doing anything that requires updating the ref counts on a large number of DDT entries. The only way snapshot order can really make a big difference is if you arrange for it to do so in advance. If you know you have a large amount of data to delete from a filesystem:
- snapshot at the start
- start deleting
- snapshot fast and frequently during the deletion
- let the snapshots go, later, at a controlled pace, to limit the rate of actual block frees.

That's a great idea. I wish I had thought of/heard of it before I deleted the data in my dedup'ed FS. Paul

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
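A rough sketch of the pacing trick Daniel describes, with made-up dataset names (the sleeps are only there to suggest "at a controlled pace"; tune to taste):

    zfs snapshot tank/data@pre-delete      # pin the blocks before deleting anything
    # ...start removing files; take cheap snapshots periodically while the deletion runs
    zfs snapshot tank/data@during-1
    zfs snapshot tank/data@during-2
    # later, release them one at a time so the block frees (and DDT updates) trickle out
    zfs destroy tank/data@pre-delete
    sleep 3600; zfs destroy tank/data@during-1
    sleep 3600; zfs destroy tank/data@during-2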
[zfs-discuss] dedup causing problems with NFS?(was Re: snapshots taking too much space)
So I turned deduplication on on my staging FS (the one that gets mounted on the database servers) yesterday, and since then I've been seeing the mount hang for short periods of time off and on. (It lights nagios up like a Christmas tree 'cause the disk checks hang and timeout.) I haven't turned dedup off again yet, because I'd like to figure out how to get past this problem. Can anyone give me an idea of why the mounts might be hanging, or where to look for clues? And has anyone had this problem with dedup and NFS before? FWIW, the clients are a mix of Solaris and Linux. Paul

Yesterday, Paul Archer wrote: Yesterday, Arne Jansen wrote: Paul Archer wrote: Because it's easier to change what I'm doing than what my DBA does, I decided that I would put rsync back in place, but locally. So I changed things so that the backups go to a staging FS, and then are rsync'ed over to another FS that I take snapshots on. The only problem is that the snapshots are still in the 500GB range. So, I need to figure out why these snapshots are taking so much more room than they were before. This, BTW, is the rsync command I'm using (and essentially the same command I was using when I was rsync'ing from the NetApp): rsync -aPH --inplace --delete /staging/oracle_backup/ /backups/oracle_backup/

Try adding --no-whole-file to rsync. rsync disables block-by-block comparison if used locally by default.

Thanks for the tip. I didn't realize rsync had that behavior. It looks like that got my snapshots back to the 50GB range. I'm going to try dedup on the staging FS as well, so I can do a side-by-side of which gives me the better space savings. Paul

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] dedup screwing up snapshot deletion
I have an approx 700GB (of data) FS that I had dedup turned on for. (See previous posts.) I turned on dedup after the FS was populated, and was not sure dedup was working. I had another copy of the data, so I removed the data, and then tried to destroy the snapshots I had taken. The first two didn't take too long, but the last one (the oldest) has taken literally hours now. I've rebooted and tried starting over, but it hasn't made a difference. I realize that I did things in the wrong order. I should have removed the oldest snapshot first, on to the newest, and then removed the data in the FS itself. But still, it shouldn't take hours, should it? I made sure the machine was otherwise idle, and did an 'iostat', which shows about 5KB/sec reads and virtually no writes to the pool. Any ideas where to look? I'd just remove the FS entirely at this point, but I'd have to destroy the snapshot first, so I'm in the same boat, yes? TIA, Paul ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] dedup screwing up snapshot deletion
7:51pm, Richard Jahnel wrote: This sounds like the known issue about the dedupe map not fitting in RAM. When blocks are freed, dedupe scans the whole map to ensure each block is not in use before releasing it. This takes a veeery long time if the map doesn't fit in RAM. If you can, try adding more RAM to the system. --

Thanks for the info. Unfortunately, I'm not sure I'll be able to add more RAM any time soon. But I'm certainly going to try, as this is the primary backup server for our Oracle databases. Thanks again, Paul

PS It's got 8GB right now. You think doubling that to 16GB would cut it? Is there a way to see how big the map is, anyway?

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
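On builds with dedup support, zdb can print dedup-table statistics, which at least gives a feel for how big the DDT is; the syntax below is from memory and may vary between builds:

    zdb -D bpool      # summary of DDT entries, with on-disk and in-core sizes
    zdb -DD bpool     # adds a histogram broken down by reference count

Multiplying the entry count by the commonly quoted few hundred bytes per unique block gives a ballpark for how much RAM the whole table would want.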
Re: [zfs-discuss] snapshots taking too much space
Yesterday, Arne Jansen wrote: Paul Archer wrote: Because it's easier to change what I'm doing than what my DBA does, I decided that I would put rsync back in place, but locally. So I changed things so that the backups go to a staging FS, and then are rsync'ed over to another FS that I take snapshots on. The only problem is that the snapshots are still in the 500GB range. So, I need to figure out why these snapshots are taking so much more room than they were before. This, BTW, is the rsync command I'm using (and essentially the same command I was using when I was rsync'ing from the NetApp): rsync -aPH --inplace --delete /staging/oracle_backup/ /backups/oracle_backup/ Try adding --no-whole-file to rsync. rsync disables block-by-block comparison if used locally by default. Thanks for the tip. I didn't realize rsync had that behavior. It looks like that got my snapshots back to the 50GB range. I'm going to try dedup on the staging FS as well, so I can do a side-by-side of which gives me the better space savings. Paul ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
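For anyone searching the archives later, the full command from this thread with Arne's flag added would look like this (same source and destination paths as above):

    rsync -aPH --inplace --no-whole-file --delete /staging/oracle_backup/ /backups/oracle_backup/

--no-whole-file forces rsync's delta-transfer algorithm even for local copies, so only changed blocks get rewritten, which is what keeps the snapshot deltas small.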
[zfs-discuss] snapshots taking too much space
I've got a bit of a strange problem with snapshot sizes. First, some background: For ages our DBA backed up all the company databases to a directory NFS mounted from a NetApp filer. That directory would then get dumped to tape. About a year ago, I built an OpenSolaris (technically Nexenta) machine with 24 x 1.5TB drives, for about 24TB of usable space. I am using this to back up OS images using backuppc. I was also backing up the DBA's backup volume from the NetApp to the (ZFS) backup server. This is a combination of rsync + snapshots. The snapshots were using about 50GB/day. The backup volume is about 600GB total, so this wasn't bad, especially on a box with 24TB of space available.

I decided to cut out the middleman, and save some of that expensive NetApp disk space, by having the DBA back up directly to the backup server. I repointed the NFS mounts on our DB servers to point to the backup server instead of the NetApp. Then I ran a simple cron job to snapshot that ZFS filesystem daily. My problem is that the snapshots started taking around 500GB instead of 50GB.

After a bit of thinking, I realized that the backup system my DBA was using must have been writing new files and moving them into place, or possibly writing a whole new file even if only part changed. I think this is the problem because ZFS never overwrites files in place; instead it would allocate new blocks. But rsync does a byte-by-byte comparison, and only updates the blocks that have changed.

Because it's easier to change what I'm doing than what my DBA does, I decided that I would put rsync back in place, but locally. So I changed things so that the backups go to a staging FS, and then are rsync'ed over to another FS that I take snapshots on. The only problem is that the snapshots are still in the 500GB range. So, I need to figure out why these snapshots are taking so much more room than they were before. This, BTW, is the rsync command I'm using (and essentially the same command I was using when I was rsync'ing from the NetApp):

rsync -aPH --inplace --delete /staging/oracle_backup/ /backups/oracle_backup/

This is the old system (rsync'ing from a NetApp and taking snapshots):

zfs list -t snapshot -r bpool/snapback
NAME                              USED  AVAIL  REFER  MOUNTPOINT
...
bpool/snapback@20100310-182713   53.7G      -   868G  -
bpool/snapback@20100312-000318   59.8G      -   860G  -
bpool/snapback@20100312-182552   54.0G      -   840G  -
bpool/snapback@20100313-184834   71.7G      -   884G  -
bpool/snapback@20100314-123024   17.5G      -   832G  -
bpool/snapback@20100315-173609   72.6G      -   891G  -
bpool/snapback@20100316-165527   24.3G      -   851G  -
bpool/snapback@20100317-171304   56.2G      -   884G  -
bpool/snapback@20100318-170250   50.9G      -   865G  -
bpool/snapback@20100319-181131   53.9G      -   874G  -
bpool/snapback@20100320-183617   80.8G      -   902G  -
...

This is from the new system (backing up directly to one volume, rsync'ing to and snapshotting another one):

root@backup02:~# zfs list -t snapshot -r bpool/backups/oracle_backup
NAME                                          USED  AVAIL  REFER  MOUNTPOINT
bpool/backups/oracle_backup@20100411-023130   479G      -   681G  -
bpool/backups/oracle_backup@20100411-104428   515G      -   721G  -
bpool/backups/oracle_backup@20100412-144700      0      -   734G  -

Thanks for any help, Paul

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] zfs inotify?
OK, so this may be a little off-topic, but here goes: The reason I switched to OpenSolaris was primarily to take advantage of ZFS's features when storing my digital imaging collection. I switched from a pretty stock Linux setup, but it left me at one disadvantage. I had been using inotify under Linux to trigger a series of Ruby scripts that would do all the basic ingestion/setup for me (renaming files, converting to DNG, adding bulk metadata). The scripts will run under OpenSolaris, except for the inotify part. Question: Is there a facility similar to inotify that I can use to monitor a directory structure in OpenSolaris/ZFS, such that it will block until a file is modified (added, deleted, etc), and then pass the state along (STDOUT is fine)? One other requirement: inotify can handle subdirectories being added on the fly. So if you use it to monitor, for example, /data/images/incoming, and a /data/images/incoming/100canon directory gets created, then the files under that directory will automatically be monitored as well. Thanks, Paul Archer ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs inotify?
5:12pm, Cyril Plisko wrote: Question: Is there a facility similar to inotify that I can use to monitor a directory structure in OpenSolaris/ZFS, such that it will block until a file is modified (added, deleted, etc), and then pass the state along (STDOUT is fine)? One other requirement: inotify can handle subdirectories being added on the fly. So if you use it to monitor, for example, /data/images/incoming, and a /data/images/incoming/100canon directory gets created, then the files under that directory will automatically be monitored as well.

while there is no inotify for Solaris, there are similar technologies available. Check port_create(3C) and gam_server(1)

I can't find much on gam_server on Solaris (couldn't find too much on it at all, really), and port_create is apparently a system call. (I'm not a developer--if I can't write it in BASH, Perl, or Ruby, I can't write it.) I appreciate the suggestions, but I need something a little more prêt-à-porter. Does anyone have any dtrace experience? I figure this could probably be done with dtrace, but I don't know enough about it to write a dtrace script (although I may learn if that turns out to be the best way to go). I was hoping that there'd be a script out there already, but I haven't turned up anything yet. Paul

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
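If someone does want to go the DTrace route, a very rough starting point (not a polished tool) is the syscall provider. The O_CREAT value below is an assumption worth checking against /usr/include/sys/fcntl.h, and this only catches files created via open(2)/open64(2), not mkdir, rename, or unlink:

    #!/usr/sbin/dtrace -qs
    /* print a timestamp and the path of every file opened with O_CREAT;
       filter the output for /data/images/incoming with grep */
    syscall::open*:entry
    /(arg1 & 0x100) != 0/
    {
        printf("%Y %s\n", walltimestamp, copyinstr(arg0));
    }

Because it watches syscalls rather than a directory, new subdirectories are covered automatically, but you pay for tracing every open on the box.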
[zfs-discuss] dedup video
Someone posted this link: https://slx.sun.com/1179275620 for a video on ZFS deduplication. But the site isn't responding (which is typical of Sun, since I've been dealing with them for the last 12 years). Does anyone know of a mirror site, or if the video is on YouTube? Paul ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Comments on home OpenSolaris/ZFS server
9:51am, Ware Adams wrote: On Sep 29, 2009, at 9:32 AM, p...@paularcher.org wrote: I am using an SC846xxx for a project here at work. The hardware consists of an ASUS server-level motherboard with 2 quad-core Xeons, 8GB of RAM, an LSI PCI-e SAS/SATA card, and 24 1.5TB HD, all in one of these cases. The drives are in one pool with 3x 7+1 raid-z sets. Raw is 32TB, usable is about 24TB. Total price was about $6000. (It'd be about $800 less now that 1.5TB drives have dropped in price.) If I can go with something like this it's going to be the easiest way to get lots of drives. Do you have this outside of a server room? Would the noise be manageable if say it were mounted in an enclosed rack with sound deadening? It's in a server room, but I had it here in the office while I was putting it together. The case really isn't too loud. 24 hard drives make a fair bit of noise--but I think if you had it in a closet with some soundproofing, it wouldn't be bad. And if you went with a smaller enclosure (12 drives, for instance) that would help. Paul ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Comments on home OpenSolaris/ZFS server
You don't like http://www.supermicro.com/products/nfo/chassis_storage.cfm ? I must admit I don't have a price list of these.

I am using an SC846xxx for a project here at work. The hardware consists of an ASUS server-level motherboard with 2 quad-core Xeons, 8GB of RAM, an LSI PCI-e SAS/SATA card, and 24 1.5TB HDs, all in one of these cases. The drives are in one pool with 3x 7+1 raid-z sets. Raw is 32TB, usable is about 24TB. Total price was about $6000. (It'd be about $800 less now that 1.5TB drives have dropped in price.) I built it for disk-to-disk backups. Right now, I'm using backuppc for backing up the OSes of our DB servers and such, and rsync and snapshots for the databases themselves. I get about 50MB/sec read and write speeds, but I think that's because the version of the SC846 I got has a single backplane for the SAS/SATA drives, and one connector to the LSI card. Of course, for what I'm doing, that's fine. Paul

Oh, I think the SC846 I got was about $1100. http://www.cdw.com/shop/search/results.aspx?key=sc846&searchscope=All&sr=1&Find+it.x=0&Find+it.y=0

One thing I forgot to mention: there is a wart with this case. The connectors for the low-profile CDROM drive are too short, and the power connector for the internal drive hits the lid of the case. I actually had to find a low-profile molex power connector for the hard drive, and I can only use the CDROM drive if I open the case up and loosen the internal hard drive so I can plug the CDROM in. Otherwise, though, the case is very well built. Paul

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] extremely slow writes (with good reads)
Yesterday, Paul Archer wrote: I estimate another 10-15 hours before this disk is finished resilvering and the zpool is OK again. At that time, I'm going to switch some hardware out (I've got a newer and higher-end LSI card that I hadn't used before because it's PCI-X, and won't fit on my current motherboard.) I'll report back what I get with it tomorrow or the next day, depending on the timing on the resilver. Paul Archer

And the hits just keep coming... The resilver finished last night, so I rebooted the box, as I had just upgraded to the latest Dev build. Not only did the upgrade fail (love that instant rollback!), but now the zpool won't come online:

root@shebop:~# zpool import
  pool: datapool
    id: 3410059226836265661
 state: UNAVAIL
status: The pool is formatted using an older on-disk version.
action: The pool cannot be imported due to damaged devices or data.
config:

        datapool     UNAVAIL  insufficient replicas
          raidz1     UNAVAIL  corrupted data
            c7d0     ONLINE
            c8d0s0   ONLINE
            c9d0s0   ONLINE
            c11d0s0  ONLINE
            c10d0s0  ONLINE

I've tried renaming /etc/zfs/zpool.cache and rebooting, but no joy. Is it OK to scream and tear my hair out now? Paul

PS I don't suppose there's an RFE out there for "give useful data when a pool is unavailable". Or even better, allow a pool to be imported (but no filesystems mounted) so it *can be fixed*.

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] extremely slow writes (with good reads)
8:30am, Paul Archer wrote: And the hits just keep coming... The resilver finished last night, so I rebooted the box, as I had just upgraded to the latest Dev build. Not only did the upgrade fail (love that instant rollback!), but now the zpool won't come online:

root@shebop:~# zpool import
  pool: datapool
    id: 3410059226836265661
 state: UNAVAIL
status: The pool is formatted using an older on-disk version.
action: The pool cannot be imported due to damaged devices or data.
config:

        datapool     UNAVAIL  insufficient replicas
          raidz1     UNAVAIL  corrupted data
            c7d0     ONLINE
            c8d0s0   ONLINE
            c9d0s0   ONLINE
            c11d0s0  ONLINE
            c10d0s0  ONLINE

I've tried renaming /etc/zfs/zpool.cache and rebooting, but no joy. Is it OK to scream and tear my hair out now?

A little more research came up with this:

root@shebop:~# zdb -l /dev/dsk/c7d0
LABEL 0
failed to unpack label 0
LABEL 1
failed to unpack label 1
LABEL 2
failed to unpack label 2
LABEL 3
failed to unpack label 3

While 'zdb -l /dev/dsk/c7d0s0' shows normal labels. So the new question is: how do I tell ZFS to use c7d0s0 instead of c7d0? I can't do a 'zpool replace' because the zpool isn't online. Paul

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] extremely slow writes (with good reads)
7:56pm, Victor Latushkin wrote: While 'zdb -l /dev/dsk/c7d0s0' shows normal labels. So the new question is: how do I tell ZFS to use c7d0s0 instead of c7d0? I can't do a 'zpool replace' because the zpool isn't online.

ZFS actually uses c7d0s0 and not c7d0 - it shortens output to c7d0 in case it controls entire disk. As before upgrade it looked like this:

        NAME        STATE     READ WRITE CKSUM
        datapool    ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c2d0s0  ONLINE       0     0     0
            c3d0s0  ONLINE       0     0     0
            c4d0s0  ONLINE       0     0     0
            c6d0s0  ONLINE       0     0     0
            c5d0s0  ONLINE       0     0     0

I guess something happened to the labeling of disk c7d0 (used to be c2d0) before, during or after upgrade. It would be nice to show what zdb -l shows for this disk and some other disk too. Output of 'prtvtoc /dev/rdsk/cXdYs0' can be helpful too.

This is from c7d0:

LABEL 0
    version=13
    name='datapool'
    state=0
    txg=233478
    pool_guid=3410059226836265661
    hostid=519305
    hostname='shebop'
    top_guid=7679950824008134671
    guid=17458733222130700355
    vdev_tree
        type='raidz'
        id=0
        guid=7679950824008134671
        nparity=1
        metaslab_array=23
        metaslab_shift=32
        ashift=9
        asize=7501485178880
        is_log=0
        children[0]
            type='disk'
            id=0
            guid=17458733222130700355
            path='/dev/dsk/c7d0s0'
            devid='id1,c...@asamsung_hd154ui=s1y6j1ks742049/a'
            phys_path='/p...@0,0/pci10de,3...@4/pci8086,3...@7/pci-...@0/i...@1/c...@0,0:a'
            whole_disk=1
            DTL=588
        children[1]
            type='disk'
            id=1
            guid=4735756507338772729
            path='/dev/dsk/c8d0s0'
            devid='id1,c...@asamsung_hd154ui=s1y6j1ks742050/a'
            phys_path='/p...@0,0/pci10de,3...@4/pci8086,3...@7/pci-...@1/i...@0/c...@0,0:a'
            whole_disk=0
            DTL=467
        children[2]
            type='disk'
            id=2
            guid=10113358996255761229
            path='/dev/dsk/c9d0s0'
            devid='id1,c...@asamsung_hd154ui=s1y6j1ks742059/a'
            phys_path='/p...@0,0/pci10de,3...@4/pci8086,3...@7/pci-...@1/i...@1/c...@0,0:a'
            whole_disk=0
            DTL=573
        children[3]
            type='disk'
            id=3
            guid=11460855531791764612
            path='/dev/dsk/c11d0s0'
            devid='id1,c...@asamsung_hd154ui=s1y6j1ks742048/a'
            phys_path='/p...@0,0/pci10de,3...@4/pci8086,3...@7/pci-...@2/i...@1/c...@0,0:a'
            whole_disk=0
            DTL=571
        children[4]
            type='disk'
            id=4
            guid=14986691153111294171
            path='/dev/dsk/c10d0s0'
            devid='id1,c...@ast31500341as=9vs0ttwf/a'
            phys_path='/p...@0,0/pci10de,3...@4/pci8086,3...@7/pci-...@2/i...@0/c...@0,0:a'
            whole_disk=0
            DTL=473
Labels 1-3 are identical

The other disks in the pool give identical results (except for the guid's, which match with what's above). c8d0 - c11d0 are identical, so I didn't include that output below:

root@shebop:/tmp# prtvtoc /dev/rdsk/c7d0s0
* /dev/rdsk/c7d0s0 partition map
*
* Dimensions:
*     512 bytes/sector
*     2930264064 sectors
*     2930263997 accessible sectors
*
* Flags:
*   1: unmountable
*   10: read-only
*
* Unallocated space:
*       First     Sector    Last
*       Sector     Count    Sector
*           34        222       255
*
*                          First      Sector      Last
* Partition  Tag  Flags    Sector      Count      Sector  Mount Directory
        0     4     00        256  2930247391  2930247646
        8    11     00  2930247647       16384  2930264030

root@shebop:/tmp# prtvtoc /dev/rdsk/c8d0s0
* /dev/rdsk/c8d0s0 partition map
*
* Dimensions:
*     512 bytes/sector
*     2930264064 sectors
*     2930277101 accessible sectors
*
* Flags:
*   1: unmountable
*   10: read-only
*
*                          First      Sector      Last
* Partition  Tag  Flags    Sector      Count      Sector  Mount Directory
        0    17     00         34  2930277101  2930277134

Thanks for the help! Paul Archer

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] extremely slow writes (with good reads)
In light of all the trouble I've been having with this zpool, I bought a 2TB drive, and I'm going to move all my data over to it, then destroy the pool and start over. Before I do that, what is the best way on an x86 system to format/label the disks? Thanks, Paul ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] extremely slow writes (with good reads)
Cool. FWIW, there appears to be an issue with the LSI 150-6 card I was using. I grabbed an old server m/b from work, and put a newer PCI-X LSI card in it, and I'm getting write speeds of about 60-70MB/sec, which is about 40x the write speed I was seeing with the old card. Paul Tomorrow, Robert Milkowski wrote: Paul Archer wrote: In light of all the trouble I've been having with this zpool, I bought a 2TB drive, and I'm going to move all my data over to it, then destroy the pool and start over. Before I do that, what is the best way on an x86 system to format/label the disks? if entire disk is going to be dedicated to a one zfs pool then don't bother with manual labeling - when creating a pool provide a disk name without a slice name (so for example c0d0 instead of c0d0s0) and zfs will automatically put an EFI label on it with s0 representing entire disk (- reserved area). -- Robert Milkowski http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
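For anyone following along, Robert's advice amounts to something like the following (disk names borrowed from earlier in this thread; the raidz layout is assumed):

    # give zpool the whole disks; it writes the EFI label itself and uses s0 internally
    zpool create datapool raidz c7d0 c8d0 c9d0 c10d0 c11d0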
Re: [zfs-discuss] extremely slow writes (with good reads)
11:04pm, Paul Archer wrote: Cool. FWIW, there appears to be an issue with the LSI 150-6 card I was using. I grabbed an old server m/b from work, and put a newer PCI-X LSI card in it, and I'm getting write speeds of about 60-70MB/sec, which is about 40x the write speed I was seeing with the old card. Paul Small correction: I was seeing writes in the 60-70MB range because I was writing to a single 2TB (on its own pool). When I tried writing back to the primary (4+1 raid-z) pool, I was getting between 100-120MB/sec. (That's for sequential writes, anyway.) paul ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] extremely slow writes (with good reads)
Problem is that while it's back, the performance is horrible. It's resilvering at about (according to iostat) 3.5MB/sec. And at some point, I was zeroing out the drive (with 'dd if=/dev/zero of=/dev/dsk/c7d0'), and iostat showed me that the drive was only writing at around 3.5MB/sec. *And* it showed reads of about the same 3.5MB/sec even during the dd. This same hardware and even the same zpool have been run under linux with zfs-fuse and BSD, and with BSD at least, performance was much better. A complete resilver under BSD took 6 hours. Right now zpool is estimating this resilver to take 36 hours. Could this be a driver problem? Something to do with the fact that this is a very old SATA card (LSI 150-6)? This is driving me crazy. I finally got my zpool working under Solaris so I'd have some stability, and I've got no performance.

It appears your controller is preventing ZFS from enabling write cache. I'm not familiar with that model. You will need to find a way to enable the drives' write cache manually.

My controller, while normally a full RAID controller, has had its BIOS turned off, so it's acting as a simple SATA controller. Plus, I'm seeing this same slow performance with dd, not just with ZFS. And I wouldn't think that write caching would make a difference with using dd (especially writing in from /dev/zero).

The other thing that's weird is the writes. I am seeing writes in that 3.5MB/sec range during the resilver, *and* I was seeing the same thing during the dd. This is from the resilver, but again, the dd was similar. c7d0 is the device in question:

    r/s    w/s   kr/s   kw/s  wait  actv wsvc_t asvc_t  %w  %b device
    0.0  238.0    0.0  476.0   0.0   1.0    0.0    4.1   0  99 c12d1
   30.8   37.8 3302.4 3407.2  14.1   2.0  206.0   29.2 100 100 c7d0
   80.4    0.0 3417.6    0.0   0.3   0.3    3.3    3.2   8  14 c8d0
   80.4    0.0 3417.6    0.0   0.3   0.3    3.4    3.2   9  14 c9d0
   80.6    0.0 3417.6    0.0   0.3   0.3    3.4    3.2   9  14 c10d0
   80.6    0.0 3417.6    0.0   0.3   0.3    3.3    3.1   9  14 c11d0
    0.0    0.0    0.0    0.0   0.0   0.0    0.0    0.0   0   0 c12t0d0

Paul Archer

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] extremely slow writes (with good reads)
My controller, while normally a full RAID controller, has had its BIOS turned off, so it's acting as a simple SATA controller. Plus, I'm seeing this same slow performance with dd, not just with ZFS. And I wouldn't think that write caching would make a difference with using dd (especially writing in from /dev/zero).

I don't think you got what I said. Because the controller normally runs as a RAID controller, the controller controls the SATA drives' on-board write cache; it may not allow the OS to enable/disable the drives' on-board write cache.

I see what you're saying. I just think that with the BIOS turned off, this card is essentially acting like a dumb SATA controller, and therefore not doing anything with the drives' cache.

Using 'dd' to the raw disk will also show the same poor performance if the HD on-board write-cache is disabled.

The other thing that's weird is the writes. I am seeing writes in that 3.5MB/sec range during the resilver, *and* I was seeing the same thing during the dd.

Was the 'dd' to the raw disk? Either way, it shows the HDs aren't set up properly.

This is from the resilver, but again, the dd was similar. c7d0 is the device in question:

    r/s    w/s   kr/s   kw/s  wait  actv wsvc_t asvc_t  %w  %b device
    0.0  238.0    0.0  476.0   0.0   1.0    0.0    4.1   0  99 c12d1
   30.8   37.8 3302.4 3407.2  14.1   2.0  206.0   29.2 100 100 c7d0
   80.4    0.0 3417.6    0.0   0.3   0.3    3.3    3.2   8  14 c8d0
   80.4    0.0 3417.6    0.0   0.3   0.3    3.4    3.2   9  14 c9d0
   80.6    0.0 3417.6    0.0   0.3   0.3    3.4    3.2   9  14 c10d0
   80.6    0.0 3417.6    0.0   0.3   0.3    3.3    3.1   9  14 c11d0
    0.0    0.0    0.0    0.0   0.0   0.0    0.0    0.0   0   0 c12t0d0

Try using 'format -e' on the drives, go into 'cache' then 'write-cache' and display the current state. You can try to manually enable it from there.

I tried this, but the 'cache' menu item didn't show up. The man page says it only works for SCSI disks. Do you know of any other way to get/set those parameters? Paul

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] extremely slow writes (with good reads)
1:19pm, Richard Elling wrote: The other thing that's weird is the writes. I am seeing writes in that 3.5MB/sec range during the resilver, *and* I was seeing the same thing during the dd. This is from the resilver, but again, the dd was similar. c7d0 is the device in question:

    r/s    w/s   kr/s   kw/s  wait  actv wsvc_t asvc_t  %w  %b device
    0.0  238.0    0.0  476.0   0.0   1.0    0.0    4.1   0  99 c12d1
   30.8   37.8 3302.4 3407.2  14.1   2.0  206.0   29.2 100 100 c7d0

This is the bottleneck. 29.2 ms average service time is slow. As you can see, this causes a backup in the queue, which is seeing an average service time of 206 ms. The problem could be the disk itself or anything in the path to that disk, including software. But first, look for hardware issues via

iostat -E
fmadm faulty
fmdump -eV

I don't see anything in the output of these commands except for the ZFS errors from when I was trying to get the disk online and resilvered. I estimate another 10-15 hours before this disk is finished resilvering and the zpool is OK again. At that time, I'm going to switch some hardware out (I've got a newer and higher-end LSI card that I hadn't used before because it's PCI-X, and won't fit on my current motherboard.) I'll report back what I get with it tomorrow or the next day, depending on the timing on the resilver. Paul Archer

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] extremely slow writes (with good reads)
Since I got my zfs pool working under solaris (I talked on this list last week about moving it from linux/bsd to solaris, and the pain that was), I'm seeing very good reads, but nada for writes.

Reads:

root@shebop:/data/dvds# rsync -aP young_frankenstein.iso /tmp
sending incremental file list
young_frankenstein.iso
^C  1032421376  20%   86.23MB/s    0:00:44

Writes:

root@shebop:/data/dvds# rsync -aP /tmp/young_frankenstein.iso yf.iso
sending incremental file list
young_frankenstein.iso
^C    68976640   6%    2.50MB/s    0:06:42

This is pretty typical of what I'm seeing.

root@shebop:/data/dvds# zpool status -v
  pool: datapool
 state: ONLINE
status: The pool is formatted using an older on-disk format. The pool can still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        datapool    ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c2d0s0  ONLINE       0     0     0
            c3d0s0  ONLINE       0     0     0
            c4d0s0  ONLINE       0     0     0
            c6d0s0  ONLINE       0     0     0
            c5d0s0  ONLINE       0     0     0

errors: No known data errors

  pool: syspool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        syspool     ONLINE       0     0     0
          c0d1s0    ONLINE       0     0     0

errors: No known data errors

(This is while running an rsync from a remote machine to a ZFS filesystem)

root@shebop:/data/dvds# iostat -xn 5
                    extended device statistics
    r/s    w/s   kr/s   kw/s  wait  actv wsvc_t asvc_t  %w  %b device
   11.1    4.8  395.8  275.9   5.8   0.1  364.7    4.3   2   5 c0d1
    9.8   10.9  514.3  346.4   6.8   1.4  329.7   66.7  68  70 c5d0
    9.8   10.9  516.6  346.4   6.7   1.4  323.1   66.2  67  70 c6d0
    9.7   10.9  491.3  346.3   6.7   1.4  324.7   67.2  67  70 c3d0
    9.8   10.9  519.9  346.3   6.8   1.4  326.7   67.2  68  71 c4d0
    9.8   11.0  493.5  346.6   3.6   0.8  175.3   37.9  38  41 c2d0
    0.0    0.0    0.0    0.0   0.0   0.0    0.0    0.0   0   0 c0t0d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s  wait  actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0   0.0   0.0    0.0    0.0   0   0 c0d1
   64.6   12.6 8207.4  382.1  32.8   2.0  424.7   25.9 100 100 c5d0
   62.2   12.2 7203.2  370.1  27.9   2.0  375.1   26.7  99 100 c6d0
   53.2   11.8 5973.9  390.2  25.9   2.0  398.8   30.5  98  99 c3d0
   49.4   10.6 5398.2  389.8  30.2   2.0  503.7   33.3  99 100 c4d0
   45.2   12.8 5431.4  337.0  14.3   1.0  247.3   17.9  52  52 c2d0
    0.0    0.0    0.0    0.0   0.0   0.0    0.0    0.0   0   0 c0t0d0

Any ideas? Paul

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] extremely slow writes (with good reads)
Oh, for the record, the drives are 1.5TB SATA, in a 4+1 raidz-1 config. All the drives are on the same LSI 150-6 PCI controller card, and the M/B is a generic something or other with a triple-core, and 2GB RAM. Paul 3:34pm, Paul Archer wrote: Since I got my zfs pool working under solaris (I talked on this list last week about moving it from linux bsd to solaris, and the pain that was), I'm seeing very good reads, but nada for writes. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] moving files from one fs to another, splittin/merging
I may have missed something in the docs, but if I have a file in one FS and want to move it to another FS (assuming both filesystems are on the same ZFS pool), is there a way to do it outside of the standard mv/cp/rsync commands? For example, I have a pool with my home directory as a FS, and I have another FS with ISOs. I download an ISO of an OpenSolaris DVD (say, 3GB), but it goes into my home directory. Since ZFS is all about pools and shared storage, it seems like it would be natural to move the file via a 'zfs' command, rather than mv/cp/etc...

On a related(?) note, is there a way to split an existing filesystem? To use the example above, let's say I have an ISO directory in my home directory, but it's getting big, plus I'd like to share it out on my network. Is there a way to split my home directory's FS, so that the ISO directory becomes its own FS? Paul Archer

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] moving files from one fs to another, splittin/merging
Thanks for the info. Glad to hear it's in the works, too. Paul 1:21pm, Mark J Musante wrote: On Thu, 24 Sep 2009, Paul Archer wrote: I may have missed something in the docs, but if I have a file in one FS, and want to move it to another FS (assuming both filesystems are on the same ZFS pool), is there a way to do it outside of the standard mv/cp/rsync commands? Not yet. CR 6483179 covers this. On a related(?) note, is there a way to split an existing filesystem? Not yet. CR 6400399 covers this. Regards, markm ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] SOLVED: Re: migrating from linux to solaris ZFS
Thursday, Paul Archer wrote: Tomorrow, Fajar A. Nugraha wrote: There was a post from Ricardo on the zfs-fuse list some time ago. Apparently if you do a zpool create on whole disks, Linux and Solaris behave differently:
- Solaris will create an EFI partition on that disk, and use the partition as the vdev
- Linux will use the whole disk without any partition, just like with a file-based vdev.
The result is that you might be unable to import the pool on *solaris or *BSD. The recommended way to create a portable pool is to create the pool on a partition setup recognizable on all those OSes. He suggested a simple DOS/MBR partition table. So in short, if you had created the pool on top of sda1 instead of sda, it would work. I'm surprised though that you can offline sda and replace it with sda1, when previously you said "I see that if I try to replace sda with sda1, zpool complains that sda1 is too small".

I was a bit surprised about that, too. But I found that a standard PC/Linux partition reserves around 24MB at the beginning of the disk, while an EFI (or actually, GPT) disklabel and partition only uses a few hundred KB. As I mentioned above, I created GPT disklabels and partitions on all my disks, then one-by-one offlined each disk and replaced it with the partition from the same disk (eg 'zpool replace datapool ad1 ad1p1'). I did the first replacement with Linux and zfs-fuse. The resilver took 32 hours. I did the rest in FreeBSD, which took 5-6 hours for each disk. It was tedious, but the pool is available in Solaris (finally!), so hopefully no more NFS issues or kernel panics. (I had NFS issues with both Linux and BSD, and kernel panics with BSD.) Paul

PS. Complicating matters was the fact that for some reason, BSD didn't like my LSI 150-6 SATA card (which is the only one Solaris plays nice with), so I had to keep switching cards every time I went from one OS to the other. Blech. OTOH, here's to Live CDs!

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] migrating from linux to solaris ZFS
I recently (re)built a fileserver at home, using Ubuntu and zfs-fuse to create a ZFS filesystem (RAIDz1) on five 1.5TB drives. I had some serious issues with NFS not working properly (kept getting stale file handles), so I tried to switch to OpenSolaris/Nexenta, but my SATA controller wasn't supported. I went to FreeBSD, and got ZFS working there, and was able to import the ZFS pool that I had created under Linux and zfs-fuse. But I had issues with kernel panics. Finally, I found a SATA card that would work with Solaris (an old LSI 150-6). I upgraded the firmware and turned off the BIOS (so it would act as a plain SATA card, rather than doing RAID), and I could finally access the drives under Solaris. Now my problem is that even though Solaris can see the drives, and recognizes that I have a ZFS pool, it won't import it. This isn't a case of using -f to force the import. Rather, even though the drives are all online and showing as available, 'zpool import' says I have insufficient replicas and that the raidz is unavailable due to corrupted data. (I can post screen caps later today.) I can reboot into Linux and import the pools, but haven't figured out why I can't import them in Solaris. I don't know if it makes a difference (I wouldn't think so), but zfs-fuse under Linux is using ZFS version 13, where Nexenta is using version 14. Any ideas? Paul ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] migrating from linux to solaris ZFS
10:09pm, Fajar A. Nugraha wrote: On Thu, Sep 17, 2009 at 8:55 PM, Paul Archer p...@paularcher.org wrote: I can reboot into Linux and import the pools, but haven't figured out why I can't import them in Solaris. I don't know if it makes a difference (I wouldn't think so), but zfs-fuse under Linux is using ZFS version 13, where Nexenta is using version 14. Just a guess, but did you use the whole drive while creating the pool on Linux? Something like zpool create poolname raidz sda sdb sdc ? Yes, I did. I was under the impression that was the way to go. If it's not (ie it needs to be a single disk-sized partion), I can try moving. I'm assuming if I add a partition, I can do something like: zpool replace datapool sda sda1 Paul ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] migrating from linux to solaris ZFS
10:40am, Paul Archer wrote: I can reboot into Linux and import the pools, but haven't figured out why I can't import them in Solaris. I don't know if it makes a difference (I wouldn't think so), but zfs-fuse under Linux is using ZFS version 13, where Nexenta is using version 14. Just a guess, but did you use the whole drive while creating the pool on Linux? Something like zpool create poolname raidz sda sdb sdc ? Yes, I did. I was under the impression that was the way to go. If it's not (ie it needs to be a single disk-sized partion), I can try moving. I'm assuming if I add a partition, I can do something like: zpool replace datapool sda sda1 Or not. I see that if I try to replace sda with sda1, zpool complains that sda1 is too small. Any suggestions (that don't include 'start over')? Paul ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] migrating from linux to solaris ZFS
5:08pm, Darren J Moffat wrote: Paul Archer wrote: 10:09pm, Fajar A. Nugraha wrote: On Thu, Sep 17, 2009 at 8:55 PM, Paul Archer p...@paularcher.org wrote: I can reboot into Linux and import the pools, but haven't figured out why I can't import them in Solaris. I don't know if it makes a difference (I wouldn't think so), but zfs-fuse under Linux is using ZFS version 13, where Nexenta is using version 14. Just a guess, but did you use the whole drive while creating the pool on Linux? Something like zpool create poolname raidz sda sdb sdc ? Yes, I did. I was under the impression that was the way to go. If it's not (ie it needs to be a single disk-sized partion), I can try moving. I'm assuming if I add a partition, I can do something like: zpool replace datapool sda sda1 What kind of partition table is on the disks, is it EFI ? If not that might be part of the issue. I don't believe there is any partition table on the disks. I pointed zfs to the raw disks when I setup the pool. Paul ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] migrating from linux to solaris ZFS
6:44pm, Darren J Moffat wrote: Paul Archer wrote: What kind of partition table is on the disks, is it EFI? If not that might be part of the issue. I don't believe there is any partition table on the disks. I pointed zfs to the raw disks when I set up the pool.

If you run fdisk on OpenSolaris against this disk what does it show as the partition type, eg: fdisk -v /dev/rdsk/c7t4d0p0. Mine shows:

1  EFI                  0  45600  45601  100

Which tells me I have an EFI label on the disk. My boot ZFS pool shows this: on one side of the mirror:

1          Diagnostic   0      3      4    0
2  Active  Solaris2     4  45599  45596  100

and on the other:

1  Active  Solaris2     1  45599  45599  100

--

I just took a look, and it seems that all the drives have a single partition on them. I'm looking under Linux, as I can't reboot it into Solaris again until I get home tonight.

root@ubuntu:~# fdisk -l /dev/sda

Disk /dev/sda: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0xce13f90b

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1      182401  1465136001   83  Linux

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] migrating from linux to solaris ZFS
7:37pm, Darren J Moffat wrote: Paul Archer wrote: r...@ubuntu:~# fdisk -l /dev/sda Disk /dev/sda: 1500.3 GB, 1500301910016 bytes 255 heads, 63 sectors/track, 182401 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Disk identifier: 0xce13f90b Device Boot Start End Blocks Id System /dev/sda1 1 182401 1465136001 83 Linux That is good enough. That is your problem right there. Solaris doesn't recognise this partition type. FreeBSD I think does. I'm not sure what you can do to get Solaris to recognise this. If there is a non destructive way under Linux to change this to an EFI partition that would be a good way to start. I doubt that simply changing the tag from Linux (32) to Solaris2 (191) would be enough since you would lack the vtoc in there. Plus ideally you want this as EFI unless you need to put OpenSolaris into that pool to boot from it - but sounds like you don't. I did a little research and found that parted on Linux handles EFI labelling. I used it to change the partition scheme on sda, creating an sda1. I then offlined sda and replaced it with sda1. I wish I had just tried a scrub instead of the replace, though, as I've gotta wait about 35 hours for the resilver to finish. (1.5TB data on five disks with a single PCI controller card.) Paul ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
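For the archives, the per-disk swap described above went roughly like this under Linux (parted invocations from memory, and the start/end values are just an example; device names follow the thread's sda/sda1):

    parted /dev/sda mklabel gpt
    parted /dev/sda mkpart zfs 1MiB 100%    # creates /dev/sda1
    zpool offline datapool sda
    zpool replace datapool sda sda1
    zpool status datapool                   # wait out the resilver before touching the next disk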
Re: [zfs-discuss] migrating from linux to solaris ZFS
Tomorrow, Fajar A. Nugraha wrote: There was a post from Ricardo on the zfs-fuse list some time ago. Apparently if you do a zpool create on whole disks, Linux and Solaris behave differently:
- Solaris will create an EFI partition on that disk, and use the partition as the vdev
- Linux will use the whole disk without any partition, just like with a file-based vdev.
The result is that you might be unable to import the pool on *solaris or *BSD. The recommended way to create a portable pool is to create the pool on a partition setup recognizable on all those OSes. He suggested a simple DOS/MBR partition table. So in short, if you had created the pool on top of sda1 instead of sda, it would work. I'm surprised though that you can offline sda and replace it with sda1, when previously you said "I see that if I try to replace sda with sda1, zpool complains that sda1 is too small".

I was a bit surprised about that, too. But I found that a standard PC/Linux partition reserves around 24MB at the beginning of the disk, while an EFI (or actually, GPT) disklabel and partition only uses a few hundred KB. Paul

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss