Re: [zfs-discuss] partioned cache devices

2013-03-19 Thread Jim Klimov

On 2013-03-19 22:07, Andrew Gabriel wrote:

The GPT partitioning spec requires the disk to be FDISK
partitioned with just one single FDISK partition of type EFI,
so that tools which predate GPT partitioning will still see
such a GPT disk as fully assigned to FDISK partitions, and
therefore less likely to be accidentally blown away.


Okay, I guess I got tangled up in terminology there ;)
Anyhow, your points are not all news to me, though my write-up
was likely misleading to unprepared readers... sigh... Thanks
for the clarifications and the deeper details that I did not know!

So, we can agree that GPT does indeed include the fake
(protective) MBR with one partition of type EFI, which spans
the smaller of 2TB (the MBR limit) or the disk size, minus a
few sectors for GPT housekeeping. Inside that EFI partition
the GPT partitions are defined (represented as "s"lices in
Solaris). This is, after all, a GUID *Partition* Table, and
that's how parted refers to them too ;)
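
As an aside, here is a quick sketch of how to inspect both layers
on such a disk, assuming an example device named c5t0d0 (the
commands are illustrative, not a transcript from a real system):

  # fdisk -W - /dev/rdsk/c5t0d0p0    (dump the protective MBR: it should
                                      show one FDISK partition of type EFI)
  # prtvtoc /dev/rdsk/c5t0d0         (print the GPT partitions, which
                                      Solaris presents as slices)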

Notably, there are also unportable tricks (hand-forged hybrid
MBRs that abuse the GPT/EFI spec) to fool legacy OSes and
bootloaders into addressing the same byte ranges through both
MBR entries and proper GPT entries, so the same region shows
up as a partition in the sense of each table.

//Jim



Re: [zfs-discuss] partioned cache devices

2013-03-19 Thread Andrew Gabriel

On 03/19/13 20:27, Jim Klimov wrote:

I disagree; at least, I've always thought differently:
the "d" device is the whole disk denomination, with a
unique number for a particular controller link ("c+t").

The disk has some partitioning table, MBR or GPT/EFI.
In these tables, partition "p0" stands for the table
itself (i.e. to manage partitioning),


p0 is the whole disk regardless of any partitioning.
(Hence you can use p0 to access any type of partition table.)


and the rest kind
of "depends". In case of MBR tables, one partition may
be named as having a Solaris (or Solaris2) type, and
there it holds a SMI table of Solaris slices, and these
slices can hold legacy filesystems or components of ZFS
pools. In case of GPT, the GPT-partitions can be used
directly by ZFS. However, they are also denominated as
"slices" in ZFS and format utility.


The GPT partitioning spec requires the disk to be FDISK
partitioned with just one single FDISK partition of type EFI,
so that tools which predate GPT partitioning will still see
such a GPT disk as fully assigned to FDISK partitions, and
therefore less likely to be accidentally blown away.


I believe, Solaris-based OSes accessing a "p"-named
partition and an "s"-named slice of the same number
on a GPT disk should lead to the same range of bytes
on disk, but I am not really certain about this.


No, you'll see just p0 (whole disk), and p1 (whole disk
less space for the backwards compatible FDISK partitioning).


Also, if a "whole disk" is given to ZFS (and for OSes
other than the latest Solaris 11 this means non-rpool
disks), then ZFS labels the disk as GPT and defines a
partition for itself plus a small trailing partition
(likely to level out discrepancies with replacement
disks that might happen to be a few sectors too small).
In this case ZFS reports that it uses "cXtYdZ" as a
pool component,


For an EFI disk, the device name without a final p* or s*
component is the whole EFI partition. (It's actually the
s7 slice minor device node, but the s7 is dropped from
the device name to avoid the confusion we had with s2
on SMI labeled disks being the whole SMI partition.)


since it considers itself in charge
of the partitioning table and its inner contents, and
doesn't intend to share the disk with other usages
(dual-booting and other OSes' partitions, or SLOG and
L2ARC parts, etc). This also "allows" ZFS to influence
hardware-related choices, like caching and throttling,
and likely auto-expansion with the changed LUN sizes
by fixing up the partition table along the way, since
it assumes being 100% in charge of the disk.

I don't think there is a "crime" in trying to use the
partitions (of either kind) as ZFS leaf vdevs, even the
zpool(1M) manpage states that:

... The  following  virtual  devices  are supported:
  disk
A block device, typically located under  /dev/dsk.
ZFS  can  use  individual  slices  or  partitions,
though the recommended mode of operation is to use
whole  disks.  ...


Right.
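
A minimal illustration of the slice case the manpage describes,
with made-up pool and device names:

  # zpool add tank cache c0t2d0s1    (use an individual GPT "slice"
                                      as a cache device)
  # zpool status tank                (the slice shows up as a cache vdev)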


This is orthogonal to the fact that there can only be
one Solaris slice table, inside one partition, on MBR.
AFAIK this is irrelevant on GPT/EFI - no SMI slices there.


There's a simpler way to think of it on x86.
You always have FDISK partitioning (p1, p2, p3, p4).
You can then have SMI or GPT/EFI slices (both called s0, s1, ...)
in an FDISK partition of the appropriate type.
With SMI labeling, s2 is by convention the whole Solaris FDISK
partition (although this is not enforced).
With EFI labeling, s7 is enforced as the whole EFI FDISK partition,
and so the trailing s7 is dropped off the device name for
clarity.

This simplicity comes about because the GPT spec requires that
backwards-compatible FDISK partitioning be included, but with
just one partition assigned.
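
To put the naming scheme above in one place (c0t0d0 is a made-up
example device; this is a sketch of the convention, not output
from a real system):

  /dev/dsk/c0t0d0p0            whole physical disk
  /dev/dsk/c0t0d0p1 .. p4      FDISK (MBR) primary partitions
  /dev/dsk/c0t0d0s0, s1, ...   SMI or GPT/EFI slices inside the matching partition
  /dev/dsk/c0t0d0s2            whole Solaris FDISK partition (SMI, by convention)
  /dev/dsk/c0t0d0              whole EFI FDISK partition (the implied s7)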

--
Andrew


Re: [zfs-discuss] partioned cache devices

2013-03-19 Thread Jim Klimov

On 2013-03-19 20:38, Cindy Swearingen wrote:

Hi Andrew,

Your original syntax was incorrect.

A p* device is a larger container for the d* device or s* devices.
In the case of a cache device, you need to specify a d* or s* device.
That you can add p* devices to a pool is a bug.


I disagree; at least, I've always thought of it differently:
the "d" device denominates the whole disk, with a unique
number for a particular controller link ("c"+"t").

The disk carries some partitioning table, MBR or GPT/EFI.
In these tables, partition "p0" stands for the table itself
(i.e. it is used to manage the partitioning), and the rest
kind of "depends". In the case of MBR tables, one partition
may be marked as having the Solaris (or Solaris2) type, and
there it holds an SMI table of Solaris slices; these slices
can hold legacy filesystems or components of ZFS pools. In
the case of GPT, the GPT partitions can be used directly by
ZFS; however, they are also denominated as "slices" in ZFS
and the format utility.

I believe that, on a GPT disk, Solaris-based OSes accessing
a "p"-named partition and an "s"-named slice of the same
number should land on the same range of bytes on disk, but
I am not really certain about this.

Also, if a "whole disk" is given to ZFS (and for OSes other
than the latest Solaris 11 this means non-rpool disks), then
ZFS labels the disk as GPT and defines a partition for itself
plus a small trailing partition (likely to level out
discrepancies with replacement disks that might happen to be
a few sectors too small). In this case ZFS reports that it
uses "cXtYdZ" as a pool component, since it considers itself
in charge of the partitioning table and its inner contents,
and doesn't intend to share the disk with other uses
(dual-booting and other OSes' partitions, or SLOG and L2ARC
parts, etc.). This also "allows" ZFS to influence
hardware-related choices, like caching and throttling, and
likely auto-expansion when LUN sizes change, by fixing up
the partition table along the way, since it assumes it is
100% in charge of the disk.
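
For reference, a hedged sketch of the knobs involved in that
auto-expansion, assuming a pool named tank on a device c0t0d0
(autoexpand is off by default):

  # zpool set autoexpand=on tank     (grow the pool automatically when
                                      the underlying LUN grows)
  # zpool online -e tank c0t0d0      (or expand an already-online device
                                      explicitly)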

I don't think there is any "crime" in trying to use the
partitions (of either kind) as ZFS leaf vdevs; even the
zpool(1M) manpage states as much:

... The  following  virtual  devices  are supported:
  disk
A block device, typically located under  /dev/dsk.
ZFS  can  use  individual  slices  or  partitions,
though the recommended mode of operation is to use
whole  disks.  ...

This is orthogonal to the fact that there can only be
one Solaris slice table, inside one partition, on MBR.
AFAIK this is irrelevant on GPT/EFI - no SMI slices there.

On my old home NAS running OpenSolaris I certainly did have
MBR partitions on the rpool disk, initially intended for some
dual-booted OSes but repurposed as L2ARC and ZIL devices for
the storage pool on other disks when I played with that
technology. Didn't gain much with a single spindle ;)

HTH,
//Jim Klimov



Re: [zfs-discuss] partioned cache devices

2013-03-19 Thread Andrew Gabriel

Andrew Werchowiecki wrote:


 Total disk size is 9345 cylinders
 Cylinder size is 12544 (512 byte) blocks

                                              Cylinders
   Partition   Status    Type          Start   End    Length    %
   =========   ======    ============  =====   ====   ======    ===
       1                 EFI               0   9345     9346    100


You only have a p1 (and for a GPT/EFI labeled disk, you can only
have p1 - no other FDISK partitions are allowed).


partition> print
Current partition table (original):
Total disk sectors available: 117214957 + 16384 (reserved sectors)
 
Part      Tag      Flag     First Sector       Size       Last Sector
  0       usr       wm                64      2.00GB         4194367
  1       usr       wm           4194368     53.89GB       117214990
  2   unassigned    wm                 0          0                0
  3   unassigned    wm                 0          0                0
  4   unassigned    wm                 0          0                0
  5   unassigned    wm                 0          0                0
  6   unassigned    wm                 0          0                0
  8    reserved     wm         117214991      8.00MB       117231374


You have an s0 and s1.

This isn't the output from when I did it, but these are exactly
the steps that I followed.

Thanks for the info about slices, I may give that a go later on.
I'm not keen on it because I have clear evidence (as in zpools set
up this way, right now, working without issue) that GPT partitions
of the style shown above work, and I want to see why it doesn't
work in my setup rather than simply ignoring the problem and
moving on.


You would have to blow away the partitioning you have, create an
FDISK-partitioned disk (not EFI), and then create a p1 and a p2 partition.
(Don't use the 'partition' subcommand, which confusingly creates Solaris
slices.) Give the FDISK partitions a partition type which nothing will
recognise, such as 'other', so that nothing will try to interpret them as
OS partitions. Then you can use them as raw devices, and they should be
portable between OSes which can handle FDISK-partitioned disks.
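
A rough outline of those steps (the device name is an example, and the
exact fdisk menu choices may differ between releases):

  # format -e /dev/rdsk/c5t0d0p0
      fdisk      (delete the existing EFI partition, create two FDISK
                  partitions of a type such as "Other OS", write the
                  table and exit)

  The raw partitions then appear as /dev/rdsk/c5t0d0p1 and
  /dev/rdsk/c5t0d0p2.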

--
Andrew


Re: [zfs-discuss] What would be the best tutorial cum reference doc for ZFS

2013-03-19 Thread Deirdre Straughan
There are links to videos and other materials here:
http://wiki.smartos.org/display/DOC/ZFS

Not as organized as I'd like...


On Tue, Mar 19, 2013 at 2:30 AM, Hans J. Albertsson <
hans.j.alberts...@branneriet.se> wrote:

> as used on Illumos?
>
> I've seen a few tutorials written by people who obviously are very action
> oriented; afterwards you find you have worn your keyboard down a bit and
> not learned a lot at all, at least not in the sense of understanding what
> zfs is and what it does and why things are the way they are.
>
> I'm looking for something that would make me afterwards understand what,
> say, commands like  zpool import ... or zfs send ... actually do, and some
> idea as to why, so I can begin to understand ZFS in a way that allows me to
> make educated guesses on how to perform tasks I haven't tried before.
> And mostly without having to ask around for days on end.
>
> For SOME part of zfs I'm already there, but only for the things I had to
> do more than twice or so while managing the Swedish lab at Sun Micro.
>



-- 


best regards,
Deirdré Straughan
Community Architect, SmartOS
illumos Community Manager


cell 720 371 4107


Re: [zfs-discuss] What would be the best tutorial cum reference doc for ZFS

2013-03-19 Thread Cindy Swearingen

Hi Hans,

Start with the ZFS Admin Guide, here:

http://docs.oracle.com/cd/E26502_01/html/E29007/index.html

Or, start with your specific questions.

Thanks, Cindy

On 03/19/13 03:30, Hans J. Albertsson wrote:

as used on Illumos?

I've seen a few tutorials written by people who obviously are very
action oriented; afterwards you find you have worn your keyboard down a
bit and not learned a lot at all, at least not in the sense of
understanding what zfs is and what it does and why things are the way
they are.

I'm looking for something that would make me afterwards understand what,
say, commands like zpool import ... or zfs send ... actually do, and
some idea as to why, so I can begin to understand ZFS in a way that
allows me to make educated guesses on how to perform tasks I haven't
tried before.
And mostly without having to ask around for days on end.

For SOME part of zfs I'm already there, but only for the things I had to
do more than twice or so while managing the Swedish lab at Sun Micro.




Re: [zfs-discuss] partioned cache devices

2013-03-19 Thread Cindy Swearingen

Hi Andrew,

Your original syntax was incorrect.

A p* device is a larger container for the d* device or s* devices.
In the case of a cache device, you need to specify a d* or s* device.
That you can add p* devices to a pool is a bug.

Adding different slices from c25t10d1 as both log and cache devices
would need the s* identifier, but you've already added the entire
c25t10d1 as the log device. A better configuration would be to use
c25t10d1 for log and c25t9d1 for cache, or to provide some spares
for this large pool.

After you remove the log devices, re-add like this:

# zpool add aggr0 log c25t10d1
# zpool add aggr0 cache c25t9d1
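
To make the removal step explicit (assuming the existing log device
shows up in zpool status under the name c25t10d1):

  # zpool remove aggr0 c25t10d1      (detach the current log device)
  # zpool status aggr0               (verify before re-adding as above)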

You might review the ZFS recommended practices section, here:

http://docs.oracle.com/cd/E26502_01/html/E29007/zfspools-4.html#storage-2

See example 3-4 for adding a cache device, here:

http://docs.oracle.com/cd/E26502_01/html/E29007/gayrd.html#gazgw

Always have good backups.

Thanks, Cindy



On 03/18/13 23:23, Andrew Werchowiecki wrote:

I did something like the following:

format -e /dev/rdsk/c5t0d0p0
fdisk
1 (create)
F (EFI)
6 (exit)
partition
label
1
y
0
usr
wm
64
4194367e
1
usr
wm
4194368
117214990
label
1
y

Total disk size is 9345 cylinders
Cylinder size is 12544 (512 byte) blocks

                                              Cylinders
   Partition   Status    Type          Start   End    Length    %
   =========   ======    ============  =====   ====   ======    ===
       1                 EFI               0   9345     9346    100

partition> print
Current partition table (original):
Total disk sectors available: 117214957 + 16384 (reserved sectors)

Part      Tag      Flag     First Sector       Size       Last Sector
  0       usr       wm                64      2.00GB         4194367
  1       usr       wm           4194368     53.89GB       117214990
  2   unassigned    wm                 0          0                0
  3   unassigned    wm                 0          0                0
  4   unassigned    wm                 0          0                0
  5   unassigned    wm                 0          0                0
  6   unassigned    wm                 0          0                0
  8    reserved     wm         117214991      8.00MB       117231374

This isn't the output from when I did it, but these are exactly
the steps that I followed.

Thanks for the info about slices, I may give that a go later on.
I'm not keen on it because I have clear evidence (as in zpools set
up this way, right now, working without issue) that GPT partitions
of the style shown above work, and I want to see why it doesn't
work in my setup rather than simply ignoring the problem and
moving on.

From: Fajar A. Nugraha [mailto:w...@fajar.net]
Sent: Sunday, 17 March 2013 3:04 PM
To: Andrew Werchowiecki
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] partioned cache devices

On Sun, Mar 17, 2013 at 1:01 PM, Andrew Werchowiecki
<andrew.werchowie...@xpanse.com.au> wrote:

I understand that p0 refers to the whole disk... in the logs I
pasted in I'm not attempting to mount p0. I'm trying to work out why
I'm getting an error attempting to mount p2, after p1 has
successfully mounted. Further, this has been done before on other
systems in the same hardware configuration in the exact same
fashion, and I've gone over the steps trying to make sure I haven't
missed something but can't see a fault.

How did you create the partitions? Are they marked as a Solaris partition
type, or as something else (e.g. fdisk on Linux uses type "83" by default)?

I'm not keen on using Solaris slices because I don't have an
understanding of what that does to the pool's OS interoperability.

Linux can read Solaris slices and import Solaris-made pools just fine, as
long as you're using a compatible zpool version (e.g. zpool version 28).
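
A minimal sketch of the Linux side, assuming ZFS on Linux is installed
and the pool is named tank:

  # zpool import -d /dev/disk/by-id          (scan for importable pools)
  # zpool import -d /dev/disk/by-id tank     (import the pool by name)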

--

Fajar





Re: [zfs-discuss] What would be the best tutorial cum reference doc for ZFS

2013-03-19 Thread Edward Ned Harvey (opensolarisisdeadlongliveopensolaris)
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Hans J. Albertsson
> 
> I'm looking for something that would make me afterwards understand what,
> say, commands like  zpool import ... or zfs send ... actually do, and
> some idea as to why, so I can begin to understand ZFS in a way that
> allows me to make educated guesses on how to perform tasks I haven't
> tried before.

man zpool
man zfs
And the ZFS Best Practices Guide
And the ZFS Evil (I forget what it's called, performance tuning? just search 
for evil, you'll find it.)

But almost everything is literally in the man pages.



[zfs-discuss] What would be the best tutorial cum reference doc for ZFS

2013-03-19 Thread Hans J. Albertsson

as used on Illumos?

I've seen a few tutorials written by people who obviously are very 
action oriented; afterwards you find you have worn your keyboard down a 
bit and not learned a lot at all, at least not in the sense of 
understanding what zfs is and what it does and why things are the way 
they are.


I'm looking for something that would make me afterwards understand what, 
say, commands like  zpool import ... or zfs send ... actually do, and 
some idea as to why, so I can begin to understand ZFS in a way that 
allows me to make educated guesses on how to perform tasks I haven't 
tried before.

And mostly without having to ask around for days on end.

For SOME part of zfs I'm already there, but only for the things I had to 
do more than twice or so while managing the Swedish lab at Sun Micro.


