On 03/09/2016 13:04, Alexander Motin wrote:
> Author: mav
> Date: Sat Sep  3 10:04:37 2016
> New Revision: 305331
> URL: https://svnweb.freebsd.org/changeset/base/305331
> 
> Log:
>   MFV r304155: 7090 zfs should improve allocation order and throttle allocations
>   
>   illumos/illumos-gate@0f7643c7376dd69a08acbfc9d1d7d548b10c846a
>   
>   https://github.com/illumos/illumos-gate/commit/0f7643c7376dd69a08acbfc9d1d7d548b10c846a
>   
>   https://www.illumos.org/issues/7090
>     When write I/Os are issued, they are issued in block order but the ZIO
>     pipeline will drive them asynchronously through the allocation stage
>     which can result in blocks being allocated out-of-order. It would be
>     nice to preserve as much of the logical order as possible.
>     In addition, the allocations are equally scattered across all top-level
>     VDEVs but not all top-level VDEVs are created equally. The pipeline
>     should be able to detect devices that are more capable of handling
>     allocations and should allocate more blocks to those devices. This
>     allows for dynamic allocation distribution when devices are imbalanced
>     as fuller devices will tend to be slower than empty devices.
>     The change includes a new pool-wide allocation queue which would
>     throttle and order allocations in the ZIO pipeline. The queue would be
>     ordered by issued time and offset and would provide an initial amount
>     of allocation of work to each top-level vdev. The allocation logic
>     utilizes a reservation system to reserve allocations that will be
>     performed by the allocator. Once an allocation is successfully
>     completed it's scheduled on a given top-level vdev. Each top-level
>     vdev maintains a maximum number of allocations that it can handle
>     (mg_alloc_queue_depth). The pool-wide reserved allocations (top-levels *
>     mg_alloc_queue_depth) are distributed across the top-level vdevs
>     metaslab groups and round robin across all eligible metaslab groups to
>     distribute the work. As top-levels complete their work, they receive
>     additional work from the pool-wide allocation queue until the
>     allocation queue is emptied.
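
[To make the mechanism above easier to follow, here is a tiny stand-alone
model of the reservation/round-robin idea. Everything prefixed toy_
(toy_reserve(), TOY_QUEUE_DEPTH and friends) is invented for illustration;
the real logic lives in metaslab.c and zio.c and tracks reservations with
refcounts per allocator, which this sketch does not attempt to model.]

/*
 * Toy model of the reservation/round-robin idea described above.
 * NOT the real code: toy_reserve(), TOY_QUEUE_DEPTH etc. are invented
 * for illustration only.
 */
#include <stdio.h>

#define TOY_TOPLEVEL_VDEVS  3   /* top-level vdevs in the toy pool */
#define TOY_QUEUE_DEPTH     4   /* analogue of mg_alloc_queue_depth */

typedef struct toy_metaslab_group {
        int     mg_inflight;    /* allocations currently reserved here */
        int     mg_max_depth;   /* max concurrent allocations allowed */
} toy_metaslab_group_t;

static toy_metaslab_group_t toy_pool[TOY_TOPLEVEL_VDEVS];
static int toy_rotor;           /* round-robin starting point */

/*
 * Reserve one allocation slot, rotoring across the metaslab groups so
 * the work is spread out.  Returns the group index that accepted the
 * reservation, or -1 when every group is at its queue depth and the
 * caller has to wait -- that is the throttle.
 */
static int
toy_reserve(void)
{
        int i, idx;

        for (i = 0; i < TOY_TOPLEVEL_VDEVS; i++) {
                idx = (toy_rotor + i) % TOY_TOPLEVEL_VDEVS;
                if (toy_pool[idx].mg_inflight < toy_pool[idx].mg_max_depth) {
                        toy_pool[idx].mg_inflight++;
                        toy_rotor = (idx + 1) % TOY_TOPLEVEL_VDEVS;
                        return (idx);
                }
        }
        return (-1);
}

/* Called when an allocation completes; frees the slot for more work. */
static void
toy_unreserve(int idx)
{
        toy_pool[idx].mg_inflight--;
}

int
main(void)
{
        int i, idx;

        for (i = 0; i < TOY_TOPLEVEL_VDEVS; i++)
                toy_pool[i].mg_max_depth = TOY_QUEUE_DEPTH;

        /* Issue more allocations than the pool-wide limit allows. */
        for (i = 0; i < TOY_TOPLEVEL_VDEVS * TOY_QUEUE_DEPTH + 2; i++) {
                idx = toy_reserve();
                if (idx < 0)
                        printf("alloc %2d: throttled, queues full\n", i);
                else
                        printf("alloc %2d: reserved on top-level vdev %d\n",
                            i, idx);
        }

        /* Completing one I/O makes room for one more allocation. */
        toy_unreserve(0);
        printf("after one completion: reserved on vdev %d\n", toy_reserve());
        return (0);
}

[Compiled and run, the first 12 reservations land round-robin across the
three toy vdevs, the next two are reported as throttled, and another slot
opens up once toy_unreserve() is called.]
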
>   
>   Reviewed by: Adam Leventhal <a...@delphix.com>
>   Reviewed by: Alex Reece <a...@delphix.com>
>   Reviewed by: Christopher Siden <christopher.si...@delphix.com>
>   Reviewed by: Dan Kimmel <dan.kim...@delphix.com>
>   Reviewed by: Matthew Ahrens <mahr...@delphix.com>
>   Reviewed by: Paul Dagnelie <paul.dagne...@delphix.com>
>   Reviewed by: Prakash Surya <prakash.su...@delphix.com>
>   Reviewed by: Sebastien Roy <sebastien....@delphix.com>
>   Approved by: Robert Mustacchi <r...@joyent.com>
>   Author: George Wilson <george.wil...@delphix.com>
> 
> Modified:
>   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/metaslab.c
>   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/refcount.c
>   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c
>   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c
>   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/metaslab.h
>   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/metaslab_impl.h
>   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/refcount.h
>   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/spa_impl.h
>   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/vdev_impl.h
>   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h
>   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio_impl.h
>   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c
>   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_cache.c
>   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c
>   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_queue.c
>   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c
>   head/sys/cddl/contrib/opensolaris/uts/common/sys/fs/zfs.h
> Directory Properties:
>   head/sys/cddl/contrib/opensolaris/   (props changed)
...
> @@ -3720,6 +3719,7 @@ spa_create(const char *pool, nvlist_t *n
>       spa->spa_uberblock.ub_txg = txg - 1;
>       spa->spa_uberblock.ub_version = version;
>       spa->spa_ubsync = spa->spa_uberblock;
> +     spa->spa_load_state = SPA_LOAD_CREATE;
>  
>       /*
>        * Create "The Godfather" zio to hold all async IOs
> @@ -3905,6 +3905,7 @@ spa_create(const char *pool, nvlist_t *n
>        */
>       spa_evicting_os_wait(spa);
>       spa->spa_minref = refcount_count(&spa->spa_refcount);
> +     spa->spa_load_state = SPA_LOAD_NONE;
>  
>       mutex_exit(&spa_namespace_lock);
>  
> @@ -5615,7 +5616,7 @@ spa_nvlist_lookup_by_guid(nvlist_t **nvp
...

Alexander,

I believe that this commit accidentally breaks the following scenario:
zpool create tank /dev/xxx
zpool destroy tank
zpool create tank /dev/xxx

It seems that the vdev_geom code is unaware of the SPA_LOAD_CREATE state, so it
still tries to match a device GUID, if one can be read from the disk, in
addition to the name; the stale GUID left over from the destroyed pool then
presumably prevents the new pool from attaching the device.
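
If so, I guess vdev_geom would have to skip the GUID comparison while
spa_load_state is SPA_LOAD_CREATE. The following is only a stand-alone sketch
of the kind of check I mean, not a patch against vdev_geom.c; the toy_ names
and the simplified boolean arguments are invented for illustration, and the
enum merely mirrors the SPA_LOAD_CREATE value added by the quoted commit:

/*
 * Stand-alone sketch of the missing check -- not a patch against
 * vdev_geom.c.  All toy_ identifiers are invented for illustration.
 */
#include <stdbool.h>
#include <stdio.h>

typedef enum toy_spa_load_state {
        TOY_SPA_LOAD_NONE,      /* normal open/import */
        TOY_SPA_LOAD_CREATE     /* pool is in the middle of "zpool create" */
} toy_spa_load_state_t;

/*
 * Decide whether a provider should be treated as the requested vdev.
 * While the pool is being created, any GUID read from the device is left
 * over from a previously destroyed pool, so only the name should count.
 */
static bool
toy_vdev_attach_ok(toy_spa_load_state_t state, bool name_matches,
    bool guid_readable, bool guid_matches)
{
        if (!name_matches)
                return (false);
        if (state == TOY_SPA_LOAD_CREATE)
                return (true);          /* ignore the stale on-disk GUID */
        return (!guid_readable || guid_matches);
}

int
main(void)
{
        /*
         * The breaking scenario: the name matches, a GUID from the
         * destroyed "tank" is still readable, and it does not match.
         */
        printf("without create-state check: %d\n",
            toy_vdev_attach_ok(TOY_SPA_LOAD_NONE, true, true, false));
        printf("with create-state check:    %d\n",
            toy_vdev_attach_ok(TOY_SPA_LOAD_CREATE, true, true, false));
        return (0);
}

With the scenario above, the first call rejects the device because the stale
GUID does not match, while the second accepts it because only the name is
considered during creation.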

-- 
Andriy Gapon