writes).
So how about also introducing an F2FS_FEATURE_HASMR feature flag to
handle these different cases?
Also, I think that the DISCARD option must be enabled by default for
HM-SMR disks. Otherwise, the zone write pointers will never get reset.
The same applies to HA-SMR devices mounted with the LF
ormation on their zone
configuration can be obtained, let's treat those as regular drives.
2) Add ioctls for zone management:
Report zones (get information from RB tree), reset zone (simple wrapper
to ioctl for block discard), open zone, close zone and finish zone. That
will allow mkfs like tools t
Shaun,
On 8/10/16 12:58, Shaun Tancheff wrote:
On Tue, Aug 9, 2016 at 3:09 AM, Damien Le Moal <damien.lem...@hgst.com> wrote:
On Aug 9, 2016, at 15:47, Hannes Reinecke <h...@suse.de> wrote:
[trim]
Since disk type == 0 for everything that isn't HM, I would prefer the
sysfs
tell the difference with direct-to-drive SG_IO accesses. But unlike these, the
zone
ioctls keep the zone information RB-tree cache up to date.
>
> I will be updating my patchset accordingly.
I need to clean up my code and rebase on top of 4.8-rc1. Let me do this and I
will send
everything
ng the length field makes the code
generic and following the standard, which has no restriction on the
zone sizes. We could do some memory optimisation using different types
of blk_zone structs, the types mapping to the SAME value: drives with
constant zone size can use a blk_zone type without the le
Shaun,
On 8/23/16 09:22, Shaun Tancheff wrote:
> On Mon, Aug 22, 2016 at 6:57 PM, Damien Le Moal <damien.lem...@hgst.com>
> wrote:
>>
>> Shaun,
>>
>> On 8/22/16 13:31, Shaun Tancheff wrote:
>> [...]
>>> -int sd_zbc_setup
FFLINE) {
> - /* let the drive fail the command */
> - sd_zbc_debug_ratelimit(sdkp,
> -"zone %zu offline\n",
> -zone->start);
> - goto out;
> - }
> -
> - if (b
know what you think. If we drop this, we can get a clean
and full ZBC support patch set ready in no time at all.
Best regards.
--
Damien Le Moal, Ph.D.
Sr. Manager, System Software Group, HGST Research,
HGST, a Western Digital brand
damien.lem...@hgst.com
(+81) 0466-98-3593 (ext. 513593)
1 ki
}
>
> + *nr_zones = nz;
> out:
> bio_for_each_segment_all(bv, bio, i)
> __free_page(bv->bv_page);
> bio_put(bio);
>
> - if (ret == 0)
> - *nr_zones = nz;
> -
> return ret;
> }
> EXPORT_SYMBOL_GPL(blkdev_r
+ zone_blocks = 0;
> goto out;
> + }
>
> same = buf[4] & 0x0f;
> if (same > 0) {
Reviewed-by: Damien Le Moal <damien.lem...@wdc.com>
--
Damien Le Moal, Ph.D.
Sr. Manager, System Software Research Group,
Western Digital C
ne_size(bdev), );
> + if (rem || nr_sects != bdev_zone_size(bdev)) {
> f2fs_msg(sbi->sb, KERN_INFO,
> "(%d) %s: Unaligned discard attempted (block %x + %x)",
> devi, sbi->s_ndevs ? FDEV(devi).path: "",
Tha
can use:
if (p.start & (bdev_physical_block_size(bdev) - 1))
Or use div_u64_rem to avoid an error on 32-bit builds.
Best regards.
--
Damien Le Moal, Ph.D.
Sr. Manager, System Software Research Group,
Western Digital Corporation
damien.lem...@wdc.com
(+81) 0466-98-3593 (ext. 513593)
1 kirihara-cho, Fujisawa,
Kanagawa, 252-0888 Japan
www.wdc.com, www.hgst.com
tor size in sd.c) is a
power of 2 between 512 and 4096, and the physical block size is a power
of 2 number of logical blocks. So the physical block size is also always
a power of 2.
>
> Or use div_u64_rem to avoid an error on 32-bit builds.
>
> Best regards.
>
--
Damien Le
e(bdev) - 1))
return -EINVAL;
I am not sure however how bdev_is_dasd can be implemented.
Best regards.
--
Damien Le Moal, Ph.D.
Sr. Manager, System Software Research Group,
Western Digital Corporation
damien.lem...@wdc.com
(+81) 0466-98-3593 (ext. 513593)
1 kirihara-cho, Fujisawa,
Kanagawa, 252-0888 Japan
www.wdc.com, www.hgst.com
On 12/15/16 17:45, Christoph Hellwig wrote:
> On Thu, Dec 15, 2016 at 10:33:47AM +0900, Damien Le Moal wrote:
>> For a regular block device, I agree. But in Stephan's case, I think that
>> the check really needs to be against the physical block size, with the
>> added co
ive-managed zoned block device target")
>
> I have used the device-mapper tree from next-20170608 for today.
My apologies for that. My mistake.
I just posted a patch to dm-devel to fix this.
Everything should come in order after Mike's review.
Best regards.
--
Damien Le Moal
re progressing, we hopefully should be able to get
Bart's patch back in soon (this week?) and remove the mq dispatch deadlock.
Best regards.
--
Damien Le Moal,
Western Digital Research
works.
Best regards.
--
Damien Le Moal,
Western Digital
Philipp,
On 9/14/17 21:51, Philipp wrote:
> Dear Damien,
>
> Thank you for your feedback.
>
>> On 14. Sep 2017, at 10:46, Damien Le Moal
>> wrote:
>
> […]
>
>> Writing once to a sector stored on spinning rust will *not* fully
>> erase the previ
>
> atomic_t__bi_cnt; /* pin count */
This modification comes at the cost of increasing the bio structure size to
simply tell the block layer "do not delay BIO splitting"...
I think there is a much simpler approach. What about:
1) Use a request queue flag to indicate "limit BIO size"
2) Modify __bio_try_merge_page() to look at that flag to disallow page merging
if the bio size exceeds blk_queue_get_max_sectors(), or more ideally a version
of it that takes into account the bio start sector.
3) Set the "limit bio size" queue flag in the driver of the device that benefits
from this change. Eventually, that could also be controlled through sysfs.
With such change, you will get the same result without having to increase the
BIO structure size.
--
Damien Le Moal
Western Digital Research
QUEUE_FLAG_LIMIT_MERGE, &(q)->queue_flags)
static inline unsigned int bio_max_merge_size(struct bio *bio)
{
	struct request_queue *q = bio->bi_disk->queue;

	if (blk_queue_limit_bio_merge_size(q))
		return blk_queue_get_max_sectors(q, bio_op(bio))
			<< SECTOR_SHIFT;
	return UINT_MAX;
}
and use that helper in __bio_try_merge_page(), e.g.:
	if (bio->bi_iter.bi_size > bio_max_merge_size(bio) - len) {
		*same_page = false;
		return false;
	}
No need to change the bio struct.
If you measure performance with and without this change on nullblk, you can
verify if it has any impact for regular devices. And for your use case, that
should give you the same performance.
>
> bool __bio_try_merge_page(struct bio *bio, struct page *page,
> unsigned int len, unsigned int off, bool *same_page)
> {
> ...
> if (page_is_mergeable(bv, page, len, off, same_page)) {
> - if (bio->bi_iter.bi_size > UINT_MAX - len) {
> + if (bio->bi_iter.bi_size > bio->bi_max_size - len) {
> *same_page = false;
> return false;
> }
>
> bv->bv_len += len;
> bio->bi_iter.bi_size += len;
> return true;
> }
> ...
> }
>
>
> static inline bool bio_full(struct bio *bio, unsigned len)
> {
> ...
> - if (bio->bi_iter.bi_size > UINT_MAX - len)
> + if (bio->bi_iter.bi_size > bio->bi_max_size - len)
> return true;
> ...
> }
>
> +void bio_set_dev(struct bio *bio, struct block_device *bdev)
> +{
> + if (bio->bi_disk != bdev->bd_disk)
> + bio_clear_flag(bio, BIO_THROTTLED);
> +
> + bio->bi_disk = bdev->bd_disk;
> + bio->bi_partno = bdev->bd_partno;
> + if (blk_queue_limit_bio_max_size(bio))
> + bio->bi_max_size = blk_queue_get_bio_max_size(bio);
> +
> + bio_associate_blkg(bio);
> +}
> +EXPORT_SYMBOL(bio_set_dev);
>
> > --
> > Damien Le Moal
> > Western Digital Research
>
> ---
> Changheun Lee
> Samsung Electronics
--
Damien Le Moal
Western Digital
he block layer to limit command size if the value of
>> opt_xfer_blocks is smaller than the limit initially set with max_xfer_blocks.
>>
>> So if for your device max_sectors end up being too small, it is likely
>> because
>> the device itself is reporting an opt_xfer_bl
er_blocks.
So if for your device max_sectors ends up being too small, it is likely because
the device itself is reporting an opt_xfer_blocks value that is too small for
its own good. The max_sectors limit can be manually increased with "echo xxx >
/sys/block/sdX/queue/max_sectors_kb". A udev rule can be used to handle this
automatically if needed.
But to get a saner default for that device, I do not think that this patch is
the right solution. Ideally, the device peculiarity should be handled with a
quirk, but that is not used in scsi. So besides the udev rule trick, I am not
sure what the right approach is here.
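For reference, the manual override and a matching udev rule could look like the following (the device name, rule file name, and 1024 value are purely illustrative, not a recommendation for any particular drive):

```
# One-shot manual override:
echo 1024 > /sys/block/sdX/queue/max_sectors_kb

# /etc/udev/rules.d/99-max-sectors.rules (hypothetical file name):
ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd[a-z]", \
    ATTR{queue/max_sectors_kb}="1024"
```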
>> q->limits.io_opt = logical_to_bytes(sdp, sdkp->opt_xfer_blocks);
>> rw_max = logical_to_sectors(sdp, sdkp->opt_xfer_blocks);
>> } else {
>> --
>> 2.29.0
>>
>>
>
--
Damien Le Moal
Western Digital Research
; +++ b/include/linux/bio.h
>>> @@ -20,6 +20,7 @@
>>> #endif
>>>
>>> #define BIO_MAX_PAGES 256
>>> +#define BIO_MAX_SIZE (BIO_MAX_PAGES * PAGE_SIZE)
>>>
>>> #define bio_prio(bio) (bio)
- len) {
>>>>> + if (bio->bi_iter.bi_size > BIO_MAX_SIZE - len) {
>>>>> *same_page = false;
>>>>> return false;
>>>>> }
>>>>> diff --gi
t;>>> include/linux/bio.h | 3 ++-
>>>>>>> 2 files changed, 3 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/block/bio.c b/block/bio.c
>>>>>>> index 1f2cc1fbe283..dbe14d675f28 100644
>>>
On 2021/01/14 12:53, Ming Lei wrote:
> On Wed, Jan 13, 2021 at 12:02:44PM +0000, Damien Le Moal wrote:
>> On 2021/01/13 20:48, Ming Lei wrote:
>>> On Wed, Jan 13, 2021 at 11:16:11AM +0000, Damien Le Moal wrote:
>>>> On 2021/01/13 19:25, Ming Lei wrote:
>>&g
On 2020/07/31 21:51, h...@infradead.org wrote:
> On Fri, Jul 31, 2020 at 10:16:49AM +0000, Damien Le Moal wrote:
>>>
>>> Let's keep semantics and implementation separate. For the case
>>> where we report the actual offset we need a size imitation and no
>>&
register to 0, indicating to exception vector
> + * that we are presently executing in the kernel
> + */
> + csr_write(CSR_SCRATCH, 0);
> + /* Set the exception vector address */
> + csr_write(CSR_TVEC, _exception);
> +#endif
> }
>
Looks OK to me. But out of curiosity, how did you trigger a problem? I never
got any weird exceptions with my busybox userspace.
--
Damien Le Moal
Western Digital Research
#ifndef CONFIG_MMU
> + /*
> + * Set sup0 scratch register to 0, indicating to exception vector
> + * that we are presently executing in the kernel
> + */
> + csr_write(CSR_SCRATCH, 0);
> + /* Set the exception vector address */
> + csr_write(CSR_TVEC, _exception);
> +#endif
> }
>
--
Damien Le Moal
Western Digital Research
046511054ns
> [0.008254] Console: colour dummy device 80x25
Interesting. Never saw that happening... Thanks!
>
>
>
> -----Original Message-----
> From: "Damien Le Moal"
> Sent: 2020-08-11 14:42:15 (Tuesday)
> To: "Qiu Wenbo" , "Palmer Dabbelt"
> ,
On 2020/10/10 4:52, ira.we...@intel.com wrote:
> From: Ira Weiny
>
> The kmap() calls in this FS are localized to a single thread. To avoid
> the over head of global PKRS updates use the new kmap_thread() call.
>
> Cc: Damien Le Moal
> Cc: Naohiro Aota
> Signed-off-by
ge open zone resources function from ZBC,
>> but additionally add support for max active zones.
>> This enables user space not only to test against a device with an open
>> zone limit, but also to test against a device with an active zone limit.
>>
>> Signe
ort for max active zones.
> This enables user space not only to test against a device with an open
> zone limit, but also to test against a device with an active zone limit.
>
> Signed-off-by: Niklas Cassel
> ---
> Changes since v1:
> -Fixed review comments by Damie
| 288
> block/genhd.c | 24 +++
> include/linux/blk-filter.h | 41 +
> include/linux/genhd.h | 2 +
> 8 files changed, 410 insertions(+), 2 deletions(-)
> create mode 100644 block/blk-filter-internal.h
> create mode 100644 block/blk-filter.c
> create mode 100644 include/linux/blk-filter.h
>
--
Damien Le Moal
Western Digital Research
On 2020/08/28 16:23, Klaus Jensen wrote:
> On Aug 28 07:06, Damien Le Moal wrote:
>> On 2020/08/27 22:50, Niklas Cassel wrote:
>>> +static blk_status_t null_finish_zone(struct nullb_device *dev, struct
>>> blk_zone *zone)
>>> +{
>>> +
On 2020/08/28 16:47, Klaus Jensen wrote:
> On Aug 28 07:36, Damien Le Moal wrote:
>> On 2020/08/28 16:23, Klaus Jensen wrote:
>>> On Aug 28 07:06, Damien Le Moal wrote:
>>>> On 2020/08/27 22:50, Niklas Cassel wrote:
>>>>> +static blk_status_t null
_bss_done:
> call relocate
> #endif /* CONFIG_MMU */
>
> + call setup_trap_vector
> /* Restore C environment */
> la tp, init_task
> sw zero, TASK_TI_CPU(tp)
>
--
Damien Le Moal
Western Digital Research
On 2020/08/13 15:45, Atish Patra wrote:
> On Wed, Aug 12, 2020 at 10:44 PM Damien Le Moal wrote:
>>
>> On 2020/08/13 12:40, Qiu Wenbo wrote:
>>> Exception vector is missing on nommu platform and that is an issue.
>>> This patch is tested in Sipeed Maix Bit Dev B
On 2020/08/14 17:14, h...@infradead.org wrote:
> On Wed, Aug 05, 2020 at 07:35:28AM +0000, Damien Le Moal wrote:
>>> the write pointer. The only interesting addition is that we also want
>>> to report where we wrote. So I'd rather have RWF_REPORT_OFFSET or so.
&g
+4482,6 @@ static int resp_open_zone(struct scsi_cmnd *scp, struct
> sdebug_dev_info *devip)
> goto fini;
> }
>
> - if (zc == ZC2_IMPLICIT_OPEN)
> - zbc_close_zone(devip, zsp);
> zbc_open_zone(devip, zsp, true);
> fini:
> write_unlock(
se anymore), so our test setup is a bit lame in this area. We'll rig
something up with tcmu-runner emulation to add tests for these devices to avoid
a repeat of such a problem. And we'll make sure to add a test for
host-aware+partitions, since we at least know for sure there is one user :)
Johannes: The "goto out_free_index;" on sd_zbc_init_disk() failure is wrong I
think: the disk is already added and a ref taken on the dev, but out_free_index
does not seem to do cleanup for that. Need to revisit this.
Cheers.
--
Damien Le Moal
Western Digital Research
On 2020/09/12 17:37, Borislav Petkov wrote:
> Hi Damien,
>
> On Sat, Sep 12, 2020 at 02:31:55AM +0000, Damien Le Moal wrote:
>> Can you try this:
>
> sure, but it is white-space damaged:
>
> checking file drivers/scsi/sd.c
> patch: malformed patch at line 86:
On 2020/09/12 18:09, Johannes Thumshirn wrote:
> On 12/09/2020 04:31, Damien Le Moal wrote:
>> On 2020/09/12 8:07, Borislav Petkov wrote:
>>> On Sat, Sep 12, 2020 at 12:17:59AM +0200, Borislav Petkov wrote:
>>>> Enabling it, fixes the issue.
>>>
>>>
ide zonefs_io_error variant that can be called
> with i_truncate_mutex held")
> 16ef4f7638ac ("zonefs: introduce helper for zone management")
>
> are missing a Signed-off-by from their committer.
>
Fixed. Sorry about that!
--
Damien Le Moal
Western Digital Research
Integrating nvme simple copy in such initial support would I think be quite
simple and scsi xcopy can follow. From there, adding stacked device support can be
worked on with little, if any, impact on the existing users of the block copy
API (mostly FSes such as f2fs and btrfs).
--
Damien Le Moal
Western Digital Research
> extern int blk_rq_map_user_iov(struct request_queue *, struct request *,
> struct rq_map_data *, const struct iov_iter *,
>
--
Damien Le Moal
Western Digital Research
opy */
I think this should be called QUEUE_FLAG_SIMPLE_COPY to indicate more precisely
the type of copy supported. SCSI XCOPY is more advanced...
>
> #define QUEUE_FLAG_MQ_DEFAULT ((1 << QUEUE_FLAG_IO_STAT) |
> \
>(1 << QUEUE_FLAG_SAME_COMP) | \
> @@ -647,6 +652,7 @@ bool blk_queue_flag_test_and_set(unsigned int flag,
> struct request_queue *q);
> #define blk_queue_io_stat(q) test_bit(QUEUE_FLAG_IO_STAT, &(q)->queue_flags)
> #define blk_queue_add_random(q) test_bit(QUEUE_FLAG_ADD_RANDOM,
> &(q)->queue_flags)
> #define blk_queue_discard(q) test_bit(QUEUE_FLAG_DISCARD, &(q)->queue_flags)
> +#define blk_queue_copy(q)test_bit(QUEUE_FLAG_COPY, &(q)->queue_flags)
> #define blk_queue_zone_resetall(q) \
> test_bit(QUEUE_FLAG_ZONE_RESETALL, &(q)->queue_flags)
> #define blk_queue_secure_erase(q) \
> @@ -1061,6 +1067,9 @@ static inline unsigned int
> blk_queue_get_max_sectors(struct request_queue *q,
> return min(q->limits.max_discard_sectors,
> UINT_MAX >> SECTOR_SHIFT);
>
> + if (unlikely(op == REQ_OP_COPY))
> + return q->limits.max_copy_sectors;
> +
> if (unlikely(op == REQ_OP_WRITE_SAME))
> return q->limits.max_write_same_sectors;
>
> @@ -1335,6 +1344,10 @@ extern int __blkdev_issue_discard(struct block_device
> *bdev, sector_t sector,
> sector_t nr_sects, gfp_t gfp_mask, int flags,
> struct bio **biop);
>
> +extern int blkdev_issue_copy(struct block_device *bdev, int nr_srcs,
> + struct range_entry *src_rlist, struct block_device *dest_bdev,
> + sector_t dest, gfp_t gfp_mask);
> +
> #define BLKDEV_ZERO_NOUNMAP (1 << 0) /* do not free blocks */
> #define BLKDEV_ZERO_NOFALLBACK (1 << 1) /* don't write explicit
> zeroes */
>
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index f44eb0a04afd..5cadb176317a 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -64,6 +64,18 @@ struct fstrim_range {
> __u64 minlen;
> };
>
> +struct range_entry {
> + __u64 src;
> + __u64 len;
> +};
> +
> +struct copy_range {
> + __u64 dest;
> + __u64 nr_range;
> + __u64 range_list;
> + __u64 rsvd;
> +};
> +
> /* extent-same (dedupe) ioctls; these MUST match the btrfs ioctl definitions
> */
> #define FILE_DEDUPE_RANGE_SAME 0
> #define FILE_DEDUPE_RANGE_DIFFERS1
> @@ -184,6 +196,7 @@ struct fsxattr {
> #define BLKSECDISCARD _IO(0x12,125)
> #define BLKROTATIONAL _IO(0x12,126)
> #define BLKZEROOUT _IO(0x12,127)
> +#define BLKCOPY _IOWR(0x12, 128, struct copy_range)
> /*
> * A jump here: 130-131 are reserved for zoned block devices
> * (see uapi/linux/blkzoned.h)
>
--
Damien Le Moal
Western Digital Research
On 2021/01/05 21:24, Selva Jove wrote:
> Thanks for the review, Damien.
>
> On Mon, Jan 4, 2021 at 6:17 PM Damien Le Moal wrote:
>>
>> On 2021/01/04 19:48, SelvaKumar S wrote:
>>> Add new BLKCOPY ioctl that offloads copying of one or more sources
>>>
skd_pci_info(skdev, pci_str);
> - dev_info(>dev, "%s 64bit\n", pci_str);
Replace these 2 lines with:
pcie_print_link_status(pdev);
And the link speed information will be printed.
--
Damien Le Moal
Western Digital Research
ev_info(>dev, "%s 64bit\n", pci_str);
> + pcie_print_link_status(pdev);
>
> pci_set_master(pdev);
> rc = pci_enable_pcie_error_reporting(pdev);
>
Note: V1 of this patch was the one I commented on. This one should thus be V2.
In any case, this looks OK to me.
Acked-by: Damien Le Moal
--
Damien Le Moal
Western Digital Research
tristate "Drive-managed zoned block device target support"
> depends on BLK_DEV_DM
> depends on BLK_DEV_ZONED
> + select CRC32
> help
> This device-mapper target takes a host-managed or host-aware zoned
> block device and exposes mo
nfig
> @@ -3,6 +3,7 @@ config ZONEFS_FS
> depends on BLOCK
> depends on BLK_DEV_ZONED
> select FS_IOMAP
> + select CRC32
> help
> zonefs is a simple file system which exposes zones of a zoned block
> device (e.g. host-managed or host-aware S
s especially relevant if there is no file system on the disk.
> + */
> +
> +void blk_filter_freeze(struct block_device *bdev);
> +
> +void blk_filter_thaw(struct block_device *bdev);
> +
> +/*
> + * Filters intercept function
> + */
> +void blk_filter_submit_bio(struct bio *bio);
> +
> +#endif /* CONFIG_BLK_FILTER */
> +
> +#endif
> diff --git a/include/linux/genhd.h b/include/linux/genhd.h
> index 4ab853461dff..514fab6b947e 100644
> --- a/include/linux/genhd.h
> +++ b/include/linux/genhd.h
> @@ -4,7 +4,7 @@
>
> /*
> * genhd.h Copyright (C) 1992 Drew Eckhardt
> - * Generic hard disk header file by
> + * Generic hard disk header file by
> * Drew Eckhardt
> *
> *
> @@ -75,6 +75,12 @@ struct hd_struct {
> int make_it_fail;
> #endif
> struct rcu_work rcu_work;
> +
> +#ifdef CONFIG_BLK_FILTER
> + struct rw_semaphore filter_rw_lockup; /* for freezing block device*/
> + struct blk_filter *filter; /* block layer filter*/
> + void *filter_data; /*specific for each block device filters data*/
> +#endif
> };
>
> /**
> diff --git a/kernel/power/swap.c b/kernel/power/swap.c
> index 01e2858b5fe3..5287346b87a1 100644
> --- a/kernel/power/swap.c
> +++ b/kernel/power/swap.c
> @@ -283,7 +283,7 @@ static int hib_submit_io(int op, int op_flags, pgoff_t
> page_off, void *addr,
> bio->bi_end_io = hib_end_io;
> bio->bi_private = hb;
> atomic_inc(>count);
> - submit_bio(bio);
> + submit_bio_direct(bio);
> } else {
> error = submit_bio_wait(bio);
> bio_put(bio);
> diff --git a/mm/page_io.c b/mm/page_io.c
> index e485a6e8a6cd..4540426400b3 100644
> --- a/mm/page_io.c
> +++ b/mm/page_io.c
> @@ -362,7 +362,7 @@ int __swap_writepage(struct page *page, struct
> writeback_control *wbc,
> count_swpout_vm_event(page);
> set_page_writeback(page);
> unlock_page(page);
> - submit_bio(bio);
> + submit_bio_direct(bio);
> out:
> return ret;
> }
> @@ -434,7 +434,7 @@ int swap_readpage(struct page *page, bool synchronous)
> }
> count_vm_event(PSWPIN);
> bio_get(bio);
> - qc = submit_bio(bio);
> + qc = submit_bio_direct(bio);
> while (synchronous) {
> set_current_state(TASK_UNINTERRUPTIBLE);
> if (!READ_ONCE(bio->bi_private))
>
Separate into multiple patches: one that introduces the filter functions/ops
code and another that changes the block layer where needed.
--
Damien Le Moal
Western Digital Research
of how
things work (driver/md/dm-linear.c). More complex dm drivers like dm-crypt,
dm-writecache or dm-thin can give you hints about more features of device
mapper.
Functions such as __map_bio() in drivers/md/dm.c are the core of DM and show
what happens to BIOs depending on the return value of the map
t; drivers/nvme/host/core.c | 87 +++
> include/linux/bio.h | 1 +
> include/linux/blk_types.h | 15 +
> include/linux/blkdev.h| 15 +
> include/linux/nvme.h | 43 -
> include/uapi/linux/fs.h | 13
> 14 files changed, 461 insertions(+), 11 deletions(-)
>
--
Damien Le Moal
Western Digital Research
set
> # CONFIG_SERIO is not set
> +# CONFIG_VT is not set
> # CONFIG_LEGACY_PTYS is not set
> # CONFIG_LDISC_AUTOLOAD is not set
> # CONFIG_HW_RANDOM is not set
> @@ -60,7 +61,6 @@ CONFIG_GPIO_SIFIVE=y
> CONFIG_POWER_RESET=y
> CONFIG_POWER_RESET_SYSCON=y
> # CONFIG_HWMON is n
irmware/efi/Kconfig
> @@ -270,7 +270,7 @@ config EFI_DEV_PATH_PARSER
>
>
> config EFI_EARLYCON
> def_bool y
> - depends on SERIAL_EARLYCON && !ARM && !IA64
> + depends on EFI && SERIAL_EARLYCON && !ARM && !IA64
> select FONT_SUPPORT
> select ARCH_USE_MEMREMAP_PROT
>
>
Looks good to me.
Reviewed-by: Damien Le Moal
--
Damien Le Moal
Western Digital
On Wed, 2020-11-25 at 09:20 +0100, Geert Uytterhoeven wrote:
> Hi Damien,
>
> On Wed, Nov 25, 2020 at 7:14 AM Damien Le Moal wrote:
> > On 2020/11/25 3:57, Geert Uytterhoeven wrote:
> > > There is no need to enable Virtual Terminal support in the Canaan
> &
On 2020/11/25 17:51, Geert Uytterhoeven wrote:
> Hi Damien,
>
> On Wed, Nov 25, 2020 at 7:14 AM Damien Le Moal wrote:
>> On 2020/11/25 3:57, Geert Uytterhoeven wrote:
>>> There is no need to enable Virtual Terminal support in the Canaan
>>> Kendryte K210 d
On 2020/11/25 18:26, Geert Uytterhoeven wrote:
> Hi Damien,
>
> On Wed, Nov 25, 2020 at 10:02 AM Damien Le Moal wrote:
>> On 2020/11/25 17:51, Geert Uytterhoeven wrote:
>>> On Wed, Nov 25, 2020 at 7:14 AM Damien Le Moal
>>> wrote:
>>>>
On 2020/11/25 20:00, Damien Le Moal wrote:
> On 2020/11/25 18:26, Geert Uytterhoeven wrote:
>> Hi Damien,
>>
>> On Wed, Nov 25, 2020 at 10:02 AM Damien Le Moal
>> wrote:
>>> On 2020/11/25 17:51, Geert Uytterhoeven wrote:
>>>> On Wed, Nov
On Wed, 2020-11-25 at 13:47 +0100, Geert Uytterhoeven wrote:
> Hi Damien,
>
> On Wed, Nov 25, 2020 at 12:00 PM Damien Le Moal wrote:
> > On 2020/11/25 18:26, Geert Uytterhoeven wrote:
> > > On Wed, Nov 25, 2020 at 10:02 AM Damien Le Moal
> > > wrote:
&
Andreas,
On 2019/07/30 3:40, Andreas Dilger wrote:
> On Jul 26, 2019, at 8:59 PM, Damien Le Moal wrote:
>>
>> On 2019/07/27 7:55, Theodore Y. Ts'o wrote:
>>> On Sat, Jul 27, 2019 at 08:44:23AM +1000, Dave Chinner wrote:
>>>>>
>>>>> This look
Dave,
On 2019/07/31 8:48, Dave Chinner wrote:
> On Tue, Jul 30, 2019 at 02:06:33AM +0000, Damien Le Moal wrote:
>> If we had a pread_nofs()/pwrite_nofs(), that would work. Or we could define a
>> RWF_NORECLAIM flag for pwritev2()/preadv2(). This last one could actually be
>
is branch and all tests passed, no compilation problems
either. I will send a v2 of zonefs patch with all of Dave's comments addressed
shortly.
Thank you.
Best regards.
--
Damien Le Moal
Western Digital Research
gt;zones[i]);
> }
You can drop the curly brackets, and start the loop from nr_conv_zones too.
> break;
> case REQ_OP_ZONE_RESET:
> - if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL)
> - return BLK_STS_IOERR;
> -
> - zone->cond = BLK_ZONE_COND_EMPTY;
> - zone->wp = zone->start;
> + ret = null_reset_zone(dev, zone);
> + if (ret != BLK_STS_OK)
> + return ret;
You can return directly here:
return null_reset_zone(dev, zone);
> break;
> case REQ_OP_ZONE_OPEN:
> - if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL)
> - return BLK_STS_IOERR;
> - if (zone->cond == BLK_ZONE_COND_FULL)
> - return BLK_STS_IOERR;
> -
> - zone->cond = BLK_ZONE_COND_EXP_OPEN;
> + ret = null_open_zone(dev, zone);
> + if (ret != BLK_STS_OK)
> + return ret;
same here.
> break;
> case REQ_OP_ZONE_CLOSE:
> - if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL)
> - return BLK_STS_IOERR;
> - if (zone->cond == BLK_ZONE_COND_FULL)
> - return BLK_STS_IOERR;
> -
> - if (zone->wp == zone->start)
> - zone->cond = BLK_ZONE_COND_EMPTY;
> - else
> - zone->cond = BLK_ZONE_COND_CLOSED;
> + ret = null_close_zone(dev, zone);
> + if (ret != BLK_STS_OK)
> + return ret;
And here.
> break;
> case REQ_OP_ZONE_FINISH:
> - if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL)
> - return BLK_STS_IOERR;
> -
> - zone->cond = BLK_ZONE_COND_FULL;
> - zone->wp = zone->start + zone->len;
> + ret = null_finish_zone(dev, zone);
> + if (ret != BLK_STS_OK)
> + return ret;
And here too.
> break;
> default:
> return BLK_STS_NOTSUPP;
>
--
Damien Le Moal
Western Digital Research
On 2020/08/26 7:52, Damien Le Moal wrote:
> On 2020/08/25 21:22, Niklas Cassel wrote:
>> Add support for user space to set a max open zone and a max active zone
>> limit via configfs. By default, the default value is 0 == no limit.
>
> s/value is/values are/
>
>&g
rformance impact over regular
writes *and* zone write locking does not in general degrade HDD write
performance (only a few corner cases suffer from it). Comparing things equally,
the same could be said of NVMe drives that do not have zone append native
support: performance will be essentially the same using regular writes and
emulated zone append. But mq-deadline and zone write locking will significantly
lower performance for emulated zone append compared to a native zone append
support by the drive.
--
Damien Le Moal
Western Digital Research
rite lock is taken and
released by the emulation driver itself, ELEVATOR_F_ZBD_SEQ_WRITE is required
only if the user will also be issuing regular writes at high QD. And that is
trivially controllable by the user by simply setting the drive elevator to
mq-deadline. Conclusion: setting ELEVATOR_F_Z
On 2020/08/19 19:32, Kanchan Joshi wrote:
> On Wed, Aug 19, 2020 at 3:08 PM Damien Le Moal wrote:
>>
>> On 2020/08/19 18:27, Kanchan Joshi wrote:
>>> On Tue, Aug 18, 2020 at 12:46 PM Christoph Hellwig wrote:
>>>>
>>>> On Tue, Aug 18, 2020 at
On 2020/08/28 19:06, Niklas Cassel wrote:
> On Fri, Aug 28, 2020 at 07:06:26AM +0000, Damien Le Moal wrote:
>> On 2020/08/27 22:50, Niklas Cassel wrote:
>>> Add support for user space to set a max open zone and a max active zone
>>> limit via configfs. By defaul
problem. The spinlock serializes the execution of all commands. null_blk zone
append emulation thus does not need to take the scheduler level zone write lock
like scsi does.
--
Damien Le Moal
Western Digital Research
On 2020/09/07 20:24, Kanchan Joshi wrote:
> On Mon, Sep 7, 2020 at 1:52 PM Damien Le Moal wrote:
>>
>> On 2020/09/07 16:01, Kanchan Joshi wrote:
>>>> Even for SMR, the user is free to set the elevator to none, which disables
>>>> zone
>>>> w
On 2020/09/07 20:54, Kanchan Joshi wrote:
> On Mon, Sep 7, 2020 at 5:07 PM Damien Le Moal wrote:
>>
>> On 2020/09/07 20:24, Kanchan Joshi wrote:
>>> On Mon, Sep 7, 2020 at 1:52 PM Damien Le Moal wrote:
>>>>
>>>> On 2020/09/07 16:01, Kanchan Jo
On 2020/08/14 21:04, h...@infradead.org wrote:
> On Fri, Aug 14, 2020 at 08:27:13AM +0000, Damien Le Moal wrote:
>>>
>>> O_APPEND pretty much implies out of order, as there is no way for an
>>> application to know which thread wins the race to w
zone->cond = BLK_ZONE_COND_FULL;
> zone->wp = zone->start + zone->len;
> break;
> default:
> - return BLK_STS_NOTSUPP;
> + ret = BLK_STS_NOTSUPP;
> }
>
> + spin_unlock_irq(>zlock);
> trace_nullb_zone_op(cmd, zone_no, zone->cond);
> - return BLK_STS_OK;
> + return ret;
> }
I think you can avoid all of these changes by taking the lock around the calls
to null_zone_mgmt() and null_zone_write() in null_process_zoned_cmd(). That will
make the patch a lot smaller and simplify maintenance. Better yet, I think that
taking the lock on entry to null_process_zoned_cmd() and unlocking on return
should be simpler since that would cover reads too (valid read len). Only
report zones would need special treatment.
>
> blk_status_t null_process_zoned_cmd(struct nullb_cmd *cmd, enum req_opf op,
>
I think we also need this to have a Cc: stable and a "Fixes" tag too.
--
Damien Le Moal
Western Digital Research
nefs, some use cases may suffer
from it, but my tests with LevelDB+zonefs did not show any significant
difference. zonefs open()/close() operations are way faster than for a regular
file system since there is no metadata and all inodes always exist in-memory.
And zonefs now supports MAR/MOR limits for O_WRONLY open(). That can simplify
things for the user.
--
Damien Le Moal
Western Digital Research
return readl_relaxed(((u32 *)clint_time_val) + 1);
> + if (clint_time_val)
> + return readl_relaxed(((u32 *)clint_time_val) + 1);
> + return 0;
> }
> #define get_cycles_hi get_cycles_hi
> #endif /* CONFIG_64BIT */
Applying this on top of rc6, I now get a hang on Kendryte boot...
No problems without the patch on the other hand.
--
Damien Le Moal
Western Digital
On Sat, 2020-09-26 at 09:31 +, Anup Patel wrote:
> > -Original Message-
> > From: Damien Le Moal
> > Sent: 26 September 2020 14:55
> > To: paul.walms...@sifive.com; pal...@dabbelt.com;
> > palmerdabb...@google.com; Anup Patel ;
> > a...@eecs.berk
On Sat, 2020-09-26 at 15:27 +0530, Anup Patel wrote:
> On Sat, Sep 26, 2020 at 3:16 PM Damien Le Moal wrote:
> > On Sat, 2020-09-26 at 09:31 +, Anup Patel wrote:
> > > > -Original Message-
> > > > From: Damien Le Moal
> > > > Sent: 2
On Sat, 2020-09-26 at 11:09 +0100, Maciej W. Rozycki wrote:
> On Sat, 26 Sep 2020, Damien Le Moal wrote:
>
> > > > Applying this on top of rc6, I now get a hang on Kendryte boot...
> > > > No problems without the patch on the other hand.
> > >
> > &
ers/clocksource/timer-clint.c
> +++ b/drivers/clocksource/timer-clint.c
> @@ -37,7 +37,7 @@ static unsigned long clint_timer_freq;
> static unsigned int clint_timer_irq;
>
> #ifdef CONFIG_RISCV_M_MODE
> -u64 __iomem *clint_time_val;
> +u64 __iomem *clint_time_val = NULL;
> #endif
>
> static void clint_send_ipi(const struct cpumask *target)
For Kendryte:
Tested-by: Damien Le Moal
--
Damien Le Moal
Western Digital
cess. Unfortunately we don't have a fallback, so instead
> > + * we just return 0.
> > + */
> > +static inline unsigned long random_get_entropy(void)
> > +{
> > + if (unlikely(clint_time_val == NULL))
> > + return 0;
> > + return get_cycles();
> > +}
> > +#define random_get_entropy() random_get_entropy()
> > +
> > #else /* CONFIG_RISCV_M_MODE */
> >
> > static inline cycles_t get_cycles(void)
>
> Reviewed-by: Palmer Dabbelt
>
> I'm going to wait for Damien to chime in about the NULL initialization boot
> failure, though, as I'm a bit worried something else was going on.
>
> Thanks!
For Kendryte, no problems. Boots correctly.
Tested-by: Damien Le Moal
--
Damien Le Moal
Western Digital
> +static inline unsigned long random_get_entropy(void)
> +{
> + if (unlikely(clint_time_val == NULL))
> + return 0;
> + return get_cycles();
> +}
> +#define random_get_entropy() random_get_entropy()
> +
> #else /* CONFIG_RISCV_M_MODE */
>
> static inline cycles_t get_cycles(void)
Did not reply to the patch... So again for Kendryte:
Tested-by: Damien Le Moal
--
Damien Le Moal
Western Digital
On 2020/09/21 23:41, Sasha Levin wrote:
> From: Damien Le Moal
>
> [ Upstream commit f025d9d9934b84cd03b7796072d10686029c408e ]
>
> The Kendryte K210 SoC CLINT is compatible with Sifive clint v0
> (sifive,clint0). Fix the Kendryte K210 device tree clint entry to be
> i
+++
> 2 files changed, 19 insertions(+), 4 deletions(-)
>
For single patches, you should add this after the "---" in the patch file, above
the patch stats. This is ignored by git when the patch is applied (the patch
starts at the first "diff" entry).
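For example, a note placed like this (subject and file names here are hypothetical) is visible to reviewers but dropped by git am:

```
Subject: [PATCH] clocksource: timer-clint: example change

Signed-off-by: A. Developer <dev@example.com>
---
Notes for reviewers go here, between the "---" separator
and the diffstat. They are not part of the commit message.

 drivers/clocksource/timer-clint.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/clocksource/timer-clint.c b/drivers/clocksource/timer-clint.c
```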
--
Damien Le Moal
Western Digital Research
_ZONE_OPEN:
> case REQ_OP_ZONE_CLOSE:
> case REQ_OP_ZONE_FINISH:
> - return null_zone_mgmt(cmd, op, sector);
> + sts = null_zone_mgmt(cmd, op, sector);
> + break;
> default:
> - return null_process_cmd(cmd, op, sector, nr_sectors);
> + sts = null_process_cmd(cmd, op, sector, nr_sectors);
> }
> + spin_unlock_irq(&dev->zone_lock);
> +
> + return sts;
> }
>
Looks good.
Reviewed-by: Damien Le Moal
--
Damien Le Moal
Western Digital Research
ts definition/semantic and propose it. But again, use a different
thread. This is mixing up zone-append and simple copy, which I do not think are
directly related.
> Not sure if I am clear, perhaps sending RFC would be better for
> discussion on simple-copy.
Separate this discussion from zone append please. Mixing up 2 problems in one
thread is not helpful to make progress.
--
Damien Le Moal
Western Digital Research
g a new
> spinlock for zoned device.
> Concurrent zone-appends (on a zone) returning same write-pointer issue
> is also avoided using this lock.
>
> Cc: sta...@vger.kernel.org
> Fixes: e0489ed5daeb ("null_blk: Support REQ_OP_ZONE_APPEND")
> Signed-off-by: Kanchan Joshi
>
On 2021/01/13 20:48, Ming Lei wrote:
> On Wed, Jan 13, 2021 at 11:16:11AM +0000, Damien Le Moal wrote:
>> On 2021/01/13 19:25, Ming Lei wrote:
>>> On Wed, Jan 13, 2021 at 09:28:02AM +0000, Damien Le Moal wrote:
>>>> On 2021/01/13 18:19, Ming Lei wrote:
>>
atency is about 17ms including merge time too.
>
> 19ms looks too big just for preparing one 32MB sized bio, which isn't
> supposed to
> take so long. Can you investigate where the 19ms is taken just for
> preparing one
> 32MB sized bio?
Changheun mentioned that the device side IO latency is 16.7ms out of the 19ms
total. So the BIO handling, submission+completion takes about 2.3ms, and
Changheun points above to 2ms for the submission part.
>
> It might be iov_iter_get_pages() for handling page fault. If yes, one
> suggestion
> is to enable THP(Transparent HugePage Support) in your application.
But if that was due to page faults, the same large-ish time would be taken for
preparing the size-limited BIOs too, no? No matter how the BIOs are diced,
all 32MB of pages of the user IO are referenced...
>
>
--
Damien Le Moal
Western Digital Research
On 2021/01/13 19:25, Ming Lei wrote:
> On Wed, Jan 13, 2021 at 09:28:02AM +0000, Damien Le Moal wrote:
>> On 2021/01/13 18:19, Ming Lei wrote:
>>> On Wed, Jan 13, 2021 at 12:09 PM Changheun Lee
>>> wrote:
>>>>
>>>>> On 2021/01/12 21:14, Chang
* We have to stop part way through an IO. We must fall
> diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
> index bec47f2d074b..c95ac37f9305 100644
> --- a/fs/zonefs/super.c
> +++ b/fs/zonefs/super.c
> @@ -690,7 +690,7 @@ static ssize_t zonefs_file_dio_append(struct kiocb *iocb,
> struct iov_iter *from)
> if (iocb->ki_flags & IOCB_DSYNC)
> bio->bi_opf |= REQ_FUA;
>
> - ret = bio_iov_iter_get_pages(bio, from);
> + ret = bio_iov_iter_get_pages(bio, from, is_sync_kiocb(iocb));
> if (unlikely(ret))
> goto out_release;
>
> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index 676870b2c88d..fa3a503b955c 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -472,7 +472,7 @@ bool __bio_try_merge_page(struct bio *bio, struct page
> *page,
> unsigned int len, unsigned int off, bool *same_page);
> void __bio_add_page(struct bio *bio, struct page *page,
> unsigned int len, unsigned int off);
> -int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter);
> +int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter, bool
> sync);
> void bio_release_pages(struct bio *bio, bool mark_dirty);
> extern void bio_set_pages_dirty(struct bio *bio);
> extern void bio_check_pages_dirty(struct bio *bio);
>
>
> Thanks,
> Ming
>
>
--
Damien Le Moal
Western Digital Research
On 2021/01/26 15:07, Ming Lei wrote:
> On Tue, Jan 26, 2021 at 04:06:06AM +0000, Damien Le Moal wrote:
>> On 2021/01/26 12:58, Ming Lei wrote:
>>> On Tue, Jan 26, 2021 at 10:32:34AM +0900, Changheun Lee wrote:
>>>> bio size can grow up to 4GB when multi-page bvec
lk_queue_flag_test_and_set(unsigned int flag,
> struct request_queue *q);
> #define blk_queue_fua(q) test_bit(QUEUE_FLAG_FUA, &(q)->queue_flags)
> #define blk_queue_registered(q) test_bit(QUEUE_FLAG_REGISTERED,
> &(q)->queue_flags)
> #define blk_queue_nowait(q) test_bit(QUEUE_FLAG_NOWAIT, &(q)->queue_flags)
> +#define blk_queue_limit_bio_size(q) \
> + test_bit(QUEUE_FLAG_LIMIT_BIO_SIZE, &(q)->queue_flags)
>
> extern void blk_set_pm_only(struct request_queue *q);
> extern void blk_clear_pm_only(struct request_queue *q);
>
--
Damien Le Moal
Western Digital Research
bio->bi_opf = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
> if (iocb->ki_flags & IOCB_DSYNC)
> bio->bi_opf |= REQ_FUA;
>
>
--
Damien Le Moal
Western Digital Research