On Tue, 2017-06-13 at 16:40 +0800, Eryu Guan wrote:
> On Mon, Jun 12, 2017 at 08:42:13AM -0400, Jeff Layton wrote:
> > Make a new btrfs/999 test that works the way Chris Mason suggested:
> >
> > Build a filesystem with 2 devices that stripes the data across
> > both devices, but mirrors metadata a
On 06/13/17 10:54, Ross Zwisler wrote:
> This commit is causing the following kernel BUG for me when I shut
> down my systems:
>
> BUG: sleeping function called from invalid context at
> kernel/workqueue.c:2790
> in_atomic(): 1, irqs_disabled(): 0, pid: 41, name: rcuop/3
Thanks Ross for the
Jens,
> A new iteration of this patchset, previously known as write streams.
> As before, this patchset aims at enabling applications to split up
> writes into separate streams, based on the perceived life time
> of the data written. This is useful for a variety of reasons:
>
> - With NVMe 1.3 compl
On 06/14/2017 09:45 AM, Martin K. Petersen wrote:
>
> Jens,
>
>> A new iteration of this patchset, previously known as write streams.
>> As before, this patchset aims at enabling applications to split up
>> writes into separate streams, based on the perceived life time
>> of the data written. This i
Jens,
> So how about we just call it "write_hint"? It sounds mostly like a
> naming issue to me, as you would then map that to some specific stream
> in your driver. You're free to do that right now. They are all flags,
> it's just packed as a value to not waste too much space.
Sure, that's fine
On Wed, Jun 14, 2017 at 09:53:05AM -0600, Jens Axboe wrote:
> So how about we just call it "write_hint"? It sounds mostly like a
> naming issue to me, as you would then map that to some specific stream
> in your driver. You're free to do that right now. They are all flags,
> it's just packed as a v
On 06/14/2017 10:01 AM, Christoph Hellwig wrote:
> On Wed, Jun 14, 2017 at 09:53:05AM -0600, Jens Axboe wrote:
>> So how about we just call it "write_hint"? It sounds mostly like a
>> naming issue to me, as you would then map that to some specific stream
>> in your driver. You're free to do that ri
On 06/14/2017 10:00 AM, Martin K. Petersen wrote:
>
> Jens,
>
>> So how about we just call it "write_hint"? It sounds mostly like a
>> naming issue to me, as you would then map that to some specific stream
>> in your driver. You're free to do that right now. They are all flags,
>> it's just packe
Christoph,
> I think what Martin wants (or at least what I'd want him to want) is
> to define a few REQ_* bits that mirror the RWF bits, use that to
> transfer the information down the stack, and then only translate it
> to stream ids in the driver.
Yup. If we have enough space in the existing f
From: Shaohua Li
Currently blktrace isn't cgroup aware. blktrace prints out the task name of
the current context, but that task isn't always in the cgroup where the
BIO comes from, so we can't use the task name to find the IO cgroup.
For example, writeback BIOs always come from the flusher thread
From: Shaohua Li
Now we have the facilities to implement exportfs operations. The idea is
cgroup can export the fhandle info to userspace, then userspace uses
fhandle to find the cgroup name. Another example is userspace can get
fhandle for a cgroup and BPF uses the fhandle to filter info for the
From: Shaohua Li
bio_free isn't a good place to free cgroup/integrity info. There are a
lot of cases where a bio is allocated in a special way (for example, on
the stack) and bio_put, and hence bio_free, is never called for it, so we
leak memory. This patch moves the free to bio endio, which should be called anywa
From: Shaohua Li
By default we output cgroup id in blktrace. This adds an option to
display cgroup path. Since getting the cgroup path is a relatively heavy
operation, we don't enable it by default.
With the option enabled, blktrace will output something like this:
dd-1353 [007] d..2 293.015252: 8,0
From: Shaohua Li
Set i_generation for kernfs inode. This is required to implement exportfs
operations.
Note, the generation is 32-bit, so it's possible that the generation wraps
around and we find stale files. The possibility is low, since fhandle matches
both inode number and generation. In most fs, the
From: Shaohua Li
inode number and generation can identify a kernfs node. We are going to
export the identification by exportfs operations, so put ino and
generation into a separate structure. It's convenient when later patches
use the identification.
Please note, I extend inode number to 64 bits
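The identification described above (inode number plus generation, kept together in one structure) can be sketched as follows. This is only an illustration under stated assumptions: `struct kid_node_id` and `kid_same_node()` are hypothetical names for this example and are not taken from the patch. The field widths follow the thread (64-bit inode number, 32-bit generation).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical identifier: inode number and generation kept together. */
struct kid_node_id {
	uint64_t ino;        /* inode number, 64 bits as mentioned in the thread */
	uint32_t generation; /* bumped when an inode number is reused */
};

/*
 * A handle refers to the same node only if both fields match. A reused
 * inode number with a different generation is treated as stale.
 */
bool kid_same_node(const struct kid_node_id *a, const struct kid_node_id *b)
{
	return a->ino == b->ino && a->generation == b->generation;
}
```

Comparing both fields is what makes a stale handle detectable: a handle saved before the inode number was recycled carries the old generation and no longer matches.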
From: Shaohua Li
Currently cfq/bfq/blk-throttle output cgroup info in trace in their own
way. Now we have standard blktrace API for this, so convert them to use
it.
Note, this changes the behavior a little bit. cgroup info isn't output
by default, we only do this with 'blk_cgroup' option enabled
From: Shaohua Li
kernfs uses ida to manage inode numbers. The problem is that we can't get
a kernfs_node from an inode number with ida. Switch to idr; the next patch
will add an API to get a kernfs_node from an inode number.
Signed-off-by: Shaohua Li
---
fs/kernfs/dir.c| 17 -
inclu
From: Shaohua Li
blkcg_bio_issue_check() already gets blkcg for a BIO.
bio_associate_blkcg() uses a percpu refcounter, so it's a very cheap
operation. There is no reason not to attach the cgroup info to the bio at
blkcg_bio_issue_check. This also makes blktrace output correct cgroup
info.
Signed
From: Shaohua Li
Add an API to get kernfs node from inode number. We will need this to
implement exportfs operations.
To make the API lock free, kernfs node is freed in RCU context. And we
depend on kernfs_node count/ino number to filter stale kernfs nodes.
Signed-off-by: Shaohua Li
---
fs/ke
From: Shaohua Li
Add an API to export cgroup fhandle info. We don't export a full 'struct
file_handle', since it contains info we don't need. Specifically, a cgroup is
always a directory, so we don't need a 'FILEID_KERNFS_WITH_PARENT' type
fhandle; we only need to export the inode number and generation number.
S
From: Shaohua Li
Hi,
Currently blktrace isn't cgroup aware. blktrace prints out the task name of the
current context, but that task isn't always in the cgroup where the BIO comes
from, so we can't use the task name to find the IO cgroup. For example,
writeback BIOs always come from flusher t
From: Shaohua Li
When working on adding exportfs operations in kernfs, I found it's hard
to initialize dentry->d_fsdata in the exportfs operations. It looks like
there is no way to do it without a race condition. Looking at the kernfs
code closely, there is no point in setting dentry->d_fsdata. inode->i_private
a
On 06/14/2017 10:04 AM, Martin K. Petersen wrote:
>
> Christoph,
>
>> I think what Martin wants (or at least what I'd want him to want) is
>> to define a few REQ_* bits that mirror the RWF bits, use that to
>> transfer the information down the stack, and then only translate it
>> to stream ids in
On 06/14/2017 10:04 AM, Martin K. Petersen wrote:
>
> Christoph,
>
>> I think what Martin wants (or at least what I'd want him to want) is
>> to define a few REQ_* bits that mirror the RWF bits, use that to
>> transfer the information down the stack, and then only translate it
>> to stream ids in
On Tue, 2017-06-13 at 23:47 -0700, Christoph Hellwig wrote:
> On Tue, Jun 13, 2017 at 06:24:32AM -0400, Jeff Layton wrote:
> > That's definitely what I want for the endgame here. My plan was to add
> > this flag for now, and then eventually reverse it (or drop it) once all
> > or most filesystems a
On Wed, Jun 14, 2017 at 9:19 AM, Bart Van Assche
wrote:
> On 06/13/17 10:54, Ross Zwisler wrote:
>> This commit is causing the following kernel BUG for me when I shut
>> down my systems:
>>
>> BUG: sleeping function called from invalid context at
>> kernel/workqueue.c:2790
>> in_atomic(): 1,
No functional changes in this patch, we just add four flags
that will be used to denote a stream type, and ensure that we
don't merge across different stream types.
Signed-off-by: Jens Axboe
---
block/blk-merge.c | 16
include/linux/blk_types.h | 11 +++
2 files
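The rule stated above, that requests are not merged across different stream types, can be sketched as a comparison of the hint bits. This is a minimal illustration and not the code from the patch: the `MRG_*` bit values and `mrg_hints_mergeable()` are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit positions; the real values are defined by the patch. */
#define MRG_WRITE_SHORT   (1u << 0)
#define MRG_WRITE_MEDIUM  (1u << 1)
#define MRG_WRITE_LONG    (1u << 2)
#define MRG_WRITE_EXTREME (1u << 3)
#define MRG_WRITE_LIFE_MASK \
	(MRG_WRITE_SHORT | MRG_WRITE_MEDIUM | MRG_WRITE_LONG | MRG_WRITE_EXTREME)

/*
 * Two requests are merge candidates only when their life time hint bits
 * are identical. Bits outside the mask are ignored by this check.
 */
bool mrg_hints_mergeable(unsigned int flags_a, unsigned int flags_b)
{
	return (flags_a & MRG_WRITE_LIFE_MASK) == (flags_b & MRG_WRITE_LIFE_MASK);
}
```

A real merge path would apply this check in addition to its existing merge conditions, not instead of them.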
Add four flags for the pwritev2(2) system call, allowing an application
to give the kernel a hint about what on-media life times can be
expected from a given write.
The intent is for these values to be relative to each other, no
absolute meaning should be attached to these flag names.
Define IOCB
No functional changes in this patch, just in preparation for
allowing applications to pass in hints about data life times
for writes.
Pack the i_stream field into a 2-byte hole, so we don't grow
the size of the inode.
Reviewed-by: Andreas Dilger
Signed-off-by: Jens Axboe
---
fs/inode.c
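Packing a small field into an existing padding hole, as described above, can be illustrated with a toy structure. This is a sketch under assumptions: `struct pk_before` and `struct pk_after` are hypothetical stand-ins, not the real inode layout, and the sizes assume a typical ABI where a 32-bit member is 4-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with a 2-byte padding hole after 'mode'. */
struct pk_before {
	uint16_t mode;
	/* 2 bytes of padding here: 'flags' needs 4-byte alignment */
	uint32_t flags;
	uint64_t size;
};

/* Same layout with a 2-byte hint stored in the former hole. */
struct pk_after {
	uint16_t mode;
	uint16_t hint;  /* occupies the padding, so nothing else moves */
	uint32_t flags;
	uint64_t size;
};

/* Returns 1 when adding the field did not grow the structure. */
int pk_size_unchanged(void)
{
	return sizeof(struct pk_after) == sizeof(struct pk_before) &&
	       offsetof(struct pk_after, flags) == offsetof(struct pk_before, flags);
}
```

On such an ABI the new member lands at offset 2 and every later member keeps its offset, which is why the structure size stays the same.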
A new iteration of this patchset, previously known as write streams.
As before, this patchset aims at enabling applications to split up
writes into separate streams, based on the perceived life time
of the data written. This is useful for a variety of reasons:
- With NVMe 1.3 compliant devices, the d
Useful to verify that things are working the way they should.
Reading the file will return number of kb written to each
stream. Writing the file will reset the statistics. No care
is taken to ensure that we don't race on updates.
Drivers will write to q->stream_writes[] if they handle a stream.
R
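The statistics behaviour described above (read returns KB written per stream, write resets, no locking) might look roughly like the following. This is a hedged sketch: only the `stream_writes[]` name comes from the thread. The `st_*` names, the stream count of five, and the choice to store bytes and convert to KB on read are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ST_NR_STREAMS 5 /* assumption: "no hint" plus four life time classes */

/* Hypothetical stand-in for the per-queue counters. */
struct st_queue {
	uint64_t stream_writes[ST_NR_STREAMS]; /* bytes written per stream */
};

/* Driver side: account a completed write. Deliberately unlocked. */
void st_account_write(struct st_queue *q, unsigned int stream, uint64_t bytes)
{
	if (stream < ST_NR_STREAMS)
		q->stream_writes[stream] += bytes;
}

/* "Reading the file": report KB written to one stream. */
uint64_t st_read_kb(const struct st_queue *q, unsigned int stream)
{
	return stream < ST_NR_STREAMS ? q->stream_writes[stream] / 1024 : 0;
}

/* "Writing the file": reset all counters. */
void st_reset(struct st_queue *q)
{
	memset(q->stream_writes, 0, sizeof(q->stream_writes));
}
```

Because nothing here is synchronized, concurrent updates can be lost, which matches the stated intent that the counters are a debugging aid and not exact accounting.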
Reviewed-by: Andreas Dilger
Signed-off-by: Jens Axboe
---
fs/xfs/xfs_aops.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 09af0f7cd55e..9770be0140ad 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -505,6 +505,7 @@ xfs_submit_ioend(
Reviewed-by: Andreas Dilger
Signed-off-by: Jens Axboe
---
fs/block_dev.c | 2 ++
fs/direct-io.c | 2 ++
fs/iomap.c | 1 +
3 files changed, 5 insertions(+)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 51959936..31ba4a8f0a28 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -239,6
Reviewed-by: Andreas Dilger
Signed-off-by: Jens Axboe
---
fs/btrfs/extent_io.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d3619e010005..b245085e8f10 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2827,6 +2827,7 @@ stati
Reviewed-by: Andreas Dilger
Signed-off-by: Jens Axboe
---
fs/buffer.c | 14 +-
fs/mpage.c | 1 +
2 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index 161be58c5cb0..8324c24751ca 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -49,7 +49,7 @@
This adds support for Directives in NVMe, particularly for the Streams
directive. Support for Directives is a new feature in NVMe 1.3. It
allows a user to pass in information about where to store the data,
so that the device can do so most efficiently. If an application is
managing and writing data
We map the RWF_WRITE_* life time flags to the internal flags.
Drivers can then, in turn, map those flags to a suitable stream
type.
Signed-off-by: Jens Axboe
---
block/bio.c | 16
include/linux/bio.h | 1 +
include/linux/blk_types.h | 5 +
3 files chang
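One way to express the mapping described above, from the user-visible life time hint to an internal request flag, is a small lookup table. This is a sketch under assumptions and not the patch's code: the `MAP_*` names and all numeric values are hypothetical, and out-of-range hints are treated here as "no hint".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-visible hint values, ordered from shortest to longest. */
enum map_rwf_life {
	MAP_RWF_LIFE_NONE = 0,
	MAP_RWF_LIFE_SHORT,
	MAP_RWF_LIFE_MEDIUM,
	MAP_RWF_LIFE_LONG,
	MAP_RWF_LIFE_EXTREME
};

/* Hypothetical internal flag bits. */
#define MAP_REQ_WRITE_SHORT   (1u << 0)
#define MAP_REQ_WRITE_MEDIUM  (1u << 1)
#define MAP_REQ_WRITE_LONG    (1u << 2)
#define MAP_REQ_WRITE_EXTREME (1u << 3)

static const unsigned int map_life_to_req[] = {
	[MAP_RWF_LIFE_NONE]    = 0,
	[MAP_RWF_LIFE_SHORT]   = MAP_REQ_WRITE_SHORT,
	[MAP_RWF_LIFE_MEDIUM]  = MAP_REQ_WRITE_MEDIUM,
	[MAP_RWF_LIFE_LONG]    = MAP_REQ_WRITE_LONG,
	[MAP_RWF_LIFE_EXTREME] = MAP_REQ_WRITE_EXTREME
};

/* Translate a hint to an internal flag; unknown hints map to 0. */
unsigned int map_rwf_to_req_flag(unsigned int life)
{
	size_t n = sizeof(map_life_to_req) / sizeof(map_life_to_req[0]);

	return life < n ? map_life_to_req[life] : 0;
}
```

A driver would then translate the internal flag to whatever stream identifier its hardware uses, keeping device-specific details out of the upper layers.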
Reviewed-by: Andreas Dilger
Signed-off-by: Jens Axboe
---
fs/ext4/page-io.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 1a82138ba739..033b5bfa4e0b 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -350,6 +350,7 @@ void ext4_io_submit(
On 06/14/2017 09:19 AM, Bart Van Assche wrote:
> Subject: [PATCH] block: Fix a blk_exit_rl() regression
>
> Avoid that the following complaint is reported:
>
> BUG: sleeping function called from invalid context at kernel/workqueue.c:2790
> in_atomic(): 1, irqs_disabled(): 0, pid: 41, name: rcuo
On 06/14/17 12:28, Jens Axboe wrote:
> I added this, but the above is really a horrible changelog. It doesn't
> say how the problem is fixed. I added some verbiage to that effect.
Hello Jens,
Thanks for having fixed up the changelog and for already having picked
up this patch. I was going to repo
On Wed, Jun 14, 2017 at 01:05:26PM -0600, Jens Axboe wrote:
> No functional changes in this patch, just in preparation for
> allowing applications to pass in hints about data life times
> for writes.
>
> Pack the i_stream field into a 2-byte hole, so we don't grow
> the size of the inode.
Can't w
On Wed, Jun 14, 2017 at 01:05:27PM -0600, Jens Axboe wrote:
> Add four flags for the pwritev2(2) system call, allowing an application
> to give the kernel a hint about what on-media life times can be
> expected from a given write.
>
> The intent is for these values to be relative to each other, no
> +static const unsigned int rwf_write_to_opf_flag[] = {
> + 0, REQ_WRITE_SHORT, REQ_WRITE_MEDIUM, REQ_WRITE_LONG, REQ_WRITE_EXTREME
> +};
> +
> +/*
> + * 'stream_flags' is one of RWF_WRITE_LIFE_* values
> + */
> +void bio_set_streamid(struct bio *bio, unsigned int rwf_flags)
> +{
> + if
> +static unsigned int nvme_get_write_stream(struct nvme_ns *ns,
> + struct request *req)
> +{
> + unsigned int streamid = 0;
> +
> + if (req_op(req) != REQ_OP_WRITE || !blk_stream_valid(req->cmd_flags) ||
> + !ns->nr_streams)
> + re
On 06/14/2017 02:25 PM, Christoph Hellwig wrote:
> On Wed, Jun 14, 2017 at 01:05:26PM -0600, Jens Axboe wrote:
>> No functional changes in this patch, just in preparation for
>> allowing applications to pass in hints about data life times
>> for writes.
>>
>> Pack the i_stream field into a 2-byte h
Btw, I think these could also easily map to DSM field in the NVMe
write command, except that these unfortunately mix in read information
as well.
> + __REQ_WRITE_SHORT, /* short life time write */
-> Frequent writes and infrequent reads to the LBA range indicated.
or
-> Frequent writes
On 06/14/2017 02:26 PM, Christoph Hellwig wrote:
> On Wed, Jun 14, 2017 at 01:05:27PM -0600, Jens Axboe wrote:
>> Add four flags for the pwritev2(2) system call, allowing an application
>> to give the kernel a hint about what on-media life times can be
>> expected from a given write.
>>
>> The inte
On 06/14/2017 02:28 PM, Christoph Hellwig wrote:
>> +static const unsigned int rwf_write_to_opf_flag[] = {
>> +0, REQ_WRITE_SHORT, REQ_WRITE_MEDIUM, REQ_WRITE_LONG, REQ_WRITE_EXTREME
>> +};
>
>
>
>> +
>> +/*
>> + * 'stream_flags' is one of RWF_WRITE_LIFE_* values
>> + */
>> +void bio_set_str
On 06/14/2017 02:32 PM, Christoph Hellwig wrote:
>> +static unsigned int nvme_get_write_stream(struct nvme_ns *ns,
>> + struct request *req)
>> +{
>> +unsigned int streamid = 0;
>> +
>> +if (req_op(req) != REQ_OP_WRITE || !blk_stream_valid(req->cmd_flags
On 06/14/2017 02:37 PM, Christoph Hellwig wrote:
> Btw, I think these could also easily map to DSM field in the NVMe
> write command, except that these unfortunately mix in read information
> as well.
But that's the problem, they are read/write mixed flags. I'd much
rather keep them separate. If s
On Jun 14, 2017, at 10:04 AM, Martin K. Petersen
wrote:
> Christoph,
>
>> I think what Martin wants (or at least what I'd want him to want) is
>> to define a few REQ_* bits that mirror the RWF bits, use that to
>> transfer the information down the stack, and then only translate it
>> to stream i
File systems can encrypt some of their data blocks with their own
encryption keys, and for those blocks another round of encryption at
the dm-crypt layer may be redundant, depending on the keys being used.
This patch enables dm-crypt to observe the REQ_NOENCRYPT flag as an
indicator that a bio req
When lower layers such as dm-crypt observe the REQ_NOENCRYPT flag, it
helps the I/O stack avoid redundant encryption, improving performance
and power utilization.
Note that lower layers must be consistent in their observation of this
flag in order to avoid the possibility of data corruption.
Sign
When lower layers such as dm-crypt observe the REQ_NOENCRYPT flag, it
helps the I/O stack avoid redundant encryption, improving performance
and power utilization.
Note that lower layers must be consistent in their observation of this
flag in order to avoid the possibility of data corruption.
Sign
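The consistency requirement noted above can be illustrated with a toy model. This is a sketch only: the `ENC_REQ_NOENCRYPT` bit value and the `enc_*` helpers are hypothetical, and a one-byte XOR stands in for a real cipher purely so the effect is visible. It is not how dm-crypt is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENC_REQ_NOENCRYPT (1u << 0) /* hypothetical bit value */

/* Toy transform standing in for a cipher; applying it twice restores data. */
static void enc_toy_xor(uint8_t *buf, size_t len, uint8_t key)
{
	for (size_t i = 0; i < len; i++)
		buf[i] ^= key;
}

/*
 * Lower-layer transform for both the write and the read path. When the
 * upper layer marks the I/O as already encrypted, the data passes through
 * untouched.
 */
void enc_lower_layer_crypt(uint8_t *buf, size_t len, uint8_t key,
			   unsigned int flags)
{
	if (flags & ENC_REQ_NOENCRYPT)
		return;
	enc_toy_xor(buf, len, key);
}
```

If a block is written without the flag but read back with it (or the other way around), the lower layer transforms the data on only one side and the reader gets garbage, which is the corruption risk the note warns about.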
When both the file system and a lower layer such as dm-crypt encrypt
the same file contents, it impacts performance and power utilization.
Depending on how the operating environment manages the encryption
keys, there is often no significant security benefit to redundantly
encrypting.
File systems
Several file systems either have already implemented encryption or are
in the process of doing so. This addresses usability and storage
isolation requirements on mobile devices and in multi-tenant
environments.
While distinct keys locked down to user accounts protect the names and
contents of ind
On Wed, Jun 14, 2017 at 01:05:33PM -0600, Jens Axboe wrote:
Reviewed-by: Andreas Dilger
Signed-off-by: Jens Axboe
Thanks Jens!
Signed-off-by: Chris Mason
---
fs/btrfs/extent_io.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d3619e010
On Wed, Jun 14, 2017 at 5:39 PM, Andreas Dilger wrote:
> On Jun 14, 2017, at 10:04 AM, Martin K. Petersen
> wrote:
>> Christoph,
>>
>>> I think what Martin wants (or at least what I'd want him to want) is
>>> to define a few REQ_* bits that mirror the RWF bits, use that to
>>> transfer the infor
No functional changes in this patch, we just add four flags
that will be used to denote a stream type, and ensure that we
don't merge across different stream types.
Signed-off-by: Jens Axboe
---
block/blk-merge.c | 16
include/linux/blk_types.h | 11 +++
2 files
Useful to verify that things are working the way they should.
Reading the file will return number of kb written to each
stream. Writing the file will reset the statistics. No care
is taken to ensure that we don't race on updates.
Drivers will write to q->stream_writes[] if they handle a stream.
R
Reviewed-by: Andreas Dilger
Signed-off-by: Jens Axboe
---
fs/buffer.c | 14 +-
fs/mpage.c | 1 +
2 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index 161be58c5cb0..3faf73a71d4b 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -49,7 +49,7 @@
We map the RWF_WRITE_* life time flags to the internal flags.
Drivers can then, in turn, map those flags to a suitable stream
type.
Signed-off-by: Jens Axboe
---
block/bio.c | 16
include/linux/bio.h | 1 +
include/linux/blk_types.h | 5 +
3 files chang
Add four flags for the pwritev2(2) system call, allowing an application
to give the kernel a hint about what on-media life times can be
expected from a given write.
The intent is for these values to be relative to each other, no
absolute meaning should be attached to these flag names.
Define IOCB
A new iteration of this patchset, previously known as write streams.
As before, this patchset aims at enabling applications to split up
writes into separate streams, based on the perceived life time
of the data written. This is useful for a variety of reasons:
- With NVMe 1.3 compliant devices, the d
Reviewed-by: Andreas Dilger
Signed-off-by: Jens Axboe
---
fs/block_dev.c | 2 ++
fs/direct-io.c | 2 ++
fs/iomap.c | 1 +
3 files changed, 5 insertions(+)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 51959936..de4301168710 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -239,6
No functional changes in this patch, just in preparation for
allowing applications to pass in hints about data life times
for writes.
Pack the i_write_hint field into a 2-byte hole, so we don't grow
the size of the inode.
Reviewed-by: Andreas Dilger
Signed-off-by: Jens Axboe
---
fs/inode.c
Reviewed-by: Andreas Dilger
Signed-off-by: Chris Mason
Signed-off-by: Jens Axboe
---
fs/btrfs/extent_io.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d3619e010005..2bc2dfca87c2 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
Reviewed-by: Andreas Dilger
Signed-off-by: Jens Axboe
---
fs/ext4/page-io.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 1a82138ba739..764bf0ddecd4 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -349,6 +349,7 @@ void ext4_io_submit(
Reviewed-by: Andreas Dilger
Signed-off-by: Jens Axboe
---
fs/xfs/xfs_aops.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 09af0f7cd55e..fe11fe47d235 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -505,6 +505,7 @@ xfs_submit_ioend(
This adds support for Directives in NVMe, particularly for the Streams
directive. Support for Directives is a new feature in NVMe 1.3. It
allows a user to pass in information about where to store the data,
so that the device can do so most efficiently. If an application is
managing and writing data
On 06/14/2017 09:53 PM, Andreas Dilger wrote:
> On Jun 14, 2017, at 9:26 PM, Jens Axboe wrote:
>> On Wed, Jun 14, 2017 at 5:39 PM, Andreas Dilger wrote:
>>> On Jun 14, 2017, at 10:04 AM, Martin K. Petersen
>>> wrote:
>>>> Christoph,
>>>>> I think what Martin wants (or at least what I'd wan
On Jun 14, 2017, at 9:26 PM, Jens Axboe wrote:
> On Wed, Jun 14, 2017 at 5:39 PM, Andreas Dilger wrote:
>> On Jun 14, 2017, at 10:04 AM, Martin K. Petersen
>> wrote:
>>> Christoph,
>>>
>>>> I think what Martin wants (or at least what I'd want him to want) is
>>>> to define a few REQ_* bits tha
On Wed, Jun 14, 2017 at 09:45:05PM -0600, Jens Axboe wrote:
> Add four flags for the pwritev2(2) system call, allowing an application
> to give the kernel a hint about what on-media life times can be
> expected from a given write.
>
> The intent is for these values to be relative to each other, no
On 06/14/2017 10:15 PM, Darrick J. Wong wrote:
>> diff --git a/fs/read_write.c b/fs/read_write.c
>> index 47c1d4484df9..9cb2314efca3 100644
>> --- a/fs/read_write.c
>> +++ b/fs/read_write.c
>> @@ -678,7 +678,7 @@ static ssize_t do_iter_readv_writev(struct file *filp,
>> struct iov_iter *iter,
>>
On 06/14/2017 09:57 PM, Jens Axboe wrote:
> On 06/14/2017 09:53 PM, Andreas Dilger wrote:
>> On Jun 14, 2017, at 9:26 PM, Jens Axboe wrote:
>>> On Wed, Jun 14, 2017 at 5:39 PM, Andreas Dilger wrote:
>>>> On Jun 14, 2017, at 10:04 AM, Martin K. Petersen
>>>> wrote:
>>>>> Christoph,
>>>>>
>>>>>>
On Wed, Jun 14, 2017 at 07:55:17AM -0400, Jeff Layton wrote:
> On Tue, 2017-06-13 at 16:40 +0800, Eryu Guan wrote:
> > On Mon, Jun 12, 2017 at 08:42:13AM -0400, Jeff Layton wrote:
> > > Make a new btrfs/999 test that works the way Chris Mason suggested:
> > >
> > > Build a filesystem with 2 device