[GIT PULL] MD update for 4.20

2018-10-26 Thread Shaohua Li
NeilBrown (1): md: allow metadata updates while suspending an array - fix Shaohua Li (2): MD: fix invalid stored role for a disk MD: fix invalid stored role for a disk - try2 Xiao Ni (1): MD: Memory leak when flush bio size is zero drivers/md/md-bitmap.c

Re: [LKP] [MD] d595567dc4: mdadm-selftests.02lineargrow.fail

2018-10-14 Thread Shaohua Li
On Fri, Oct 12, 2018 at 04:58:44PM +0800, kernel test robot wrote: > FYI, we noticed the following commit (built with gcc-7): > > commit: d595567dc4f0c1d90685ec1e2e296e2cad2643ac ("MD: fix invalid stored > role for a disk") > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

Re: [PATCH] md/bitmap: use mddev_suspend/resume instead of ->quiesce()

2018-09-28 Thread Shaohua Li
On Thu, Sep 27, 2018 at 10:07:57AM +0200, Jack Wang wrote: > From: Jack Wang > > After 9e1cc0a54556 ("md: use mddev_suspend/resume instead of ->quiesce()") > we still have similar usage left in the bitmap functions. > > Replace quiesce() with mddev_suspend/resume. > > Also move md_bitmap_create out of md

[GIT PULL] MD update for 4.19-rc2

2018-09-07 Thread Shaohua Li
bad: md-cluster: release RESYNC lock after the last resync message (2018-08-31 17:38:10 -0700) Guoqing Jiang (1): md-cluster: release RESYNC lock after the last resync message Shaohua Li (1): md/raid5-cache: disable reshape

[GIT PULL] MD update for 4.19-rc1

2018-08-13 Thread Shaohua Li
Hi, A few MD fixes for 4.19-rc1: - Several md-cluster fixes from Guoqing - A data corruption fix from BingJing - Other cleanups Please pull! Thanks, Shaohua The following changes since commit 06c85639897cf3ea6a11c5cb6929fb0d9d7efea5: Merge tag 'acpi-4.18-rc4' of git://git.kernel.org/pub/scm/l

Re: [PATCH 1/6] drivers/md/raid5: Use irqsave variant of atomic_dec_and_lock()

2018-07-18 Thread Shaohua Li
On Wed, Jul 18, 2018 at 12:57:21PM +0200, Sebastian Andrzej Siewior wrote: > On 2018-07-16 17:37:27 [-0700], Shaohua Li wrote: > > On Mon, Jul 16, 2018 at 02:27:40PM +0200, Sebastian Andrzej Siewior wrote: > > > On 2018-07-03 22:01:36 [+0200], To linux-kernel@vger.kernel.org w

Re: [PATCH 1/6] drivers/md/raid5: Use irqsave variant of atomic_dec_and_lock()

2018-07-16 Thread Shaohua Li
ing/releasing the spin lock. With this variant the call of > > local_irq_save is no longer required. > > Shaohua, are you with this? Acked-by: Shaohua Li > > Cc: Shaohua Li > > Cc: linux-r...@vger.kernel.org > > Acked-by: Peter Zijlstra (Intel) > > Signed-of
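The variant under discussion folds the IRQ-flag save/restore into the dec-and-lock primitive itself. A minimal user-space sketch of the underlying atomic_dec_and_lock() semantics (assumed here from its kernel contract: decrement, and only on the 1 -> 0 transition take the lock and return true); there is no user-space analogue of IRQ flags, so a bare test-and-set flag stands in for the spinlock:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Sketch of atomic_dec_and_lock() semantics: drop one reference; only
 * when the count would hit zero, acquire the lock and return true so
 * the caller can tear the object down under it.  The _irqsave variant
 * in the thread additionally saves/restores IRQ flags around the spin
 * lock, which is elided here. */
static bool dec_and_lock(atomic_int *cnt, atomic_flag *lock)
{
    /* Fast path: with more than one reference left, a plain atomic
     * decrement suffices and the lock is never touched. */
    int old = atomic_load(cnt);
    while (old > 1) {
        if (atomic_compare_exchange_weak(cnt, &old, old - 1))
            return false;
    }

    /* Slow path: possibly the last reference, so make the 1 -> 0
     * transition under the lock. */
    while (atomic_flag_test_and_set(lock))
        ;                               /* spin */
    if (atomic_fetch_sub(cnt, 1) == 1)
        return true;                    /* caller unlocks and frees */
    atomic_flag_clear(lock);
    return false;
}
```

The fast path is why this primitive exists at all: most puts never contend on the lock, and only the final put pays for it.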

[GIT PULL] MD update for 4.18-rc3

2018-07-02 Thread Shaohua Li
2018-06-28 13:04:49 -0700) BingJing Chang (1): md/raid10: fix that replacement cannot complete recovery after reassemble Shaohua Li (1): MD: cleanup resources in failure drivers/md/md.c | 8 +--- drivers/md/ra

[GIT PULL] MD update for 4.18-rc

2018-06-09 Thread Shaohua Li
Hi, A few fixes of MD for this merge window. Mostly bug fixes: - raid5 stripe batch fix from Amy - Read error handling for raid1 FailFast device from Gioh - raid10 recovery NULL pointer dereference fix from Guoqing - Support write hint for raid5 stripe cache from Mariusz - Fixes for device hot add/

Re: [PATCH 3/8] md: raid5: use refcount_t for reference counting instead atomic_t

2018-05-23 Thread Shaohua Li
On Wed, May 23, 2018 at 07:49:04PM +0200, Peter Zijlstra wrote: > On Wed, May 23, 2018 at 06:21:19AM -0700, Matthew Wilcox wrote: > > On Wed, May 09, 2018 at 09:36:40PM +0200, Sebastian Andrzej Siewior wrote: > > > refcount_t type and corresponding API should be used instead of atomic_t > > > when
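The rationale for the conversion is that refcount_t saturates instead of wrapping, so a refcounting bug degrades into a leak rather than a use-after-free. A toy sketch of that saturation behavior (names and the poison value are illustrative, not the real lib/refcount.c implementation):

```c
#include <stdatomic.h>
#include <limits.h>
#include <stdbool.h>

/* Illustrative saturation point; the real refcount_t uses its own
 * poison value and WARNs when it saturates. */
#define REFCOUNT_SATURATED INT_MAX

static void ref_get(atomic_int *r)
{
    int old = atomic_load(r);
    do {
        if (old == 0 || old == REFCOUNT_SATURATED) {
            /* increment-from-zero or overflow: poison, don't wrap */
            atomic_store(r, REFCOUNT_SATURATED);
            return;
        }
    } while (!atomic_compare_exchange_weak(r, &old, old + 1));
}

/* Returns true when the caller dropped the last reference. */
static bool ref_put(atomic_int *r)
{
    int old = atomic_load(r);
    do {
        if (old == REFCOUNT_SATURATED)
            return false;   /* leak on purpose: safer than a UAF */
        if (old == 0)
            return false;   /* underflow: refuse to go negative  */
    } while (!atomic_compare_exchange_weak(r, &old, old - 1));
    return old == 1;
}
```

A raw atomic_t would happily wrap past INT_MAX or go negative; the saturating variant pins the counter and keeps the object alive forever, which is the safer failure mode.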

Re: [PATCH] md: zero the original position of sb for 0.90 and 1.0

2018-05-17 Thread Shaohua Li
On Wed, May 16, 2018 at 05:18:39PM +0800, Jianchao Wang wrote: > For sb versions 0.90 and 1.0, which are located after the data, when we increase > the spindle volume size and grow the raid array size, the older sb which is > different between spindles will be left there. Due to this leftover sb, the > spindle vo

Re: [PATCH] md/raid1: add error handling of read error from FailFast device

2018-05-14 Thread Shaohua Li
On Wed, May 02, 2018 at 01:08:11PM +0200, Gioh Kim wrote: > Current handle_read_error() function calls fix_read_error() > only if md device is RW and rdev does not include FailFast flag. > It does not handle a read error from a RW device including > FailFast flag. > > I am not sure it is intended.

[GIT PULL] MD update for 4.17-rc1

2018-04-19 Thread Shaohua Li
Hi, 3 small fixes for MD: - md-cluster fix for faulty device from Guoqing - writehint fix for writebehind IO for raid1 from Mariusz - a livelock fix for interrupted recovery from Yufen Please pull! The following changes since commit f8cf2f16a7c95acce497bfafa90e7c6d8397d653: Merge branch 'next

[GIT PULL] MD update for 4.16-rc3

2018-03-01 Thread Shaohua Li
Hi, A few bug fixes for MD, please pull: - Fix raid5-ppl flush request handling hang from Artur - Fix a potential deadlock in raid5/10 reshape from BingJing - Fix a deadlock for dm-raid from Heinz - Fix two md-cluster of raid10 from Lidong and Guoqing - Fix a NULL dereference problem in device remova

Re: [PATCH] md: raid5: avoid string overflow warning

2018-02-21 Thread Shaohua Li
On Tue, Feb 20, 2018 at 02:09:11PM +0100, Arnd Bergmann wrote: > gcc warns about a possible overflow of the kmem_cache string, when adding > four characters to a string of the same length: > > drivers/md/raid5.c: In function 'setup_conf': > drivers/md/raid5.c:2207:34: error: '-alt' directive writi
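The warning class here is gcc's format-overflow analysis catching a suffix appended into a buffer sized only for the base name. A small sketch of the bug shape and the usual fix (buffer name and size are illustrative, not the ones from raid5.c's setup_conf()):

```c
#include <stdio.h>
#include <string.h>

#define NAME_LEN 32     /* illustrative, not the raid5.c constant */

/* Building "<base>-alt" with sprintf() into a NAME_LEN buffer is what
 * gcc flags: the "-alt" directive can write past the end.  snprintf()
 * never writes past dstsz and reports the length it wanted, so
 * truncation becomes detectable instead of a silent overflow. */
static int build_alt_name(char *dst, size_t dstsz, const char *base)
{
    int need = snprintf(dst, dstsz, "%s-alt", base);

    return need >= 0 && (size_t)need < dstsz;  /* 1 = fit, 0 = truncated */
}
```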

Re: [PATCH] md: fix md_write_start() deadlock w/o metadata devices

2018-02-18 Thread Shaohua Li
On Fri, Feb 02, 2018 at 11:13:19PM +0100, Heinz Mauelshagen wrote: > If no metadata devices are configured on raid1/4/5/6/10 > (e.g. via dm-raid), md_write_start() unconditionally waits > for superblocks to be written thus deadlocking. > > Fix introduces mddev->has_superblocks bool, defines it in

Re: [PATCH v2] md-multipath: Use seq_putc() in multipath_status()

2018-02-17 Thread Shaohua Li
On Sat, Jan 13, 2018 at 09:55:08AM +0100, SF Markus Elfring wrote: > From: Markus Elfring > Date: Sat, 13 Jan 2018 09:49:03 +0100 > > A single character (closing square bracket) should be put into a sequence. > Thus use the corresponding function "seq_putc". > > This issue was detected by using
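seq_file is kernel-only, so as a user-space stand-in, the shape of the cleanup is: emit a single known character directly instead of routing it through printf-style formatting. A trivial sketch:

```c
#include <stdio.h>
#include <string.h>

/* Analogue of the seq_putc() cleanup: when exactly one character is
 * appended, say so directly rather than paying for format-string
 * parsing, as snprintf(p, n, "%s", "]") would. */
static size_t emit_close_bracket(char *buf, size_t len)
{
    buf[len++] = ']';   /* the seq_putc(m, ']') equivalent */
    buf[len] = '\0';
    return len;
}
```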

Re: [PATCH] md/raid1: Fix trailing semicolon

2018-02-17 Thread Shaohua Li
On Wed, Jan 17, 2018 at 01:38:02PM +, Luis de Bethencourt wrote: > The trailing semicolon is an empty statement that does no operation. > Removing it since it doesn't do anything. > > Signed-off-by: Luis de Bethencourt > --- > > Hi, > > After fixing the same thing in drivers/staging/rtl8723

[GIT PULL] MD update for 4.16-rc1

2018-01-30 Thread Shaohua Li
Hi, Some small fixes for MD 4.16: - Fix raid5-cache potential problems if raid5 cache isn't fully recovered - Fix a wait-within-wait warning in raid1/10 - Make raid5-PPL support disks with writeback cache enabled Please pull! Thanks, Shaohua The following changes since commit 50c4c4e268a2d7a3e58e

Re: Regression with a0747a859ef6 ("bdi: add error handle for bdi_debug_register")

2017-12-19 Thread Shaohua Li
On Tue, Dec 19, 2017 at 10:17:43AM -0600, Bruno Wolff III wrote: > On Sun, Dec 17, 2017 at 21:43:50 +0800, > weiping zhang wrote: > > Hi, thanks for testing, I think you first reproduce this issue(got WARNING > > at device_add_disk) by your own build, then add my debug patch. > > The problem is

[GIT PULL] MD update for 4.15-rc2

2017-12-07 Thread Shaohua Li
9:48 -0800) Nate Dailey (1): md: limit mdstat resync progress to max_sectors Shaohua Li (1): md/raid1/10: add missed blk plug Song Liu (1): md/r5cache: move mddev_lock() out of r5c_journal_mode_set() bingjingc (1)

Re: [RFC PATCH] blk-throttle: add burst allowance.

2017-11-17 Thread Shaohua Li
On Thu, Nov 16, 2017 at 08:25:58PM -0800, Khazhismel Kumykov wrote: > On Thu, Nov 16, 2017 at 8:50 AM, Shaohua Li wrote: > > On Tue, Nov 14, 2017 at 03:10:22PM -0800, Khazhismel Kumykov wrote: > >> Allows configuration additional bytes or ios before a throttle is > >>

Re: [RFC PATCH] blk-throttle: add burst allowance.

2017-11-16 Thread Shaohua Li
On Tue, Nov 14, 2017 at 03:10:22PM -0800, Khazhismel Kumykov wrote: > Allows configuring additional bytes or IOs before a throttle is > triggered. > > This allows implementation of a bucket-style rate-limit/throttle on a > block device. Previously, bursting to a device was limited to allowance >
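A burst allowance on top of a sustained rate is the classic token bucket. A sketch of that shape, not the actual blk-throttle code (which accounts in time slices rather than an explicit bucket); all field names and units here are illustrative:

```c
#include <stdbool.h>
#include <stdint.h>

struct bucket {
    uint64_t rate_bps;      /* sustained budget, bytes per second  */
    uint64_t burst_bytes;   /* extra bytes allowed above the rate  */
    uint64_t tokens;        /* currently available bytes           */
    uint64_t last_ns;       /* last refill timestamp               */
};

static void bucket_refill(struct bucket *b, uint64_t now_ns)
{
    uint64_t earned = (now_ns - b->last_ns) * b->rate_bps / 1000000000ull;
    uint64_t cap = b->rate_bps + b->burst_bytes;  /* 1s of rate + burst */

    b->tokens = b->tokens + earned > cap ? cap : b->tokens + earned;
    b->last_ns = now_ns;
}

/* Returns true if the IO may dispatch now, false if it must wait. */
static bool bucket_try_dispatch(struct bucket *b, uint64_t now_ns,
                                uint64_t bytes)
{
    bucket_refill(b, now_ns);
    if (b->tokens < bytes)
        return false;
    b->tokens -= bytes;
    return true;
}
```

The burst_bytes term is exactly the knob the patch proposes: a one-shot cushion above the configured rate before throttling kicks in.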

Re: [PATCH 6/7] blkcg: account requests instead of bios for request based request_queues

2017-11-14 Thread Shaohua Li
roup.h > @@ -715,7 +715,8 @@ static inline bool blkcg_bio_issue_check(struct > request_queue *q, > > throtl = blk_throtl_bio(q, blkg, bio); > > - if (!throtl) { > + /* if @q does io stat, blkcg stats are updated together with them */ > + if (!blk_queu

Re: [PATCH v2 3/7] blkcg: associate a request with its blkcg_gq instead of request_list

2017-11-14 Thread Shaohua Li
On Mon, Nov 13, 2017 at 12:15:23PM -0800, Tejun Heo wrote: > From c856a199ec70e4022e997609f1b17b9106408777 Mon Sep 17 00:00:00 2001 > From: Tejun Heo > Date: Mon, 13 Nov 2017 12:11:57 -0800 > > On the legacy request_queue, each blkcg_gq has a dedicated > request_list that requests for the cgroup

Re: [PATCH 4/7] blkcg: refactor blkcg_gq lookup and creation in blkcg_bio_issue_check()

2017-11-14 Thread Shaohua Li
and doesn't introduce any functional changes. > > Signed-off-by: Tejun Heo Reviewed-by: Shaohua Li > --- > block/blk-cgroup.c | 6 +++--- > block/blk-throttle.c | 2 +- > include/linux/blk-cgroup.h | 32 +--- > 3 files changed,

Re: [PATCH 2/7] blkcg: use percpu_ref for blkcg_gq->refcnt

2017-11-14 Thread Shaohua Li
the use of atomic_t > acceptable albeit restrictive and fragile. > > Now that the percpu allocator supports !GFP_KERNEL allocations, > there's no reason to keep using atomic_t refcnt. This will allow > clean separation between bio and request layers helping blkcg sup

Re: [PATCH 1/7] blkcg: relocate __blkg_release_rcu()

2017-11-14 Thread Shaohua Li
On Sun, Nov 12, 2017 at 02:26:07PM -0800, Tejun Heo wrote: > Move __blkg_release_rcu() above blkg_alloc(). This is a pure code > reorganization to prepare for the switch to percpu_ref. > > Signed-off-by: Tejun Heo Reviewed-by: Shaohua Li > --- > block

Re: [PATCH 2/2] blk-throtl: add relative percentage support to latency=

2017-11-14 Thread Shaohua Li
ops=100 wiops=100 idle=100 > latency=120% > > Signed-off-by: Tejun Heo > Cc: Shaohua Li Reviewed-by: Shaohua Li > --- > block/blk-throttle.c | 66 > +++ > 1 file changed, 51 insertions(+), 15 deletions(-) >

[GIT PULL] MD update for 4.15-rc1

2017-11-14 Thread Shaohua Li
ove special meaning of ->quiesce(.., 2) md: be cautious about using ->curr_resync_completed for ->recovery_offset Shaohua Li (2): md/bitmap: revert a patch md: use lockdep_assert_held Zdenek Kabelac (2): md: release allocated bitset sync_set md: free unused m

Re: [PATCH 1/2] blk-throtl: make latency= absolute

2017-11-13 Thread Shaohua Li
On Mon, Nov 13, 2017 at 06:18:49AM -0800, Tejun Heo wrote: > Hello, Shaohua. Just a bit of addition. > > On Mon, Nov 13, 2017 at 03:27:10AM -0800, Tejun Heo wrote: > > What I'm trying to say is that the latency is defined as "from bio > > issue to completion", not "in-flight time on device". Whe

Re: [PATCH 7/7] blk-throtl: don't throttle the same IO multiple times

2017-11-13 Thread Shaohua Li
On Mon, Nov 13, 2017 at 07:57:45AM -0800, Tejun Heo wrote: > Hello, > > On Mon, Nov 13, 2017 at 03:13:48AM -0800, Tejun Heo wrote: > > You're right. If we wanna take this approach, we need to keep the > > throttled flag while cloning. The clearing part is still correct tho. > > Without that, I g

Re: [PATCH 1/2] blk-throtl: make latency= absolute

2017-11-12 Thread Shaohua Li
On Fri, Nov 10, 2017 at 07:43:14AM -0800, Tejun Heo wrote: > Hello, Shaohua. > > On Thu, Nov 09, 2017 at 08:27:13PM -0800, Shaohua Li wrote: > > I think the absolute latency would only work for HD. For a SSD, a 4k latency > > probably is 60us and 1M latency is 500us.

Re: [PATCH 7/7] blk-throtl: don't throttle the same IO multiple times

2017-11-12 Thread Shaohua Li
On Sun, Nov 12, 2017 at 02:26:13PM -0800, Tejun Heo wrote: > BIO_THROTTLED is used to mark already throttled bios so that a bio > doesn't get throttled multiple times. The flag gets set when the bio > starts getting dispatched from blk-throtl and cleared when it leaves > blk-throtl. > > Unfortuna

Re: [PATCH 1/2] blk-throtl: make latency= absolute

2017-11-09 Thread Shaohua Li
On Thu, Nov 09, 2017 at 03:42:58PM -0800, Tejun Heo wrote: > Hello, Shaohua. > > On Thu, Nov 09, 2017 at 03:12:12PM -0800, Shaohua Li wrote: > > The percentage latency makes sense, but the absolute latency doesn't to me. > > A > > 4k IO latency could be much s

Re: [PATCH 1/2] blk-throtl: make latency= absolute

2017-11-09 Thread Shaohua Li
tency could be much smaller than 1M IO latency. If we don't add baseline latency, we can't specify a latency target which works for both 4k and 1M IO. Thanks, Shaohua > Signed-off-by: Tejun Heo > Cc: Shaohua Li > --- > block/blk-throttle.c |3 +-- > 1 file changed,

[PATCH V2] kthread: zero the kthread data structure

2017-11-07 Thread Shaohua Li
ch doesn't sound much overhead. Reported-by: syzbot Fixes: 05e3db95ebfc ("kthread: add a mechanism to store cgroup info") Cc: Andrew Morton Cc: Ingo Molnar Cc: Tejun Heo Cc: Dmitry Vyukov Signed-off-by: Shaohua Li --- kernel/kthread.c | 6 +- 1 file changed, 1 insertion(+), 5

[PATCH] kthread: move the cgroup info initialization early

2017-11-07 Thread Shaohua Li
: Tejun Heo Signed-off-by: Shaohua Li --- kernel/kthread.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/kernel/kthread.c b/kernel/kthread.c index f87cd8b4..cf5c113 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -205,6 +205,10 @@ static int kthread(void

Re: [PATCH] md: Convert timers to use timer_setup()

2017-10-16 Thread Shaohua Li
ong, I'll apply. Or if you want this merged in other tree, you can add my 'reviewed-by: Shaohua Li ' for md.c part. > Cc: Kent Overstreet > Cc: Shaohua Li > Cc: Alasdair Kergon > Cc: Mike Snitzer > Cc: dm-de...@redhat.com > Cc: linux-bca...@vger.kernel.org > C

Re: [PATCH] md: raid10: remove VLAIS

2017-10-05 Thread Shaohua Li
On Fri, Oct 06, 2017 at 01:22:12PM +1100, Neil Brown wrote: > On Thu, Oct 05 2017, Matthias Kaehlcke wrote: > > > Hi Neil, > > > > El Fri, Oct 06, 2017 at 10:58:59AM +1100 NeilBrown ha dit: > > > >> On Thu, Oct 05 2017, Matthias Kaehlcke wrote: > >> > >> > The raid10 driver can't be built with cl

Re: [PATCH] md: raid10: remove VLAIS

2017-10-05 Thread Shaohua Li
On Thu, Oct 05, 2017 at 11:28:47AM -0700, Matthias Kaehlcke wrote: > The raid10 driver can't be built with clang since it uses a variable > length array in a structure (VLAIS): > > drivers/md/raid10.c:4583:17: error: fields must have a constant size: > 'variable length array in structure' extens
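The portable replacement for a variable length array inside a struct (VLAIS, a GCC-only extension) is a C99 flexible array member sized in a single run-time allocation. A sketch of the pattern with illustrative names, not the actual raid10.c structures:

```c
#include <stdlib.h>

/* clang rejects the VLAIS form:
 *
 *     struct on_stack { int nr; void *pages[nr]; };   // VLAIS
 *
 * The fix: a flexible array member plus one malloc sized at run time. */
struct page_list {
    int nr;
    void *pages[];              /* C99 flexible array member */
};

static struct page_list *page_list_alloc(int nr)
{
    struct page_list *pl = malloc(sizeof(*pl) +
                                  nr * sizeof(pl->pages[0]));

    if (pl)
        pl->nr = nr;
    return pl;
}
```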

[GIT PULL] MD update for 4.14-rc3

2017-09-29 Thread Shaohua Li
t (2017-09-27 20:08:44 -0700) -------- Shaohua Li (4): md: separate request handling md: fix a race condition for flush request handling dm-raid: fix a race condition in request handling md/raid5: cap worker count dri

Re: [PATCH V3 0/4] block: make loop block device cgroup aware

2017-09-25 Thread Shaohua Li
On Thu, Sep 14, 2017 at 02:02:03PM -0700, Shaohua Li wrote: > From: Shaohua Li > > Hi, > > The IO dispatched to under layer disk by loop block device isn't cloned from > original bio, so the IO loses cgroup information of original bio. These IO > escapes from cgroup c

Re: MADV_FREE is broken

2017-09-20 Thread Shaohua Li
On Wed, Sep 20, 2017 at 11:01:47AM +0200, Artem Savkov wrote: > Hi All, > > We recently started noticing madvise09[1] test from ltp failing strangely. The > test does the following: maps 32 pages, sets MADV_FREE for the range it got, > dirties 2 of the pages, creates memory pressure and check that

[GIT PULL] MD update for 4.14-rc2

2017-09-19 Thread Shaohua Li
) Dennis Yang (1): md/raid5: preserve STRIPE_ON_UNPLUG_LIST in break_stripe_batch_list Shaohua Li (1): md/raid5: fix a race condition in stripe batch drivers/md/raid5.c | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-)

[PATCH V3 1/4] kthread: add a mechanism to store cgroup info

2017-09-14 Thread Shaohua Li
From: Shaohua Li kthread usually runs jobs on behalf of other threads. The jobs should be charged to cgroup of original threads. But the jobs run in a kthread, where we lose the cgroup context of original threads. The patch adds a mechanism to record cgroup info of original threads in kthread
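The mechanism amounts to a per-worker slot holding the originator's cgroup context (the series adds helpers like kthread_set_orig_css() for this; see the 1/3 review below). A user-space sketch of the idea with a thread-local pointer standing in for struct kthread, all names illustrative:

```c
#include <stddef.h>

/* Per-thread slot, playing the role of the css pointer the patch hangs
 * off struct kthread. */
static _Thread_local const void *orig_ctx;

/* Worker records who it is doing work for before dispatching. */
static void worker_set_orig_ctx(const void *ctx)
{
    orig_ctx = ctx;
}

/* What a bio_blkcg()-style lookup would consult: prefer the recorded
 * origin context; fall back to the worker's own identity. */
static const void *current_charge_ctx(const void *self)
{
    return orig_ctx ? orig_ctx : self;
}
```

With this in place, accounting performed from the worker is charged to the submitting thread's cgroup rather than the worker's, which is the whole point of the series.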

[PATCH V3 0/4] block: make loop block device cgroup aware

2017-09-14 Thread Shaohua Li
From: Shaohua Li Hi, The IO dispatched to the underlying disk by the loop block device isn't cloned from the original bio, so the IO loses the cgroup information of the original bio. These IOs escape cgroup control. The patches try to address this issue. The idea is quite generic, but we currently only

[PATCH V3 2/4] blkcg: delete unused APIs

2017-09-14 Thread Shaohua Li
From: Shaohua Li Nobody uses the APIs right now. Acked-by: Tejun Heo Signed-off-by: Shaohua Li --- block/bio.c| 31 --- include/linux/bio.h| 2 -- include/linux/blk-cgroup.h | 12 3 files changed, 45 deletions(-) diff --git a

[PATCH V3 4/4] block/loop: make loop cgroup aware

2017-09-14 Thread Shaohua Li
From: Shaohua Li loop block device handles IO in a separate thread. The actual IO dispatched isn't cloned from the IO the loop device received, so the dispatched IO loses the cgroup context. I'm ignoring the buffered IO case for now, which is quite complicated. Making the loop thread aware cgro

[PATCH V3 3/4] block: make blkcg aware of kthread stored original cgroup info

2017-09-14 Thread Shaohua Li
From: Shaohua Li bio_blkcg is the only API to get cgroup info for a bio right now. If bio_blkcg finds current task is a kthread and has original blkcg associated, it will use the css instead of associating the bio to current task. This makes it possible that kthread dispatches bios on behalf of

Re: [PATCH V2 1/4] kthread: add a mechanism to store cgroup info

2017-09-13 Thread Shaohua Li
On Wed, Sep 13, 2017 at 02:38:20PM -0700, Tejun Heo wrote: > Hello, > > On Wed, Sep 13, 2017 at 02:01:26PM -0700, Shaohua Li wrote: > > diff --git a/kernel/kthread.c b/kernel/kthread.c > > index 26db528..3107eee 100644 > > --- a/kernel/kthread.c > > +++ b/ke

[PATCH V2 2/4] blkcg: delete unused APIs

2017-09-13 Thread Shaohua Li
From: Shaohua Li Nobody uses the APIs right now. Signed-off-by: Shaohua Li --- block/bio.c| 31 --- include/linux/bio.h| 2 -- include/linux/blk-cgroup.h | 12 3 files changed, 45 deletions(-) diff --git a/block/bio.c b/block

[PATCH V2 1/4] kthread: add a mechanism to store cgroup info

2017-09-13 Thread Shaohua Li
From: Shaohua Li kthread usually runs jobs on behalf of other threads. The jobs should be charged to cgroup of original threads. But the jobs run in a kthread, where we lose the cgroup context of original threads. The patch adds a mechanism to record cgroup info of original threads in kthread

[PATCH V2 3/4] block: make blkcg aware of kthread stored original cgroup info

2017-09-13 Thread Shaohua Li
From: Shaohua Li bio_blkcg is the only API to get cgroup info for a bio right now. If bio_blkcg finds current task is a kthread and has original blkcg associated, it will use the css instead of associating the bio to current task. This makes it possible that kthread dispatches bios on behalf of

[PATCH V2 0/4] block: make loop block device cgroup aware

2017-09-13 Thread Shaohua Li
From: Shaohua Li Hi, The IO dispatched to the underlying disk by the loop block device isn't cloned from the original bio, so the IO loses the cgroup information of the original bio. These IOs escape cgroup control. The patches try to address this issue. The idea is quite generic, but we currently only

[PATCH V2 4/4] block/loop: make loop cgroup aware

2017-09-13 Thread Shaohua Li
From: Shaohua Li loop block device handles IO in a separate thread. The actual IO dispatched isn't cloned from the IO the loop device received, so the dispatched IO loses the cgroup context. I'm ignoring the buffered IO case for now, which is quite complicated. Making the loop thread aware cgro

Re: [PATCH 3/3] block/loop: make loop cgroup aware

2017-09-08 Thread Shaohua Li
On Fri, Sep 08, 2017 at 07:48:09AM -0700, Tejun Heo wrote: > Hello, > > On Wed, Sep 06, 2017 at 07:00:53PM -0700, Shaohua Li wrote: > > diff --git a/drivers/block/loop.c b/drivers/block/loop.c > > index 9d4545f..9850b27 100644 > > --- a/drivers/block/loop.c >

Re: [PATCH 1/3] kthread: add a mechanism to store cgroup info

2017-09-08 Thread Shaohua Li
On Fri, Sep 08, 2017 at 07:35:37AM -0700, Tejun Heo wrote: > Hello, > > On Wed, Sep 06, 2017 at 07:00:51PM -0700, Shaohua Li wrote: > > +#ifdef CONFIG_CGROUPS > > +void kthread_set_orig_css(struct cgroup_subsys_state *css); > > +struct cgroup_subsys_state *kthread_ge

[PATCH 1/3] kthread: add a mechanism to store cgroup info

2017-09-06 Thread Shaohua Li
From: Shaohua Li kthread usually runs jobs on behalf of other threads. The jobs should be charged to cgroup of original threads. But the jobs run in a kthread, where we lose the cgroup context of original threads. The patch adds a mechanism to record cgroup info of original threads in kthread

[PATCH 0/3] block: make loop block device cgroup aware

2017-09-06 Thread Shaohua Li
From: Shaohua Li Hi, The IO dispatched to the underlying disk by the loop block device isn't cloned from the original bio, so the IO loses the cgroup information of the original bio. These IOs escape cgroup control. The patches try to address this issue. The idea is quite generic, but we currently only

[PATCH 2/3] block: make blkcg aware of kthread stored original cgroup info

2017-09-06 Thread Shaohua Li
From: Shaohua Li Several blkcg APIs are deprecated. After removing them, bio_blkcg is the only API to get cgroup info for a bio. If bio_blkcg finds current task is a kthread and has original css recorded, it will use the css instead of associating the bio to current task. Signed-off-by: Shaohua

[PATCH 3/3] block/loop: make loop cgroup aware

2017-09-06 Thread Shaohua Li
From: Shaohua Li loop block device handles IO in a separate thread. The actual IO dispatched isn't cloned from the IO the loop device received, so the dispatched IO loses the cgroup context. I'm ignoring the buffered IO case for now, which is quite complicated. Making the loop thread aware cgro

[GIT PULL] MD update for 4.14-rc1

2017-09-06 Thread Shaohua Li
o 512 bits, not bytes Guoqing Jiang (1): raid5: remove raid5_build_block NeilBrown (1): md/bitmap: disable bitmap_resize for file-backed bitmaps. Pawel Baldysiak (2): md: Runtime support for multiple ppls raid5-ppl: Recovery support for multiple partial parity logs Shaohua

Re: [PATCH V6 00/18] blk-throttle: add .low limit

2017-09-06 Thread Shaohua Li
On Wed, Sep 06, 2017 at 09:12:20AM +0800, Joseph Qi wrote: > Hi Shaohua, > > On 17/9/6 05:02, Shaohua Li wrote: > > On Thu, Aug 31, 2017 at 09:24:23AM +0200, Paolo VALENTE wrote: > >> > >>> Il giorno 15 gen 2017, alle ore 04:42, Shaohua Li ha > >>

Re: [PATCH v2] md/raid5: preserve STRIPE_ON_UNPLUG_LIST in break_stripe_batch_list

2017-09-05 Thread Shaohua Li
On Wed, Sep 06, 2017 at 11:02:35AM +0800, Dennis Yang wrote: > In release_stripe_plug(), if a stripe_head has its STRIPE_ON_UNPLUG_LIST > set, it indicates that this stripe_head is already in the raid5_plug_cb > list and release_stripe() would be called instead to drop a reference > count. Otherwis

Re: [PATCH V6 00/18] blk-throttle: add .low limit

2017-09-05 Thread Shaohua Li
On Thu, Aug 31, 2017 at 09:24:23AM +0200, Paolo VALENTE wrote: > > > Il giorno 15 gen 2017, alle ore 04:42, Shaohua Li ha scritto: > > > > Hi, > > > > cgroup still lacks a good iocontroller. CFQ works well for hard disk, but > > not > > much for

Re: [PATCH] md/raid5: preserve STRIPE_ON_UNPLUG_LIST in break_stripe_batch_list

2017-09-05 Thread Shaohua Li
On Fri, Sep 01, 2017 at 05:26:48PM +0800, Dennis Yang wrote: > >On Mon, Aug 28, 2017 at 08:01:59PM +0800, Dennis Yang wrote: > >> break_stripe_batch_list() did not preserve STRIPE_ON_UNPLUG_LIST which is > >> set when a stripe_head gets queued to the stripe_head list maintained by > >> raid5_plug_c

Re: [PATCH] md/raid5: preserve STRIPE_ON_UNPLUG_LIST in break_stripe_batch_list

2017-08-31 Thread Shaohua Li
On Mon, Aug 28, 2017 at 08:01:59PM +0800, Dennis Yang wrote: > break_stripe_batch_list() did not preserve STRIPE_ON_UNPLUG_LIST which is > set when a stripe_head gets queued to the stripe_head list maintained by > raid5_plug_cb and waiting for releasing after blk_unplug(). > > In release_stripe_pl
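The bug class here is a fixed "flags to keep" mask that omits a list-membership bit, so breaking up the batch silently forgets the stripe is still on the plug list. A sketch with illustrative bit values (not the real stripe flag numbers):

```c
enum {
    STRIPE_ACTIVE         = 1u << 0,
    STRIPE_ON_UNPLUG_LIST = 1u << 1,
    STRIPE_BATCH_READY    = 1u << 2,
};

/* before: the preserve mask omitted STRIPE_ON_UNPLUG_LIST, so the bit
 * was lost even though the stripe_head still sat on raid5_plug_cb's
 * list */
static unsigned break_batch_buggy(unsigned state)
{
    return state & STRIPE_ACTIVE;
}

/* after: list-membership flags survive the batch break-up */
static unsigned break_batch_fixed(unsigned state)
{
    return state & (STRIPE_ACTIVE | STRIPE_ON_UNPLUG_LIST);
}
```

Losing the bit means release_stripe_plug() later re-queues or double-releases the stripe, which is the reference-count imbalance the patch fixes.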

[PATCH] kthread_worker: don't hog the cpu

2017-08-25 Thread Shaohua Li
If the worker thread continues getting work, it will hog the CPU and RCU will complain about stalls. Make it a good citizen. This is triggered in a loop block device test. Signed-off-by: Shaohua Li --- kernel/kthread.c | 1 + 1 file changed, 1 insertion(+) diff --git a/kernel/kthread.c b/kernel/kthread.c

[GIT PULL] MD update for 4.13-rc5

2017-08-14 Thread Shaohua Li
700) NeilBrown (2): md: always clear ->safemode when md_check_recovery gets the mddev lock. md: fix test in md_write_start() Shaohua Li (1): MD: not clear ->safemode for external metadata array Song Liu (2): m

Re: [PATCH] lib/raid6: align AVX512 constants to 512 bits, not bytes

2017-08-14 Thread Shaohua Li
On Sat, Aug 12, 2017 at 07:43:46PM +0200, Denys Vlasenko wrote: > Signed-off-by: Denys Vlasenko > Cc: H. Peter Anvin > Cc: mi...@redhat.com > Cc: Jim Kukunas > Cc: Fenghua Yu > Cc: Megha Dey > Cc: Gayatri Kammela > Cc: Shaohua Li > Cc: x...@kernel.org > Cc

Re: [MD] Crash with 4.12+ kernel and high disk load -- bisected to 4ad23a976413: MD: use per-cpu counter for writes_pending

2017-08-07 Thread Shaohua Li
On Mon, Aug 07, 2017 at 01:20:25PM +0200, Dominik Brodowski wrote: > Neil, Shaohua, > > following up on David R's bug message: I have observed something similar > on v4.12.[345] and v4.13-rc4, but not on v4.11. This is a RAID1 (on bare > metal partitions, /dev/sdaX and /dev/sdbY linked together).

Re: [PATCH] md: replace seq_release_private with seq_release

2017-07-31 Thread Shaohua Li
On Sat, Jul 29, 2017 at 07:52:45PM +0300, Cihangir Akturk wrote: > Since commit f15146380d28 ("fs: seq_file - add event counter to simplify > poll() support"), the md.c code has no longer used the private field of > struct seq_file, but seq_release_private() has continued to be used to

[GIT PULL] MD update for 4.13-rc3

2017-07-28 Thread Shaohua Li
md: raid1/raid10: initialize bvec table via bio_add_page() md: raid1-10: move raid1/raid10 common code into raid1-10.c Ofer Heifetz (1): md/raid5: add thread_group worker async_tx_issue_pending_all Shaohua Li (2): md/raid1: fix writebehind bio clone MD: fix warnning

[GIT PULL] MD update for 4.13-rc2

2017-07-18 Thread Shaohua Li
Hi, Please pull 3 MD fixes: - raid5-ppl fix by Artur. This one is introduced in this release cycle. - raid5 reshape fix by Xiao. This is an old bug and will be added to stable. - Bitmap fix by Guoqing. Thanks, Shaohua The following changes since commit af3c8d98508d37541d4bf57f13a984a7f73a328c:

[PATCH V5 04/11] kernfs: don't set dentry->d_fsdata

2017-07-12 Thread Shaohua Li
From: Shaohua Li When working on adding exportfs operations in kernfs, I found it's hard to initialize dentry->d_fsdata in the exportfs operations. It looks like there is no way to do it without a race condition. Looking at the kernfs code closely, there is no point in setting dentry->d_fsdata. inode

[PATCH V5 01/11] kernfs: use idr instead of ida to manage inode number

2017-07-12 Thread Shaohua Li
From: Shaohua Li kernfs uses ida to manage inode number. The problem is we can't get kernfs_node from inode number with ida. Switching to use idr, next patch will add an API to get kernfs_node from inode number. Acked-by: Tejun Heo Acked-by: Greg Kroah-Hartman Signed-off-by: Shaoh

[PATCH V5 00/11] blktrace: output cgroup info

2017-07-12 Thread Shaohua Li
From: Shaohua Li Hi, Currently blktrace isn't cgroup aware. blktrace prints out task name of current context, but the task of current context isn't always in the cgroup where the BIO comes from. We can't use task name to find out IO cgroup. For example, Writeback BIOs always com

[PATCH V5 03/11] kernfs: add an API to get kernfs node from inode number

2017-07-12 Thread Shaohua Li
From: Shaohua Li Add an API to get kernfs node from inode number. We will need this to implement exportfs operations. This API will be used in blktrace too later, so it should be as fast as possible. To make the API lock free, kernfs node is freed in RCU context. And we depend on kernfs_node

[PATCH V5 02/11] kernfs: implement i_generation

2017-07-12 Thread Shaohua Li
From: Shaohua Li Set i_generation for kernfs inode. This is required to implement exportfs operations. The generation is 32-bit, so it's possible the generation wraps around and we find stale files. To reduce the possibility, we don't reuse an inode number immediately. When the inode number
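The generation check is what turns inode-number reuse into a detectable ESTALE instead of a silent resolution to the wrong object: a handle pairs the number with the generation at creation time, and a mismatch on lookup means the number was recycled. A toy sketch, with a flat array standing in for the kernfs idr:

```c
#include <stdbool.h>
#include <stdint.h>

struct node   { uint32_t gen; bool live; };
struct handle { uint32_t ino; uint32_t gen; };

static struct node table[8];    /* index == inode number, illustrative */

static struct handle node_create(uint32_t ino)
{
    table[ino].gen++;           /* bump on every (re)use of the number */
    table[ino].live = true;
    return (struct handle){ .ino = ino, .gen = table[ino].gen };
}

static void node_destroy(uint32_t ino)
{
    table[ino].live = false;
}

/* false plays the role of ESTALE in a real exportfs fh_to_dentry. */
static bool handle_lookup(struct handle h)
{
    return table[h.ino].live && table[h.ino].gen == h.gen;
}
```

Since the generation is only 32 bits it can still wrap, which is why, as the patch says, delaying inode-number reuse further shrinks the collision window.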

[PATCH V5 06/11] kernfs: add exportfs operations

2017-07-12 Thread Shaohua Li
From: Shaohua Li Now we have the facilities to implement exportfs operations. The idea is cgroup can export the fhandle info to userspace, then userspace uses fhandle to find the cgroup name. Another example is userspace can get fhandle for a cgroup and BPF uses the fhandle to filter info for

[PATCH V5 10/11] blktrace: add an option to allow displaying cgroup path

2017-07-12 Thread Shaohua Li
From: Shaohua Li By default we output the cgroup id in blktrace. This adds an option to display the cgroup path. Since getting the cgroup path is a relatively heavy operation, we don't enable it by default. With the option enabled, blktrace will output something like this: dd-1353 [007] d..2 293.0

[PATCH V5 05/11] kernfs: introduce kernfs_node_id

2017-07-12 Thread Shaohua Li
From: Shaohua Li inode number and generation can identify a kernfs node. We are going to export the identification by exportfs operations, so put ino and generation into a separate structure. It's convenient when later patches use the identification. Acked-by: Greg Kroah-Hartman Signed-o

[PATCH V5 09/11] block: always attach cgroup info into bio

2017-07-12 Thread Shaohua Li
From: Shaohua Li blkcg_bio_issue_check() already gets the blkcg for a BIO. bio_associate_blkcg() uses a percpu refcounter, so it's a very cheap operation. There is no point in not attaching the cgroup info to the bio at blkcg_bio_issue_check. This also makes blktrace output the correct cgroup in

[PATCH V5 08/11] blktrace: export cgroup info in trace

2017-07-12 Thread Shaohua Li
From: Shaohua Li Currently blktrace isn't cgroup aware. blktrace prints out task name of current context, but the task of current context isn't always in the cgroup where the BIO comes from. We can't use task name to find out IO cgroup. For example, Writeback BIOs always com

[PATCH V5 07/11] cgroup: export fhandle info for a cgroup

2017-07-12 Thread Shaohua Li
From: Shaohua Li Add an API to export cgroup fhandle info. We don't export a full 'struct file_handle'; it contains unneeded info. Specifically, cgroup is always a directory, so we don't need a 'FILEID_INO32_GEN_PARENT' type fhandle; we only need to export the inod

[PATCH V5 11/11] block: use standard blktrace API to output cgroup info for debug notes

2017-07-12 Thread Shaohua Li
From: Shaohua Li Currently cfq/bfq/blk-throttle output cgroup info in trace in their own way. Now we have standard blktrace API for this, so convert them to use it. Note, this changes the behavior a little bit. cgroup info isn't output by default, we only do this with 'blk_cgro

[GIT PULL] MD update for 4.13

2017-07-07 Thread Shaohua Li
between mddev_suspend() and md_write_start() md: use a separate bio_set for synchronous IO. Shaohua Li (2): MD: fix a null dereference MD: fix sleep in atomic drivers/md/faulty.c| 5 +++-- drivers/md/linear.c| 7 --- drivers/md/md.c| 47 +

Re: [PATCH v2 11/51] md: raid1: initialize bvec table via bio_add_page()

2017-06-29 Thread Shaohua Li
he standard way instead of writing the > > > table directly. Otherwise it won't work any more once > > > multipage bvec is enabled. > > > > > > Cc: Shaohua Li > > > Cc: linux-r...@vger.kernel.org > > > Signed-off-by: Ming Lei > >

Re: [PATCH V4 00/12] blktrace: output cgroup info

2017-06-29 Thread Shaohua Li
On Wed, Jun 28, 2017 at 02:57:38PM -0600, Jens Axboe wrote: > On 06/28/2017 12:11 PM, Tejun Heo wrote: > > Hello, > > > > On Wed, Jun 28, 2017 at 10:54:28AM -0600, Jens Axboe wrote: > Series looks fine to me. I don't know how you want to split or funnel it, > since it touches multiple di

Re: [PATCH V4 10/12] block: call __bio_free in bio_endio

2017-06-29 Thread Shaohua Li
On Thu, Jun 29, 2017 at 07:15:52PM +0200, Christoph Hellwig wrote: > On Wed, Jun 28, 2017 at 02:42:49PM -0700, Shaohua Li wrote: > > > bio_integrity_endio -> bio_integrity_verify_fn -> bio_integrity_process > > > access the integrity data, so I don't think this work

Re: [PATCH V4 10/12] block: call __bio_free in bio_endio

2017-06-28 Thread Shaohua Li
On Wed, Jun 28, 2017 at 11:29:08PM +0200, Christoph Hellwig wrote: > On Wed, Jun 28, 2017 at 09:30:00AM -0700, Shaohua Li wrote: > > From: Shaohua Li > > > > bio_free isn't a good place to free cgroup/integrity info. There are a > > lot of cases bio is allocated

Re: [PATCH] fs: System memory leak when running HTX with T10 DIF enabled

2017-06-28 Thread Shaohua Li
On Wed, Jun 28, 2017 at 12:57:50PM -0600, Jens Axboe wrote: > On 06/28/2017 12:52 PM, Christoph Hellwig wrote: > > On Wed, Jun 28, 2017 at 12:44:00PM -0600, Jens Axboe wrote: > >> On 06/28/2017 12:38 PM, Christoph Hellwig wrote: > >>> On Wed, Jun 28, 2017 at 12:34:15PM -0600, Jens Axboe wrote: > >>

Re: [PATCH V4 00/12] blktrace: output cgroup info

2017-06-28 Thread Shaohua Li
On Wed, Jun 28, 2017 at 10:43:48AM -0600, Jens Axboe wrote: > On 06/28/2017 10:29 AM, Shaohua Li wrote: > > From: Shaohua Li > > > > Hi, > > > > Currently blktrace isn't cgroup aware. blktrace prints out task name of > > current > > context, b

[PATCH V4 01/12] kernfs: use idr instead of ida to manage inode number

2017-06-28 Thread Shaohua Li
From: Shaohua Li kernfs uses ida to manage inode numbers. The problem is that we can't get a kernfs_node back from an inode number with ida. Switch to idr; the next patch will add an API to get a kernfs_node from an inode number. Acked-by: Tejun Heo Signed-off-by: Shaohua Li --- fs/kernfs/dir.c

[PATCH V4 05/12] kernfs: introduce kernfs_node_id

2017-06-28 Thread Shaohua Li
From: Shaohua Li inode number and generation can identify a kernfs node. We are going to export the identification by exportfs operations, so put ino and generation into a separate structure. It's convenient when later patches use the identification. Signed-off-by: Shaohua Li --- fs/k

[PATCH V4 07/12] cgroup: export fhandle info for a cgroup

2017-06-28 Thread Shaohua Li
From: Shaohua Li Add an API to export cgroup fhandle info. We don't export a full 'struct file_handle'; it carries unneeded info. Specifically, cgroup is always a directory, so we don't need a 'FILEID_INO32_GEN_PARENT' type fhandle; we only need to export the inod

[PATCH V4 06/12] kernfs: add exportfs operations

2017-06-28 Thread Shaohua Li
From: Shaohua Li Now we have the facilities to implement exportfs operations. The idea is cgroup can export the fhandle info to userspace, then userspace uses fhandle to find the cgroup name. Another example is userspace can get fhandle for a cgroup and BPF uses the fhandle to filter info for

[PATCH V4 10/12] block: call __bio_free in bio_endio

2017-06-28 Thread Shaohua Li
From: Shaohua Li bio_free isn't a good place to free cgroup/integrity info. In many cases a bio is allocated in a special way (for example, on the stack) and never goes through bio_put(), hence bio_free(), so we are leaking memory. This patch moves the free to bio endio, which should be c
