date:20210412

[syzbot] KASAN: slab-out-of-bounds Read in __xfrm_decode_session (2)

2021-04-12 Thread syzbot

Hello,

syzbot found the following issue on:

HEAD commit:1678e493 Merge tag 'lto-v5.12-rc6' of git://git.kernel.org..
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1565bf7cd0
kernel config:  https://syzkaller.appspot.com/x/.config?x=71a75beb62b62a34
dashboard link: https://syzkaller.appspot.com/bug?extid=518a7b845c0083047e9c
compiler:   Debian clang version 11.0.1-2

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+518a7b845c0083047...@syzkaller.appspotmail.com

==
BUG: KASAN: slab-out-of-bounds in decode_session6 net/xfrm/xfrm_policy.c:3403 [inline]
BUG: KASAN: slab-out-of-bounds in __xfrm_decode_session+0x1ba4/0x2720 net/xfrm/xfrm_policy.c:3495
Read of size 1 at addr 888013104540 by task syz-executor.3/16514

CPU: 0 PID: 16514 Comm: syz-executor.3 Not tainted 5.12.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x176/0x24e lib/dump_stack.c:120
 print_address_description+0x5f/0x3a0 mm/kasan/report.c:232
 __kasan_report mm/kasan/report.c:399 [inline]
 kasan_report+0x15c/0x200 mm/kasan/report.c:416
 decode_session6 net/xfrm/xfrm_policy.c:3403 [inline]
 __xfrm_decode_session+0x1ba4/0x2720 net/xfrm/xfrm_policy.c:3495
 vti_tunnel_xmit+0x1ea/0x1510 net/ipv4/ip_vti.c:286
 __netdev_start_xmit include/linux/netdevice.h:4825 [inline]
 netdev_start_xmit include/linux/netdevice.h:4839 [inline]
 xmit_one net/core/dev.c:3605 [inline]
 dev_hard_start_xmit+0x20b/0x450 net/core/dev.c:3621
 sch_direct_xmit+0x1f0/0xd30 net/sched/sch_generic.c:313
 qdisc_restart net/sched/sch_generic.c:376 [inline]
 __qdisc_run+0xa4d/0x1a90 net/sched/sch_generic.c:384
 __dev_xmit_skb net/core/dev.c:3855 [inline]
 __dev_queue_xmit+0x1141/0x2a50 net/core/dev.c:4162
 neigh_output include/net/neighbour.h:510 [inline]
 ip6_finish_output2+0x10be/0x1460 net/ipv6/ip6_output.c:117
 dst_output include/net/dst.h:448 [inline]
 NF_HOOK include/linux/netfilter.h:301 [inline]
 ndisc_send_skb+0x93b/0xd50 net/ipv6/ndisc.c:508
 addrconf_rs_timer+0x242/0x6f0 net/ipv6/addrconf.c:3877
 call_timer_fn+0x91/0x160 kernel/time/timer.c:1431
 expire_timers kernel/time/timer.c:1476 [inline]
 __run_timers+0x6c0/0x8a0 kernel/time/timer.c:1745
 run_timer_softirq+0x63/0xf0 kernel/time/timer.c:1758
 __do_softirq+0x318/0x714 kernel/softirq.c:345
 invoke_softirq kernel/softirq.c:221 [inline]
 __irq_exit_rcu+0x1d8/0x200 kernel/softirq.c:422
 irq_exit_rcu+0x5/0x20 kernel/softirq.c:434
 sysvec_apic_timer_interrupt+0x91/0xb0 arch/x86/kernel/apic/apic.c:1100
 
 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:632
RIP: 0010:__sanitizer_cov_trace_pc+0x56/0x60 kernel/kcov.c:205
Code: 2c 8b 91 10 15 00 00 83 fa 02 75 21 48 8b 91 18 15 00 00 48 8b 32 48 8d 7e 01 8b 89 14 15 00 00 48 39 cf 73 08 48 89 44 f2 08 <48> 89 3a c3 66 0f 1f 44 00 00 4c 8b 04 24 65 48 8b 14 25 80 ef 01
RSP: 0018:c90001acf9f0 EFLAGS: 0283
RAX: 821506a4 RBX:  RCX: 0004
RDX: c9000f2df000 RSI: 2928 RDI: 2929
RBP: 192000359f57 R08: dc00 R09: f52000359f5e
R10: f52000359f5e R11:  R12: 111029006027
R13: 888034b67020 R14: 192000359f98 R15: 888034b67018
 ext4_match fs/ext4/namei.c:1364 [inline]
 ext4_search_dir+0x2f4/0xa10 fs/ext4/namei.c:1395
 search_dirblock fs/ext4/namei.c:1199 [inline]
 __ext4_find_entry+0x121c/0x1790 fs/ext4/namei.c:1553
 ext4_find_entry fs/ext4/namei.c:1602 [inline]
 ext4_rmdir+0x347/0x1180 fs/ext4/namei.c:3132
 vfs_rmdir+0x20a/0x3f0 fs/namei.c:3899
 ovl_remove_upper fs/overlayfs/dir.c:825 [inline]
 ovl_do_remove+0x4d2/0xbe0 fs/overlayfs/dir.c:904
 vfs_rmdir+0x20a/0x3f0 fs/namei.c:3899
 do_rmdir+0x2a5/0x560 fs/namei.c:3962
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x466459
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:7f08cdd4a188 EFLAGS: 0246 ORIG_RAX: 0054
RAX: ffda RBX: 0056c008 RCX: 00466459
RDX:  RSI:  RDI: 20c0
RBP: 004bf9fb R08:  R09: 
R10:  R11: 0246 R12: 0056c008
R13: 7ffefaa401bf R14: 7f08cdd4a300 R15: 00022000

Allocated by task 8393:
 kasan_save_stack mm/kasan/common.c:38 [inline]
 kasan_set_track mm/kasan/common.c:46 [inline]
 set_alloc_info mm/kasan/common.c:427 [inline]
 kasan_kmalloc+0xc2/0xf0 mm/kasan/common.c:506
 kasan_kmalloc include/linux/kasan.h:233 [inline]

[PATCH v4 2/4] pinctrl: add pinctrl driver on mt8195

2021-04-12 Thread Zhiyong Tao

This commit includes pinctrl driver for mt8195.

Signed-off-by: Zhiyong Tao 
---
 drivers/pinctrl/mediatek/Kconfig  |6 +
 drivers/pinctrl/mediatek/Makefile |1 +
 drivers/pinctrl/mediatek/pinctrl-mt8195.c |  828 
 drivers/pinctrl/mediatek/pinctrl-mtk-mt8195.h | 1669 +
 4 files changed, 2504 insertions(+)
 create mode 100644 drivers/pinctrl/mediatek/pinctrl-mt8195.c
 create mode 100644 drivers/pinctrl/mediatek/pinctrl-mtk-mt8195.h

diff --git a/drivers/pinctrl/mediatek/Kconfig b/drivers/pinctrl/mediatek/Kconfig
index eef17f228669..90f0c8255eaf 100644
--- a/drivers/pinctrl/mediatek/Kconfig
+++ b/drivers/pinctrl/mediatek/Kconfig
@@ -147,6 +147,12 @@ config PINCTRL_MT8192
default ARM64 && ARCH_MEDIATEK
select PINCTRL_MTK_PARIS
 
+config PINCTRL_MT8195
+   bool "Mediatek MT8195 pin control"
+   depends on OF
+   depends on ARM64 || COMPILE_TEST
+   select PINCTRL_MTK_PARIS
+
 config PINCTRL_MT8516
bool "Mediatek MT8516 pin control"
depends on OF
diff --git a/drivers/pinctrl/mediatek/Makefile b/drivers/pinctrl/mediatek/Makefile
index 01218bf4dc30..06fde993ace2 100644
--- a/drivers/pinctrl/mediatek/Makefile
+++ b/drivers/pinctrl/mediatek/Makefile
@@ -21,5 +21,6 @@ obj-$(CONFIG_PINCTRL_MT8167)  += pinctrl-mt8167.o
 obj-$(CONFIG_PINCTRL_MT8173)   += pinctrl-mt8173.o
 obj-$(CONFIG_PINCTRL_MT8183)   += pinctrl-mt8183.o
 obj-$(CONFIG_PINCTRL_MT8192)   += pinctrl-mt8192.o
+obj-$(CONFIG_PINCTRL_MT8195)+= pinctrl-mt8195.o
 obj-$(CONFIG_PINCTRL_MT8516)   += pinctrl-mt8516.o
 obj-$(CONFIG_PINCTRL_MT6397)   += pinctrl-mt6397.o
diff --git a/drivers/pinctrl/mediatek/pinctrl-mt8195.c b/drivers/pinctrl/mediatek/pinctrl-mt8195.c
new file mode 100644
index ..063f164d7c9b
--- /dev/null
+++ b/drivers/pinctrl/mediatek/pinctrl-mt8195.c
@@ -0,0 +1,828 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2020 MediaTek Inc.
+ *
+ * Author: Zhiyong Tao 
+ *
+ */
+
+#include "pinctrl-mtk-mt8195.h"
+#include "pinctrl-paris.h"
+
+/* MT8195 have multiple bases to program pin configuration listed as the below:
+ * iocfg[0]:0x10005000, iocfg[1]:0x11d1, iocfg[2]:0x11d3,
+ * iocfg[3]:0x11d4, iocfg[4]:0x11e2, iocfg[5]:0x11eb,
+ * iocfg[6]:0x11f4.
+ * _i_based could be used to indicate what base the pin should be mapped into.
+ */
+
+#define PIN_FIELD_BASE(s_pin, e_pin, i_base, s_addr, x_addrs, s_bit, x_bits) \
+   PIN_FIELD_CALC(s_pin, e_pin, i_base, s_addr, x_addrs, s_bit, x_bits, \
+  32, 0)
+
+#define PINS_FIELD_BASE(s_pin, e_pin, i_base, s_addr, x_addrs, s_bit, x_bits) \
+   PIN_FIELD_CALC(s_pin, e_pin, i_base, s_addr, x_addrs, s_bit, x_bits,  \
+  32, 1)
+
+static const struct mtk_pin_field_calc mt8195_pin_mode_range[] = {
+   PIN_FIELD(0, 144, 0x300, 0x10, 0, 4),
+};
+
+static const struct mtk_pin_field_calc mt8195_pin_dir_range[] = {
+   PIN_FIELD(0, 144, 0x0, 0x10, 0, 1),
+};
+
+static const struct mtk_pin_field_calc mt8195_pin_di_range[] = {
+   PIN_FIELD(0, 144, 0x200, 0x10, 0, 1),
+};
+
+static const struct mtk_pin_field_calc mt8195_pin_do_range[] = {
+   PIN_FIELD(0, 144, 0x100, 0x10, 0, 1),
+};
+
+static const struct mtk_pin_field_calc mt8195_pin_ies_range[] = {
+   PIN_FIELD_BASE(0, 0, 4, 0x040, 0x10, 0, 1),
+   PIN_FIELD_BASE(1, 1, 4, 0x040, 0x10, 1, 1),
+   PIN_FIELD_BASE(2, 2, 4, 0x040, 0x10, 2, 1),
+   PIN_FIELD_BASE(3, 3, 4, 0x040, 0x10, 3, 1),
+   PIN_FIELD_BASE(4, 4, 4, 0x040, 0x10, 4, 1),
+   PIN_FIELD_BASE(5, 5, 4, 0x040, 0x10, 5, 1),
+   PIN_FIELD_BASE(6, 6, 4, 0x040, 0x10, 6, 1),
+   PIN_FIELD_BASE(7, 7, 4, 0x040, 0x10, 7, 1),
+   PIN_FIELD_BASE(8, 8, 4, 0x040, 0x10, 13, 1),
+   PIN_FIELD_BASE(9, 9, 4, 0x040, 0x10, 8, 1),
+   PIN_FIELD_BASE(10, 10, 4, 0x040, 0x10, 14, 1),
+   PIN_FIELD_BASE(11, 11, 4, 0x040, 0x10, 9, 1),
+   PIN_FIELD_BASE(12, 12, 4, 0x040, 0x10, 15, 1),
+   PIN_FIELD_BASE(13, 13, 4, 0x040, 0x10, 10, 1),
+   PIN_FIELD_BASE(14, 14, 4, 0x040, 0x10, 16, 1),
+   PIN_FIELD_BASE(15, 15, 4, 0x040, 0x10, 11, 1),
+   PIN_FIELD_BASE(16, 16, 4, 0x040, 0x10, 17, 1),
+   PIN_FIELD_BASE(17, 17, 4, 0x040, 0x10, 12, 1),
+   PIN_FIELD_BASE(18, 18, 2, 0x040, 0x10, 5, 1),
+   PIN_FIELD_BASE(19, 19, 2, 0x040, 0x10, 12, 1),
+   PIN_FIELD_BASE(20, 20, 2, 0x040, 0x10, 11, 1),
+   PIN_FIELD_BASE(21, 21, 2, 0x040, 0x10, 10, 1),
+   PIN_FIELD_BASE(22, 22, 2, 0x040, 0x10, 0, 1),
+   PIN_FIELD_BASE(23, 23, 2, 0x040, 0x10, 1, 1),
+   PIN_FIELD_BASE(24, 24, 2, 0x040, 0x10, 2, 1),
+   PIN_FIELD_BASE(25, 25, 2, 0x040, 0x10, 4, 1),
+   PIN_FIELD_BASE(26, 26, 2, 0x040, 0x10, 3, 1),
+   PIN_FIELD_BASE(27, 27, 2, 0x040, 0x10, 6, 1),
+   PIN_FIELD_BASE(28, 28, 2, 0x040, 0x10, 7, 1),
+   PIN_FIELD_BASE(29, 29, 2, 0x040, 0x10, 8, 1),
+   PIN_FIELD_BASE(30, 30, 2, 0x040, 0x10, 9, 1),
+

[PATCH v4 4/4] pinctrl: add rsel setting on MT8195

2021-04-12 Thread Zhiyong Tao

This patch provides rsel setting on MT8195.

Signed-off-by: Zhiyong Tao 
---
 drivers/pinctrl/mediatek/pinctrl-mt8195.c | 22 +++
 .../pinctrl/mediatek/pinctrl-mtk-common-v2.c  | 14 
 .../pinctrl/mediatek/pinctrl-mtk-common-v2.h  | 10 +
 drivers/pinctrl/mediatek/pinctrl-paris.c  | 16 ++
 4 files changed, 62 insertions(+)

diff --git a/drivers/pinctrl/mediatek/pinctrl-mt8195.c b/drivers/pinctrl/mediatek/pinctrl-mt8195.c
index a7500e18bb1d..66608b8d346a 100644
--- a/drivers/pinctrl/mediatek/pinctrl-mt8195.c
+++ b/drivers/pinctrl/mediatek/pinctrl-mt8195.c
@@ -779,6 +779,25 @@ static const struct mtk_pin_field_calc mt8195_pin_drv_adv_range[] = {
PIN_FIELD_BASE(45, 45, 1, 0x040, 0x10, 9, 3),
 };
 
+static const struct mtk_pin_field_calc mt8195_pin_rsel_range[] = {
+   PIN_FIELD_BASE(8, 8, 4, 0x0c0, 0x10, 15, 3),
+   PIN_FIELD_BASE(9, 9, 4, 0x0c0, 0x10, 0, 3),
+   PIN_FIELD_BASE(10, 10, 4, 0x0c0, 0x10, 18, 3),
+   PIN_FIELD_BASE(11, 11, 4, 0x0c0, 0x10, 3, 3),
+   PIN_FIELD_BASE(12, 12, 4, 0x0c0, 0x10, 21, 3),
+   PIN_FIELD_BASE(13, 13, 4, 0x0c0, 0x10, 6, 3),
+   PIN_FIELD_BASE(14, 14, 4, 0x0c0, 0x10, 24, 3),
+   PIN_FIELD_BASE(15, 15, 4, 0x0c0, 0x10, 9, 3),
+   PIN_FIELD_BASE(16, 16, 4, 0x0c0, 0x10, 27, 3),
+   PIN_FIELD_BASE(17, 17, 4, 0x0c0, 0x10, 12, 3),
+   PIN_FIELD_BASE(29, 29, 2, 0x080, 0x10, 0, 3),
+   PIN_FIELD_BASE(30, 30, 2, 0x080, 0x10, 3, 3),
+   PIN_FIELD_BASE(34, 34, 1, 0x0e0, 0x10, 0, 3),
+   PIN_FIELD_BASE(35, 35, 1, 0x0e0, 0x10, 3, 3),
+   PIN_FIELD_BASE(44, 44, 1, 0x0e0, 0x10, 6, 3),
+   PIN_FIELD_BASE(45, 45, 1, 0x0e0, 0x10, 9, 3),
+};
+
 static const struct mtk_pin_reg_calc mt8195_reg_cals[PINCTRL_PIN_REG_MAX] = {
[PINCTRL_PIN_REG_MODE] = MTK_RANGE(mt8195_pin_mode_range),
[PINCTRL_PIN_REG_DIR] = MTK_RANGE(mt8195_pin_dir_range),
@@ -793,6 +812,7 @@ static const struct mtk_pin_reg_calc mt8195_reg_cals[PINCTRL_PIN_REG_MAX] = {
[PINCTRL_PIN_REG_R0] = MTK_RANGE(mt8195_pin_r0_range),
[PINCTRL_PIN_REG_R1] = MTK_RANGE(mt8195_pin_r1_range),
[PINCTRL_PIN_REG_DRV_ADV] = MTK_RANGE(mt8195_pin_drv_adv_range),
+   [PINCTRL_PIN_REG_RSEL] = MTK_RANGE(mt8195_pin_rsel_range),
 };
 
 static const char * const mt8195_pinctrl_register_base_names[] = {
@@ -823,6 +843,8 @@ static const struct mtk_pin_soc mt8195_data = {
.drive_get = mtk_pinconf_drive_get_rev1,
.adv_drive_get = mtk_pinconf_adv_drive_get_raw,
.adv_drive_set = mtk_pinconf_adv_drive_set_raw,
+   .rsel_set = mtk_pinconf_rsel_set,
+   .rsel_get = mtk_pinconf_rsel_get,
 };
 
 static const struct of_device_id mt8195_pinctrl_of_match[] = {
diff --git a/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.c b/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.c
index 2b51f4a9b860..d1526d0c6248 100644
--- a/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.c
+++ b/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.c
@@ -1041,6 +1041,20 @@ int mtk_pinconf_adv_drive_get_raw(struct mtk_pinctrl *hw,
 }
 EXPORT_SYMBOL_GPL(mtk_pinconf_adv_drive_get_raw);
 
+int mtk_pinconf_rsel_set(struct mtk_pinctrl *hw,
+const struct mtk_pin_desc *desc, u32 arg)
+{
+   return mtk_hw_set_value(hw, desc, PINCTRL_PIN_REG_RSEL, arg);
+}
+EXPORT_SYMBOL_GPL(mtk_pinconf_rsel_set);
+
+int mtk_pinconf_rsel_get(struct mtk_pinctrl *hw,
+const struct mtk_pin_desc *desc, u32 *val)
+{
+   return mtk_hw_get_value(hw, desc, PINCTRL_PIN_REG_RSEL, val);
+}
+EXPORT_SYMBOL_GPL(mtk_pinconf_rsel_get);
+
 MODULE_LICENSE("GPL v2");
 MODULE_AUTHOR("Sean Wang ");
 MODULE_DESCRIPTION("Pin configuration library module for mediatek SoCs");
diff --git a/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.h b/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.h
index fd5ce9c5dcbd..570e8da7bf38 100644
--- a/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.h
+++ b/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.h
@@ -67,6 +67,7 @@ enum {
PINCTRL_PIN_REG_DRV_E0,
PINCTRL_PIN_REG_DRV_E1,
PINCTRL_PIN_REG_DRV_ADV,
+   PINCTRL_PIN_REG_RSEL,
PINCTRL_PIN_REG_MAX,
 };
 
@@ -237,6 +238,10 @@ struct mtk_pin_soc {
 const struct mtk_pin_desc *desc, u32 arg);
int (*adv_drive_get)(struct mtk_pinctrl *hw,
 const struct mtk_pin_desc *desc, u32 *val);
+   int (*rsel_set)(struct mtk_pinctrl *hw,
+   const struct mtk_pin_desc *desc, u32 arg);
+   int (*rsel_get)(struct mtk_pinctrl *hw,
+   const struct mtk_pin_desc *desc, u32 *val);
 
/* Specific driver data */
void*driver_data;
@@ -320,5 +325,10 @@ int mtk_pinconf_adv_drive_set_raw(struct mtk_pinctrl *hw,
 int mtk_pinconf_adv_drive_get_raw(struct mtk_pinctrl *hw,
  const struct mtk_pin_desc *desc, u32 *val);
 
+int
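
Each PIN_FIELD_BASE(s_pin, e_pin, i_base, s_addr, x_addrs, s_bit, x_bits) entry in the rsel table above names a register offset, a start bit and a field width (3 bits for every rsel entry). As a rough illustration of what updating such a field involves, the userspace sketch below does a read-modify-write on a plain 32-bit value. The names sketch_field, sketch_field_mask, sketch_field_set and sketch_field_get are hypothetical and are not the driver's helpers; in the patch the access goes through mtk_hw_set_value() and mtk_hw_get_value(), whose internals are not shown in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one bit field inside a 32-bit register. */
struct sketch_field {
    uint32_t s_bit;  /* first bit of the field */
    uint32_t x_bits; /* field width in bits */
};

/* Mask covering the field, already shifted into position. */
static uint32_t sketch_field_mask(struct sketch_field f)
{
    /* Sketch precondition: the field fits inside one 32-bit register. */
    assert(f.x_bits < 32 && f.s_bit + f.x_bits <= 32);
    return ((UINT32_C(1) << f.x_bits) - 1) << f.s_bit;
}

/* Return 'reg' with the field replaced by 'val'; other bits are kept. */
static uint32_t sketch_field_set(uint32_t reg, struct sketch_field f,
                                 uint32_t val)
{
    uint32_t mask = sketch_field_mask(f);

    return (reg & ~mask) | ((val << f.s_bit) & mask);
}

/* Extract the field value from 'reg'. */
static uint32_t sketch_field_get(uint32_t reg, struct sketch_field f)
{
    return (reg & sketch_field_mask(f)) >> f.s_bit;
}
```

For the pin 8 entry (s_bit 15, x_bits 3) the mask is 0x00038000, so writing a value into that field leaves the neighbouring rsel fields of pins 9 to 17 in the same register untouched.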

[PATCH v4 1/4] dt-bindings: pinctrl: mt8195: add pinctrl file and binding document

2021-04-12 Thread Zhiyong Tao

1. This patch adds pinctrl file for mt8195.
2. This patch adds mt8195 compatible node in binding document.

Signed-off-by: Zhiyong Tao 
---
 .../bindings/pinctrl/pinctrl-mt8195.yaml  | 151 +++
 include/dt-bindings/pinctrl/mt8195-pinfunc.h  | 962 ++
 2 files changed, 1113 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/pinctrl/pinctrl-mt8195.yaml
 create mode 100644 include/dt-bindings/pinctrl/mt8195-pinfunc.h

diff --git a/Documentation/devicetree/bindings/pinctrl/pinctrl-mt8195.yaml b/Documentation/devicetree/bindings/pinctrl/pinctrl-mt8195.yaml
new file mode 100644
index ..2f12ec59eee5
--- /dev/null
+++ b/Documentation/devicetree/bindings/pinctrl/pinctrl-mt8195.yaml
@@ -0,0 +1,151 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/pinctrl/pinctrl-mt8195.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Mediatek MT8195 Pin Controller
+
+maintainers:
+  - Sean Wang 
+
+description: |
+  The Mediatek's Pin controller is used to control SoC pins.
+
+properties:
+  compatible:
+const: mediatek,mt8195-pinctrl
+
+  gpio-controller: true
+
+  '#gpio-cells':
+description: |
+  Number of cells in GPIO specifier. Since the generic GPIO binding is used,
+  the amount of cells must be specified as 2. See the below
+  mentioned gpio binding representation for description of particular cells.
+const: 2
+
+  gpio-ranges:
+description: gpio valid number range.
+maxItems: 1
+
+  reg:
+description: |
+  Physical address base for gpio base registers. There are 8 GPIO
+  physical address base in mt8195.
+maxItems: 8
+
+  reg-names:
+description: |
+  Gpio base register names.
+maxItems: 8
+
+  interrupt-controller: true
+
+  '#interrupt-cells':
+const: 2
+
+  interrupts:
+description: The interrupt outputs to sysirq.
+maxItems: 1
+
+#PIN CONFIGURATION NODES
+patternProperties:
+  '-pins$':
+type: object
+description: |
+  A pinctrl node should contain at least one subnodes representing the
+  pinctrl groups available on the machine. Each subnode will list the
+  pins it needs, and how they should be configured, with regard to muxer
+  configuration, pullups, drive strength, input enable/disable and
+  input schmitt.
+  An example of using macro:
+  pincontroller {
+/* GPIO0 set as multifunction GPIO0 */
+gpio_pin {
+  pinmux = ;
+};
+/* GPIO8 set as multifunction SDA0 */
+i2c0_pin {
+  pinmux = ;
+};
+  };
+$ref: "pinmux-node.yaml"
+
+properties:
+  pinmux:
+description: |
+  Integer array, represents gpio pin number and mux setting.
+  Supported pin number and mux varies for different SoCs, and are defined
+  as macros in dt-bindings/pinctrl/-pinfunc.h directly.
+
+  drive-strength:
+description: |
+  It can support some arguments which is from 0 to 7. It can only support
+  2/4/6/8/10/12/14/16mA in mt8195.
+enum: [0, 1, 2, 3, 4, 5, 6, 7]
+
+  bias-pull-down: true
+
+  bias-pull-up: true
+
+  bias-disable: true
+
+  output-high: true
+
+  output-low: true
+
+  input-enable: true
+
+  input-disable: true
+
+  input-schmitt-enable: true
+
+  input-schmitt-disable: true
+
+required:
+  - pinmux
+
+additionalProperties: false
+
+required:
+  - compatible
+  - reg
+  - interrupts
+  - interrupt-controller
+  - '#interrupt-cells'
+  - gpio-controller
+  - '#gpio-cells'
+  - gpio-ranges
+
+additionalProperties: false
+
+examples:
+  - |
+#include 
+#include 
+pio: pinctrl@10005000 {
+compatible = "mediatek,mt8195-pinctrl";
+reg = <0x10005000 0x1000>,
+  <0x11d1 0x1000>,
+  <0x11d3 0x1000>,
+  <0x11d4 0x1000>,
+  <0x11e2 0x1000>,
+  <0x11eb 0x1000>,
+  <0x11f4 0x1000>,
+  <0x1000b000 0x1000>;
+reg-names = "iocfg0", "iocfg_bm", "iocfg_bl",
+  "iocfg_br", "iocfg_lm", "iocfg_rb",
+  "iocfg_tl", "eint";
+gpio-controller;
+#gpio-cells = <2>;
+gpio-ranges = < 0 0 144>;
+interrupt-controller;
+interrupts = ;
+#interrupt-cells = <2>;
+
+pio-pins {
+  pinmux = ;
+  output-low;
+};
+};
diff --git a/include/dt-bindings/pinctrl/mt8195-pinfunc.h b/include/dt-bindings/pinctrl/mt8195-pinfunc.h
new file mode 100644
index ..666331bb9b40
--- /dev/null
+++

[PATCH v4 0/4] Mediatek pinctrl patch on mt8195

2021-04-12 Thread Zhiyong Tao

This series includes 4 patches:
1.add pinctrl file and binding document on mt8195.
2.add pinctrl driver on MT8195.
3.add pinctrl drive for I2C related pins on MT8195.
4.add pinctrl rsel setting on MT8195.

Changes in patch v4:
1)fix pinctrl-mt8195.yaml warning error.
2)remove pinctrl device node patch which is based on "mt8195.dtsi".

Changes in patch v3:
1)change '^pins' to '-pins$'.
2)change 'state_0_node_a' to 'gpio_pin' which is defined in dts.
3)change 'state_0_node_b' to 'i2c0_pin' which is defined in dts.
4)reorder this series patches. change pinctrl file and binding document
together in one patch.

There are no changes in v1 & v2.

Zhiyong Tao (4):
  dt-bindings: pinctrl: mt8195: add pinctrl file and binding document
  pinctrl: add pinctrl driver on mt8195
  pinctrl: add drive for I2C related pins on MT8195
  pinctrl: add rsel setting on MT8195

 .../bindings/pinctrl/pinctrl-mt8195.yaml  |  151 ++
 drivers/pinctrl/mediatek/Kconfig  |6 +
 drivers/pinctrl/mediatek/Makefile |1 +
 drivers/pinctrl/mediatek/pinctrl-mt8195.c |  872 +
 .../pinctrl/mediatek/pinctrl-mtk-common-v2.c  |   28 +
 .../pinctrl/mediatek/pinctrl-mtk-common-v2.h  |   15 +
 drivers/pinctrl/mediatek/pinctrl-mtk-mt8195.h | 1669 +
 drivers/pinctrl/mediatek/pinctrl-paris.c  |   16 +
 include/dt-bindings/pinctrl/mt8195-pinfunc.h  |  962 ++
 9 files changed, 3720 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/pinctrl/pinctrl-mt8195.yaml
 create mode 100644 drivers/pinctrl/mediatek/pinctrl-mt8195.c
 create mode 100644 drivers/pinctrl/mediatek/pinctrl-mtk-mt8195.h
 create mode 100644 include/dt-bindings/pinctrl/mt8195-pinfunc.h

--
2.18.0

[syzbot] BUG: unable to handle kernel NULL pointer dereference in __lookup_slow (2)

2021-04-12 Thread syzbot

Hello,

syzbot found the following issue on:

HEAD commit:d93a0d43 Merge tag 'block-5.12-2021-04-02' of git://git.ke..
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=16519431d0
kernel config:  https://syzkaller.appspot.com/x/.config?x=71a75beb62b62a34
dashboard link: https://syzkaller.appspot.com/bug?extid=11c49ce9d4e7896f3406
compiler:   Debian clang version 11.0.1-2

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+11c49ce9d4e7896f3...@syzkaller.appspotmail.com

REISERFS (device loop4): Using r5 hash to sort names
BUG: kernel NULL pointer dereference, address: 
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
PGD 6bb82067 P4D 6bb82067 PUD 6bb81067 PMD 0 
Oops: 0010 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 11072 Comm: syz-executor.4 Not tainted 5.12.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:0x0
Code: Unable to access opcode bytes at RIP 0xffd6.
RSP: 0018:c90008f8fa20 EFLAGS: 00010246
RAX: 113872e8 RBX: dc00 RCX: 0004
RDX:  RSI: 88802e9d9490 RDI: 88807f140190
RBP: 89c39740 R08: 81c9d4de R09: fbfff200a946
R10: fbfff200a946 R11:  R12: 
R13: 88807f140190 R14: 111005d3b292 R15: 88802e9d9490
FS:  7f894af88700() GS:8880b9c0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: ffd6 CR3: 6bb83000 CR4: 001506f0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 __lookup_slow+0x240/0x370 fs/namei.c:1626
 lookup_one_len+0x10e/0x200 fs/namei.c:2649
 reiserfs_lookup_privroot+0x85/0x1e0 fs/reiserfs/xattr.c:980
 reiserfs_fill_super+0x2a69/0x3160 fs/reiserfs/super.c:2176
 mount_bdev+0x26c/0x3a0 fs/super.c:1367
 legacy_get_tree+0xea/0x180 fs/fs_context.c:592
 vfs_get_tree+0x86/0x270 fs/super.c:1497
 do_new_mount fs/namespace.c:2903 [inline]
 path_mount+0x188a/0x29a0 fs/namespace.c:3233
 do_mount fs/namespace.c:3246 [inline]
 __do_sys_mount fs/namespace.c:3454 [inline]
 __se_sys_mount+0x28c/0x320 fs/namespace.c:3431
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x46797a
Code: 48 c7 c2 bc ff ff ff f7 d8 64 89 02 b8 ff ff ff ff eb d2 e8 b8 04 00 00 0f 1f 84 00 00 00 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:7f894af87fa8 EFLAGS: 0206 ORIG_RAX: 00a5
RAX: ffda RBX: 2200 RCX: 0046797a
RDX: 2000 RSI: 2100 RDI: 7f894af88000
RBP: 7f894af88040 R08: 7f894af88040 R09: 2000
R10:  R11: 0206 R12: 2000
R13: 2100 R14: 7f894af88000 R15: 20011500
Modules linked in:
CR2: 
---[ end trace a1b8dbb111baf993 ]---
RIP: 0010:0x0
Code: Unable to access opcode bytes at RIP 0xffd6.
RSP: 0018:c90008f8fa20 EFLAGS: 00010246
RAX: 113872e8 RBX: dc00 RCX: 0004
RDX:  RSI: 88802e9d9490 RDI: 88807f140190
RBP: 89c39740 R08: 81c9d4de R09: fbfff200a946
R10: fbfff200a946 R11:  R12: 
R13: 88807f140190 R14: 111005d3b292 R15: 88802e9d9490
FS:  7f894af88700() GS:8880b9c0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: ffd6 CR3: 6bb83000 CR4: 001506f0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

[syzbot] KASAN: slab-out-of-bounds Read in reiserfs_xattr_get

2021-04-12 Thread syzbot

Hello,

syzbot found the following issue on:

HEAD commit:3a229812 Merge tag 'arm-fixes-5.11-2' of git://git.kernel...
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=16b4d196d0
kernel config:  https://syzkaller.appspot.com/x/.config?x=f91155ccddaf919c
dashboard link: https://syzkaller.appspot.com/bug?extid=72ba979b6681c3369db4
compiler:   Debian clang version 11.0.1-2

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+72ba979b6681c3369...@syzkaller.appspotmail.com

loop3: detected capacity change from 0 to 65534
==
BUG: KASAN: slab-out-of-bounds in reiserfs_xattr_get+0xe0/0x590 fs/reiserfs/xattr.c:681
Read of size 8 at addr 888028983198 by task syz-executor.3/4211

CPU: 1 PID: 4211 Comm: syz-executor.3 Not tainted 5.12.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x176/0x24e lib/dump_stack.c:120
 print_address_description+0x5f/0x3a0 mm/kasan/report.c:232
 __kasan_report mm/kasan/report.c:399 [inline]
 kasan_report+0x15c/0x200 mm/kasan/report.c:416
 reiserfs_xattr_get+0xe0/0x590 fs/reiserfs/xattr.c:681
 reiserfs_get_acl+0x63/0x670 fs/reiserfs/xattr_acl.c:211
 get_acl+0x152/0x2e0 fs/posix_acl.c:141
 check_acl fs/namei.c:294 [inline]
 acl_permission_check fs/namei.c:339 [inline]
 generic_permission+0x2ed/0x5b0 fs/namei.c:392
 do_inode_permission fs/namei.c:446 [inline]
 inode_permission+0x28e/0x500 fs/namei.c:513
 may_open+0x228/0x3e0 fs/namei.c:2985
 do_open fs/namei.c:3365 [inline]
 path_openat+0x2697/0x3860 fs/namei.c:3500
 do_filp_open+0x1a3/0x3b0 fs/namei.c:3527
 do_sys_openat2+0xba/0x380 fs/open.c:1187
 do_sys_open fs/open.c:1203 [inline]
 __do_sys_openat fs/open.c:1219 [inline]
 __se_sys_openat fs/open.c:1214 [inline]
 __x64_sys_openat+0x1c8/0x1f0 fs/open.c:1214
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x419544
Code: 84 00 00 00 00 00 44 89 54 24 0c e8 96 f9 ff ff 44 8b 54 24 0c 44 89 e2 48 89 ee 41 89 c0 bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 34 44 89 c7 89 44 24 0c e8 c8 f9 ff ff 8b 44
RSP: 002b:7fa357a03f30 EFLAGS: 0293 ORIG_RAX: 0101
RAX: ffda RBX: 2200 RCX: 00419544
RDX: 0001 RSI: 2100 RDI: ff9c
RBP: 2100 R08:  R09: 2000
R10:  R11: 0293 R12: 0001
R13: 2100 R14: 7fa357a04000 R15: 20065600

Allocated by task 4210:
 kasan_save_stack mm/kasan/common.c:38 [inline]
 kasan_set_track mm/kasan/common.c:46 [inline]
 set_alloc_info mm/kasan/common.c:427 [inline]
 kasan_kmalloc+0xc2/0xf0 mm/kasan/common.c:506
 kasan_kmalloc include/linux/kasan.h:233 [inline]
 kmem_cache_alloc_trace+0x21b/0x350 mm/slub.c:2934
 kmalloc include/linux/slab.h:554 [inline]
 kzalloc include/linux/slab.h:684 [inline]
 smk_fetch security/smack/smack_lsm.c:288 [inline]
 smack_d_instantiate+0x65c/0xcc0 security/smack/smack_lsm.c:3411
 security_d_instantiate+0xa5/0x100 security/security.c:1987
 d_instantiate_new+0x61/0x110 fs/dcache.c:2025
 ext4_add_nondir+0x22b/0x290 fs/ext4/namei.c:2590
 ext4_symlink+0x8ce/0xe90 fs/ext4/namei.c:3417
 vfs_symlink+0x3a0/0x540 fs/namei.c:4178
 do_symlinkat+0x1c9/0x440 fs/namei.c:4208
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Freed by task 4210:
 kasan_save_stack mm/kasan/common.c:38 [inline]
 kasan_set_track+0x3d/0x70 mm/kasan/common.c:46
 kasan_set_free_info+0x1f/0x40 mm/kasan/generic.c:357
 kasan_slab_free+0x100/0x140 mm/kasan/common.c:360
 kasan_slab_free include/linux/kasan.h:199 [inline]
 slab_free_hook mm/slub.c:1562 [inline]
 slab_free_freelist_hook+0x171/0x270 mm/slub.c:1600
 slab_free mm/slub.c:3161 [inline]
 kfree+0xcf/0x2d0 mm/slub.c:4213
 smk_fetch security/smack/smack_lsm.c:300 [inline]
 smack_d_instantiate+0x6db/0xcc0 security/smack/smack_lsm.c:3411
 security_d_instantiate+0xa5/0x100 security/security.c:1987
 d_instantiate_new+0x61/0x110 fs/dcache.c:2025
 ext4_add_nondir+0x22b/0x290 fs/ext4/namei.c:2590
 ext4_symlink+0x8ce/0xe90 fs/ext4/namei.c:3417
 vfs_symlink+0x3a0/0x540 fs/namei.c:4178
 do_symlinkat+0x1c9/0x440 fs/namei.c:4208
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Last potentially related work creation:
 kasan_save_stack+0x27/0x50 mm/kasan/common.c:38
 kasan_record_aux_stack+0xee/0x120 mm/kasan/generic.c:345
 __call_rcu kernel/rcu/tree.c:3039 [inline]
 call_rcu+0x130/0x8e0 kernel/rcu/tree.c:3114
 fib6_info_release include/net/ip6_fib.h:337 [inline]
 nsim_rt6_release drivers/net/netdevsim/fib.c:507 [inline]
 nsim_fib6_event_fini+0x100/0x1f0

Re: [PATCH][next] scsi: ufs: Fix out-of-bounds warnings in ufshcd_exec_raw_upiu_cmd

2021-04-12 Thread Martin K. Petersen

On Wed, 31 Mar 2021 17:43:38 -0500, Gustavo A. R. Silva wrote:

> Fix the following out-of-bounds warnings by enclosing
> some structure members into new structure objects upiu_req
> and upiu_rsp:
> 
> include/linux/fortify-string.h:20:29: warning: '__builtin_memcpy' offset [29, 48] from the object at 'treq' is out of the bounds of referenced subobject 'req_header' with type 'struct utp_upiu_header' at offset 16 [-Warray-bounds]
> include/linux/fortify-string.h:20:29: warning: '__builtin_memcpy' offset [61, 80] from the object at 'treq' is out of the bounds of referenced subobject 'rsp_header' with type 'struct utp_upiu_header' at offset 48 [-Warray-bounds]
> arch/m68k/include/asm/string.h:72:25: warning: '__builtin_memcpy' offset [29, 48] from the object at 'treq' is out of the bounds of referenced subobject 'req_header' with type 'struct utp_upiu_header' at offset 16 [-Warray-bounds]
> arch/m68k/include/asm/string.h:72:25: warning: '__builtin_memcpy' offset [61, 80] from the object at 'treq' is out of the bounds of referenced subobject 'rsp_header' with type 'struct utp_upiu_header' at offset 48 [-Warray-bounds]
> 
> [...]

Applied to 5.13/scsi-queue, thanks!

[1/1] scsi: ufs: Fix out-of-bounds warnings in ufshcd_exec_raw_upiu_cmd
  https://git.kernel.org/mkp/scsi/c/1352eec8c0da

-- 
Martin K. Petersen  Oracle Linux Engineering
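
The warnings quoted above arise when a memcpy() names one struct member as its destination but is sized to run on into the members that follow it. The userspace sketch below illustrates the kind of fix the patch description mentions: wrapping the header and the members after it in one named sub-structure so the copy targets an object that really is that large. Only the wrapper name upiu_req comes from the thread; every other name and the exact field layout (sketch_header, sketch_req_flat, sketch_req_grouped, sketch_copy_req) are assumptions made for illustration and do not reproduce the UFS structures.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 12-byte header; not the real 'struct utp_upiu_header'. */
struct sketch_header {
    uint32_t dword[3];
};

/*
 * Before: the header is followed by loose members. A call such as
 *   memcpy(&req->header, src, sizeof(req->header) + sizeof(req->fields));
 * deliberately writes past 'header' into 'fields', which is the pattern
 * -Warray-bounds reports.
 */
struct sketch_req_flat {
    uint32_t prefix[4];
    struct sketch_header header;
    uint32_t fields[5];
};

/* After: the same members grouped so one object spans the copied range. */
struct sketch_req_grouped {
    uint32_t prefix[4];
    struct {
        struct sketch_header header;
        uint32_t fields[5];
    } upiu_req;
};

/* Grouping the members does not change the size of the structure. */
_Static_assert(sizeof(struct sketch_req_flat) ==
               sizeof(struct sketch_req_grouped),
               "grouping must not change the structure size");

/* The destination is now the whole sub-structure, so the size matches. */
static void sketch_copy_req(struct sketch_req_grouped *dst, const void *src)
{
    assert(dst && src);
    memcpy(&dst->upiu_req, src, sizeof(dst->upiu_req));
}
```

With this shape the compiler can see that the destination object and the copy length agree, so there is nothing left for the bounds check to flag.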

Re: [PATCH -next] scsi: fnic: remove unnecessary spin_lock_init() and INIT_LIST_HEAD()

2021-04-12 Thread Martin K. Petersen

On Tue, 30 Mar 2021 20:59:11 +0800, Yang Yingliang wrote:

> The spinlock and list head of fnic_list is initialized statically.
> It is unnecessary to initialize by spin_lock_init() and INIT_LIST_HEAD().

Applied to 5.13/scsi-queue, thanks!

[1/1] scsi: fnic: remove unnecessary spin_lock_init() and INIT_LIST_HEAD()
  https://git.kernel.org/mkp/scsi/c/aa6f2fccd711

-- 
Martin K. Petersen  Oracle Linux Engineering
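
To illustrate why a runtime init call is redundant for an object that is initialized statically, here is a small userspace sketch. It is only an analogy: the kernel would normally use its own static initializers (for example DEFINE_SPINLOCK() and LIST_HEAD()), and how fnic declares fnic_list is not shown in this thread. All names below (sketch_list_head, SKETCH_LIST_HEAD_INIT, sketch_list, sketch_lock, sketch_add) are hypothetical, and a C11 atomic_flag stands in for the spinlock.

```c
#include <assert.h>
#include <stdatomic.h>

/* Minimal doubly linked list head, modelled loosely on the kernel's. */
struct sketch_list_head {
    struct sketch_list_head *next, *prev;
};

/* An empty list points at itself; this can be done at compile time. */
#define SKETCH_LIST_HEAD_INIT(name) { &(name), &(name) }

/* Both objects are fully usable before any init function has run. */
static struct sketch_list_head sketch_list = SKETCH_LIST_HEAD_INIT(sketch_list);
static atomic_flag sketch_lock = ATOMIC_FLAG_INIT;

struct sketch_node {
    struct sketch_list_head link;
    int value;
};

static int sketch_list_empty(void)
{
    return sketch_list.next == &sketch_list;
}

/* Insert at the front of the list under the (stand-in) lock. */
static void sketch_add(struct sketch_node *n)
{
    assert(n);
    while (atomic_flag_test_and_set(&sketch_lock))
        ; /* spin until the flag is released */
    n->link.next = sketch_list.next;
    n->link.prev = &sketch_list;
    sketch_list.next->prev = &n->link;
    sketch_list.next = &n->link;
    atomic_flag_clear(&sketch_lock);
}
```

Because the list head and the lock already hold valid initial values at load time, calling a runtime initializer on them again would only repeat work that has been done.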

Re: [PATCH] message/fusion: Use BUG_ON instead of if condition followed by BUG.

2021-04-12 Thread Martin K. Petersen

On Tue, 30 Mar 2021 05:46:01 -0700, zhouchuangao wrote:

> BUG_ON() uses unlikely in if(), which can be optimized at compile time.

Applied to 5.13/scsi-queue, thanks!

[1/1] message/fusion: Use BUG_ON instead of if condition followed by BUG.
  https://git.kernel.org/mkp/scsi/c/4dec8004de29

-- 
Martin K. Petersen  Oracle Linux Engineering
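
The transformation this patch describes can be sketched in userspace as follows. The macros are stand-ins, not the kernel's definitions: SKETCH_BUG() only counts hits instead of halting, and SKETCH_BUG_ON() assumes the common generic shape of BUG_ON(), namely the condition wrapped in an unlikely() branch hint followed by BUG(). All sketch_ and SKETCH_ names are hypothetical, and __builtin_expect is a GCC/Clang builtin.

```c
#include <assert.h>

/* Branch hint: tell the compiler the condition is expected to be false. */
#define sketch_unlikely(x) __builtin_expect(!!(x), 0)

/* Stand-in for BUG(): count the hit instead of stopping the program. */
static int sketch_bug_hits;
#define SKETCH_BUG() do { sketch_bug_hits++; } while (0)

/* Stand-in for BUG_ON(): the check and the hint live in one macro. */
#define SKETCH_BUG_ON(cond) \
    do { if (sketch_unlikely(cond)) SKETCH_BUG(); } while (0)

/* Before: open-coded condition followed by BUG(). */
static void sketch_check_open_coded(int idx, int max)
{
    if (idx >= max)
        SKETCH_BUG();
}

/* After: the same check expressed with BUG_ON(). */
static void sketch_check_bug_on(int idx, int max)
{
    SKETCH_BUG_ON(idx >= max);
}

/* Both spellings trip on exactly the same inputs. */
static void sketch_selfcheck(void)
{
    sketch_bug_hits = 0;
    sketch_check_open_coded(1, 4);
    sketch_check_bug_on(1, 4);
    assert(sketch_bug_hits == 0);
    sketch_check_open_coded(5, 4);
    sketch_check_bug_on(5, 4);
    assert(sketch_bug_hits == 2);
}
```

The two functions behave identically; the second form is shorter and carries the branch hint without the caller having to spell it out.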

Re: [PATCH] scsi: bfa: Remove unnecessary struct declaration

2021-04-12 Thread Martin K. Petersen

On Thu, 1 Apr 2021 14:35:34 +0800, Wan Jiabing wrote:

> struct bfa_fcs_s is declared twice. One is declared
> at 50th line. Remove the duplicate.
> struct bfa_fcs_fabric_s is defined at 175th line.
> Remove unnecessary declaration.

Applied to 5.13/scsi-queue, thanks!

[1/1] scsi: bfa: Remove unnecessary struct declaration
  https://git.kernel.org/mkp/scsi/c/c3b0d087763f

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: [PATCH v7 3/4] spmi: mediatek: Add support for MT6873/8192

2021-04-12 Thread Hsin-hsiung Wang

Hi Maintainers,
Gentle ping for this patch.

Thanks.

On Sun, 2021-03-14 at 02:00 +0800, Hsin-Hsiung Wang wrote:
> Add spmi support for MT6873/8192.
> 
> Signed-off-by: Hsin-Hsiung Wang 
> ---
> changes since v6:
> - remove unused spinlock.
> - remove redundant check for slave id.
> ---
>  drivers/spmi/Kconfig |  10 +
>  drivers/spmi/Makefile|   2 +
>  drivers/spmi/spmi-mtk-pmif.c | 465 +++
>  3 files changed, 477 insertions(+)
>  create mode 100644 drivers/spmi/spmi-mtk-pmif.c
> 
> diff --git a/drivers/spmi/Kconfig b/drivers/spmi/Kconfig
> index a53bad541f1a..692bac98a120 100644
> --- a/drivers/spmi/Kconfig
> +++ b/drivers/spmi/Kconfig
> @@ -25,4 +25,14 @@ config SPMI_MSM_PMIC_ARB
> This is required for communicating with Qualcomm PMICs and
> other devices that have the SPMI interface.
>  
> +config SPMI_MTK_PMIF
> + tristate "Mediatek SPMI Controller (PMIC Arbiter)"
> + help
> +   If you say yes to this option, support will be included for the
> +   built-in SPMI PMIC Arbiter interface on Mediatek family
> +   processors.
> +
> +   This is required for communicating with Mediatek PMICs and
> +   other devices that have the SPMI interface.
> +
>  endif
> diff --git a/drivers/spmi/Makefile b/drivers/spmi/Makefile
> index 55a94cadeffe..76fb3b3ab510 100644
> --- a/drivers/spmi/Makefile
> +++ b/drivers/spmi/Makefile
> @@ -5,3 +5,5 @@
>  obj-$(CONFIG_SPMI)   += spmi.o
>  
>  obj-$(CONFIG_SPMI_MSM_PMIC_ARB)  += spmi-pmic-arb.o
> +obj-$(CONFIG_SPMI_MTK_PMIF)  += spmi-mtk-pmif.o
> +
> diff --git a/drivers/spmi/spmi-mtk-pmif.c b/drivers/spmi/spmi-mtk-pmif.c
> new file mode 100644
> index ..94c45d46ab0c
> --- /dev/null
> +++ b/drivers/spmi/spmi-mtk-pmif.c
> @@ -0,0 +1,465 @@
> +// SPDX-License-Identifier: GPL-2.0
> +//
> +// Copyright (c) 2021 MediaTek Inc.
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define SWINF_IDLE   0x00
> +#define SWINF_WFVLDCLR   0x06
> +
> +#define GET_SWINF(x) (((x) >> 1) & 0x7)
> +
> +#define PMIF_CMD_REG_0   0
> +#define PMIF_CMD_REG 1
> +#define PMIF_CMD_EXT_REG 2
> +#define PMIF_CMD_EXT_REG_LONG3
> +
> +#define PMIF_DELAY_US   10
> +#define PMIF_TIMEOUT_US (10 * 1000)
> +
> +#define PMIF_CHAN_OFFSET 0x5
> +
> +#define PMIF_MAX_CLKS3
> +
> +#define SPMI_OP_ST_BUSY 1
> +
> +struct ch_reg {
> + u32 ch_sta;
> + u32 wdata;
> + u32 rdata;
> + u32 ch_send;
> + u32 ch_rdy;
> +};
> +
> +struct pmif_data {
> + const u32   *regs;
> + const u32   *spmimst_regs;
> + u32 soc_chan;
> +};
> +
> +struct pmif {
> + void __iomem*base;
> + void __iomem*spmimst_base;
> + struct ch_reg   chan;
> + struct clk_bulk_data clks[PMIF_MAX_CLKS];
> + u32 nclks;
> + const struct pmif_data *data;
> +};
> +
> +static const char * const pmif_clock_names[] = {
> + "pmif_sys_ck", "pmif_tmr_ck", "spmimst_clk_mux",
> +};
> +
> +enum pmif_regs {
> + PMIF_INIT_DONE,
> + PMIF_INF_EN,
> + PMIF_ARB_EN,
> + PMIF_CMDISSUE_EN,
> + PMIF_TIMER_CTRL,
> + PMIF_SPI_MODE_CTRL,
> + PMIF_IRQ_EVENT_EN_0,
> + PMIF_IRQ_FLAG_0,
> + PMIF_IRQ_CLR_0,
> + PMIF_IRQ_EVENT_EN_1,
> + PMIF_IRQ_FLAG_1,
> + PMIF_IRQ_CLR_1,
> + PMIF_IRQ_EVENT_EN_2,
> + PMIF_IRQ_FLAG_2,
> + PMIF_IRQ_CLR_2,
> + PMIF_IRQ_EVENT_EN_3,
> + PMIF_IRQ_FLAG_3,
> + PMIF_IRQ_CLR_3,
> + PMIF_IRQ_EVENT_EN_4,
> + PMIF_IRQ_FLAG_4,
> + PMIF_IRQ_CLR_4,
> + PMIF_WDT_EVENT_EN_0,
> + PMIF_WDT_FLAG_0,
> + PMIF_WDT_EVENT_EN_1,
> + PMIF_WDT_FLAG_1,
> + PMIF_SWINF_0_STA,
> + PMIF_SWINF_0_WDATA_31_0,
> + PMIF_SWINF_0_RDATA_31_0,
> + PMIF_SWINF_0_ACC,
> + PMIF_SWINF_0_VLD_CLR,
> + PMIF_SWINF_1_STA,
> + PMIF_SWINF_1_WDATA_31_0,
> + PMIF_SWINF_1_RDATA_31_0,
> + PMIF_SWINF_1_ACC,
> + PMIF_SWINF_1_VLD_CLR,
> + PMIF_SWINF_2_STA,
> + PMIF_SWINF_2_WDATA_31_0,
> + PMIF_SWINF_2_RDATA_31_0,
> + PMIF_SWINF_2_ACC,
> + PMIF_SWINF_2_VLD_CLR,
> + PMIF_SWINF_3_STA,
> + PMIF_SWINF_3_WDATA_31_0,
> + PMIF_SWINF_3_RDATA_31_0,
> + PMIF_SWINF_3_ACC,
> + PMIF_SWINF_3_VLD_CLR,
> +};
> +
> +static const u32 mt6873_regs[] = {
> + [PMIF_INIT_DONE] =  0x,
> + [PMIF_INF_EN] = 0x0024,
> + [PMIF_ARB_EN] = 0x0150,
> + [PMIF_CMDISSUE_EN] =0x03B4,
> + [PMIF_TIMER_CTRL] = 0x03E0,
> + [PMIF_SPI_MODE_CTRL] =  0x0400,
> + [PMIF_IRQ_EVENT_EN_0] = 0x0418,
> + [PMIF_IRQ_FLAG_0] = 0x0420,
> + [PMIF_IRQ_CLR_0] =  0x0424,
> + [PMIF_IRQ_EVENT_EN_1] = 0x0428,
> + [PMIF_IRQ_FLAG_1] = 0x0430,
> + [PMIF_IRQ_CLR_1] =  0x0434,
> + [PMIF_IRQ_EVENT_EN_2] = 0x0438,
> + [PMIF_IRQ_FLAG_2] = 0x0440,
> + [PMIF_IRQ_CLR_2] =  0x0444,
> + [PMIF_IRQ_EVENT_EN_3] = 0x0448,
> + [PMIF_IRQ_FLAG_3] =

[PATCH RFC v2 0/4] virtio net: spurious interrupt related fixes

2021-04-12 Thread Michael S. Tsirkin

With the implementation of napi-tx in the virtio driver, we clean tx
descriptors from the rx napi handler in order to reduce tx complete
interrupts. But this introduces a race where a tx complete interrupt
has been raised, but the handler finds there is no work to do because
the work was already done in the previous rx interrupt handler.
A similar issue exists with polling from start_xmit; it is, however,
less common because of the delayed cb optimization of the split ring,
but it will likely affect the packed ring once that is more common.

In particular, this was reported to lead to the following warning msg:
[ 3588.010778] irq 38: nobody cared (try booting with the "irqpoll" option)
[ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 5.3.0-19-generic #20~18.04.2-Ubuntu
[ 3588.017940] Call Trace:
[ 3588.017942]  
[ 3588.017951]  dump_stack+0x63/0x85
[ 3588.017953]  __report_bad_irq+0x35/0xc0
[ 3588.017955]  note_interrupt+0x24b/0x2a0
[ 3588.017956]  handle_irq_event_percpu+0x54/0x80
[ 3588.017957]  handle_irq_event+0x3b/0x60
[ 3588.017958]  handle_edge_irq+0x83/0x1a0
[ 3588.017961]  handle_irq+0x20/0x30
[ 3588.017964]  do_IRQ+0x50/0xe0
[ 3588.017966]  common_interrupt+0xf/0xf
[ 3588.017966]  
[ 3588.017989] handlers:
[ 3588.020374] [<1b9f1da8>] vring_interrupt
[ 3588.025099] Disabling IRQ #38

This patchset attempts to fix this by cleaning up a bunch of races
related to the handling of sq callbacks (aka tx interrupts).
Very lightly tested, sending out for help with testing, early feedback
and flames. Thanks!

Michael S. Tsirkin (4):
  virtio: fix up virtio_disable_cb
  virtio_net: disable cb aggressively
  virtio_net: move tx vq operation under tx queue lock
  virtio_net: move txq wakeups under tx q lock

 drivers/net/virtio_net.c | 35 +--
 drivers/virtio/virtio_ring.c | 26 +-
 2 files changed, 54 insertions(+), 7 deletions(-)

-- 
MST

Re: [PATCH v1 0/2] scsi: libsas: few clean up patches

2021-04-12 Thread Martin K. Petersen

On Thu, 25 Mar 2021 20:29:54 +0800, Luo Jiaxing wrote:

> Two types of errors are detected by the checkpatch.
> 1. Alignment between switches and cases
> 2. Improper use of some spaces
> 
> Here are the clean up patches.
> 
> Luo Jiaxing (2):
>   scsi: libsas: make switch and case at the same indent in
> sas_to_ata_err()
>   scsi: libsas: clean up for white spaces
> 
> [...]

Applied to 5.13/scsi-queue, thanks!

[1/2] scsi: libsas: make switch and case at the same indent in sas_to_ata_err()
  https://git.kernel.org/mkp/scsi/c/c03f2422b9f5
[2/2] scsi: libsas: clean up for white spaces
  https://git.kernel.org/mkp/scsi/c/857a80bbd732

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: [PATCH v2] scsi: libsas: Reset num_scatter if libata mark qc as NODATA

2021-04-12 Thread Martin K. Petersen

On Thu, 18 Mar 2021 15:56:32 -0700, Jolly Shah wrote:

> When the cache_type for the scsi device is changed, the scsi layer
> issues a MODE_SELECT command. The caching mode details are communicated
> via a request buffer associated with the scsi command with data
> direction set as DMA_TO_DEVICE (scsi_mode_select). When this command
> reaches the libata layer, as a part of generic initial setup, libata
> layer sets up the scatterlist for the command using the scsi command
> (ata_scsi_qc_new). This command is then translated by the libata layer
> into ATA_CMD_SET_FEATURES (ata_scsi_mode_select_xlat). The libata layer
> treats this as a non data command (ata_mselect_caching), since it only
> needs an ata taskfile to pass the caching on/off information to the
> device. It does not need the scatterlist that has been setup, so it does
> not perform dma_map_sg on the scatterlist (ata_qc_issue). Unfortunately,
> when this command reaches the libsas layer(sas_ata_qc_issue), libsas
> layer sees it as a non data command with a scatterlist. It cannot
> extract the correct dma length, since the scatterlist has not been
> mapped with dma_map_sg for a DMA operation. When this partially
> constructed SAS task reaches pm80xx LLDD, it results in below warning.
> 
> [...]

Applied to 5.12/scsi-fixes, thanks!

[1/1] scsi: libsas: Reset num_scatter if libata mark qc as NODATA
  https://git.kernel.org/mkp/scsi/c/176ddd89171d

-- 
Martin K. Petersen  Oracle Linux Engineering

[PATCH RFC v2 4/4] virtio_net: move txq wakeups under tx q lock

2021-04-12 Thread Michael S. Tsirkin

We currently check num_free outside the tx q lock,
which is unsafe: new packets can arrive in the meantime,
and there won't be space in the queue.
The result is a spurious queue wakeup, causing overhead
and even packet drops.

Move the check under the lock to fix that.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/net/virtio_net.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 460ccdbb840e..febaf55ec1f6 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1431,11 +1431,12 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
if (__netif_tx_trylock(txq)) {
virtqueue_disable_cb(sq->vq);
free_old_xmit_skbs(sq, true);
+
+   if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
+   netif_tx_wake_queue(txq);
+
__netif_tx_unlock(txq);
}
-
-   if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
-   netif_tx_wake_queue(txq);
 }
 
 static int virtnet_poll(struct napi_struct *napi, int budget)
@@ -1519,6 +1520,9 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
virtqueue_disable_cb(sq->vq);
free_old_xmit_skbs(sq, true);
 
+   if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
+   netif_tx_wake_queue(txq);
+
opaque = virtqueue_enable_cb_prepare(sq->vq);
 
done = napi_complete_done(napi, 0);
@@ -1539,9 +1543,6 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
}
}
 
-   if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
-   netif_tx_wake_queue(txq);
-
return 0;
 }
 
-- 
MST
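
To make the race concrete, here is a minimal userspace sketch of the
"decide while holding the lock" pattern the patch moves to. It is only a
model under stated assumptions: a pthread mutex stands in for the tx queue
lock, and struct demo_txq, DEMO_WAKE_THRESHOLD and the demo_* helpers are
hypothetical names, not virtio_net code.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical threshold; the patch itself uses 2 + MAX_SKB_FRAGS. */
#define DEMO_WAKE_THRESHOLD 4

struct demo_txq {
	pthread_mutex_t lock;   /* stands in for the tx queue lock */
	unsigned int num_free;  /* free descriptors, like sq->vq->num_free */
	bool stopped;           /* true while the queue is stopped */
};

void demo_txq_init(struct demo_txq *q, unsigned int num_free)
{
	pthread_mutex_init(&q->lock, NULL);
	q->num_free = num_free;
	q->stopped = false;
}

/* Transmit side: consume one slot and stop the queue when space runs low. */
void demo_xmit(struct demo_txq *q)
{
	pthread_mutex_lock(&q->lock);
	if (q->num_free > 0)
		q->num_free--;
	if (q->num_free < DEMO_WAKE_THRESHOLD)
		q->stopped = true;
	pthread_mutex_unlock(&q->lock);
}

/* Completion side: reclaim slots and make the wakeup decision while the
 * lock is still held, so a concurrent demo_xmit() cannot shrink num_free
 * between the check and the wakeup.
 */
void demo_clean_and_wake(struct demo_txq *q, unsigned int reclaimed)
{
	pthread_mutex_lock(&q->lock);
	q->num_free += reclaimed;
	if (q->num_free >= DEMO_WAKE_THRESHOLD)
		q->stopped = false;
	pthread_mutex_unlock(&q->lock);
}
```

If the threshold check were done after pthread_mutex_unlock(), another
thread could consume slots in between and the queue would be woken with
too little space, which is the spurious wakeup described above.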

[PATCH RFC v2 3/4] virtio_net: move tx vq operation under tx queue lock

2021-04-12 Thread Michael S. Tsirkin

It's unsafe to operate a vq from multiple threads.
Unfortunately this is exactly what we do when invoking
clean tx poll from rx napi.
As a fix, move everything that deals with the vq under the tx lock.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/net/virtio_net.c | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 16d5abed582c..460ccdbb840e 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1505,6 +1505,8 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
struct virtnet_info *vi = sq->vq->vdev->priv;
unsigned int index = vq2txq(sq->vq);
struct netdev_queue *txq;
+   int opaque;
+   bool done;
 
if (unlikely(is_xdp_raw_buffer_queue(vi, index))) {
/* We don't need to enable cb for XDP */
@@ -1514,10 +1516,28 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
 
txq = netdev_get_tx_queue(vi->dev, index);
__netif_tx_lock(txq, raw_smp_processor_id());
+   virtqueue_disable_cb(sq->vq);
free_old_xmit_skbs(sq, true);
+
+   opaque = virtqueue_enable_cb_prepare(sq->vq);
+
+   done = napi_complete_done(napi, 0);
+
+   if (!done)
+   virtqueue_disable_cb(sq->vq);
+
__netif_tx_unlock(txq);
 
-   virtqueue_napi_complete(napi, sq->vq, 0);
+   if (done) {
+   if (unlikely(virtqueue_poll(sq->vq, opaque))) {
+   if (napi_schedule_prep(napi)) {
+   __netif_tx_lock(txq, raw_smp_processor_id());
+   virtqueue_disable_cb(sq->vq);
+   __netif_tx_unlock(txq);
+   __napi_schedule(napi);
+   }
+   }
+   }
 
if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
netif_tx_wake_queue(txq);
-- 
MST

[PATCH RFC v2 2/4] virtio_net: disable cb aggressively

2021-04-12 Thread Michael S. Tsirkin

There are currently two cases where we poll the TX vq not in response to a
callback: start xmit and rx napi.  We currently do this with callbacks
enabled, which can cause extra interrupts from the card.  This used not to
be a big issue as we ran with interrupts disabled, but that is no longer the
case, and in some cases the rate of spurious interrupts is so high that
Linux detects this and actually kills the interrupt.

Fix up by disabling the callbacks before polling the tx vq.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/net/virtio_net.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 82e520d2cb12..16d5abed582c 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1429,6 +1429,7 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
return;
 
if (__netif_tx_trylock(txq)) {
+   virtqueue_disable_cb(sq->vq);
free_old_xmit_skbs(sq, true);
__netif_tx_unlock(txq);
}
@@ -1582,6 +1583,7 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
bool use_napi = sq->napi.weight;
 
/* Free up any pending old buffers before queueing new ones. */
+   virtqueue_disable_cb(sq->vq);
free_old_xmit_skbs(sq, false);
 
if (use_napi && kick)
-- 
MST

[PATCH RFC v2 1/4] virtio: fix up virtio_disable_cb

2021-04-12 Thread Michael S. Tsirkin

virtio_disable_cb is currently a nop for split ring with event index.
This is because it used to be always called from a callback, when we know
the device won't trigger more events until we update the index.  However,
now that we run with interrupts enabled a lot, we also poll without a
callback, so that is different: disabling callbacks will help reduce the
number of spurious interrupts.
Further, if using event index with a packed ring, and if being called
from a callback, we actually do disable interrupts, which is unnecessary.

Fix both issues by tracking whenever we get a callback. If that is
the case, disabling interrupts with event index can be a nop.
If it is not the case, disable interrupts. Note: with a split ring
there's no explicit "no interrupts" value. For now we write
a fixed value so our chance of triggering an interrupt
is 1/ring size. It's probably better to write something
related to the last used index there to reduce the chance
even further. For now I'm keeping it simple.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/virtio/virtio_ring.c | 26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 71e16b53e9c1..88f0b16b11b8 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -113,6 +113,9 @@ struct vring_virtqueue {
/* Last used index we've seen. */
u16 last_used_idx;
 
+   /* Hint for event idx: already triggered no need to disable. */
+   bool event_triggered;
+
union {
/* Available for split ring */
struct {
@@ -739,7 +742,10 @@ static void virtqueue_disable_cb_split(struct virtqueue *_vq)
 
if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
vq->split.avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
-   if (!vq->event)
+   if (vq->event)
+   /* TODO: this is a hack. Figure out a cleaner value to write. */
+   vring_used_event(&vq->split.vring) = 0x0;
+   else
vq->split.vring.avail->flags =
cpu_to_virtio16(_vq->vdev,
vq->split.avail_flags_shadow);
@@ -1605,6 +1611,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
vq->weak_barriers = weak_barriers;
vq->broken = false;
vq->last_used_idx = 0;
+   vq->event_triggered = false;
vq->num_added = 0;
vq->packed_ring = true;
vq->use_dma_api = vring_use_dma_api(vdev);
@@ -1919,6 +1926,12 @@ void virtqueue_disable_cb(struct virtqueue *_vq)
 {
struct vring_virtqueue *vq = to_vvq(_vq);
 
+   /* If device triggered an event already it won't trigger one again:
+* no need to disable.
+*/
+   if (vq->event_triggered)
+   return;
+
if (vq->packed_ring)
virtqueue_disable_cb_packed(_vq);
else
@@ -1942,6 +1955,9 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
 {
struct vring_virtqueue *vq = to_vvq(_vq);
 
+   if (vq->event_triggered)
+   vq->event_triggered = false;
+
return vq->packed_ring ? virtqueue_enable_cb_prepare_packed(_vq) :
 virtqueue_enable_cb_prepare_split(_vq);
 }
@@ -2005,6 +2021,9 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
 {
struct vring_virtqueue *vq = to_vvq(_vq);
 
+   if (vq->event_triggered)
+   vq->event_triggered = false;
+
return vq->packed_ring ? virtqueue_enable_cb_delayed_packed(_vq) :
 virtqueue_enable_cb_delayed_split(_vq);
 }
@@ -2044,6 +2063,10 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
if (unlikely(vq->broken))
return IRQ_HANDLED;
 
+   /* Just a hint for performance: so it's ok that this can be racy! */
+   if (vq->event)
+   vq->event_triggered = true;
+
pr_debug("virtqueue callback for %p (%p)\n", vq, vq->vq.callback);
if (vq->vq.callback)
vq->vq.callback(&vq->vq);
@@ -2083,6 +2106,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
vq->weak_barriers = weak_barriers;
vq->broken = false;
vq->last_used_idx = 0;
+   vq->event_triggered = false;
vq->num_added = 0;
vq->use_dma_api = vring_use_dma_api(vdev);
 #ifdef DEBUG
-- 
MST
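
The hint logic described in this patch can be modeled in a few lines of
userspace C. This is only a sketch of the state machine, not the
virtio_ring implementation: struct demo_vq and the demo_* helpers are
hypothetical, and disable_writes merely counts how often the "disable"
path would touch the ring.

```c
#include <stdbool.h>

struct demo_vq {
	bool event_idx;               /* event index feature in use */
	bool event_triggered;         /* hint: device already raised an event */
	bool cb_disabled;             /* models the state written to the ring */
	unsigned int disable_writes;  /* how many times we wrote to the ring */
};

/* Interrupt path: with event index, remember that an event has fired. */
void demo_interrupt(struct demo_vq *vq)
{
	if (vq->event_idx)
		vq->event_triggered = true;
}

/* Disable path: skip the ring write when the device has already
 * triggered, since it will not trigger again until re-enabled.
 */
void demo_disable_cb(struct demo_vq *vq)
{
	if (vq->event_triggered)
		return;
	vq->cb_disabled = true;
	vq->disable_writes++;
}

/* Enable path: clear the hint so later disables take effect again. */
void demo_enable_cb(struct demo_vq *vq)
{
	vq->event_triggered = false;
	vq->cb_disabled = false;
}
```

In this model a disable issued from polling context (no interrupt seen)
writes to the ring, while a disable issued after demo_interrupt() is a nop
until the next demo_enable_cb().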

[syzbot] KASAN: use-after-free Read in skcipher_walk_next

2021-04-12 Thread syzbot

Hello,

syzbot found the following issue on:

HEAD commit:4fa56ad0 Merge tag 'for-linus' of git://git.kernel.org/pub..
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=17dbd09ad0
kernel config:  https://syzkaller.appspot.com/x/.config?x=9320464bf47598bd
dashboard link: https://syzkaller.appspot.com/bug?extid=4061a98a8ab454dde8ff

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+4061a98a8ab454dde...@syzkaller.appspotmail.com

==
BUG: KASAN: use-after-free in memcpy include/linux/fortify-string.h:191 [inline]
BUG: KASAN: use-after-free in skcipher_next_copy crypto/skcipher.c:292 [inline]
BUG: KASAN: use-after-free in skcipher_walk_next+0xb69/0x1680 crypto/skcipher.c:379
Read of size 2785 at addr 8880781c by task kworker/u4:3/204

CPU: 0 PID: 204 Comm: kworker/u4:3 Not tainted 5.12.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: pencrypt_parallel padata_parallel_worker
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x141/0x1d7 lib/dump_stack.c:120
 print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:232
 __kasan_report mm/kasan/report.c:399 [inline]
 kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416
 check_region_inline mm/kasan/generic.c:180 [inline]
 kasan_check_range+0x13d/0x180 mm/kasan/generic.c:186
 memcpy+0x20/0x60 mm/kasan/shadow.c:65
 memcpy include/linux/fortify-string.h:191 [inline]
 skcipher_next_copy crypto/skcipher.c:292 [inline]
 skcipher_walk_next+0xb69/0x1680 crypto/skcipher.c:379
 skcipher_walk_done+0x7a3/0xf00 crypto/skcipher.c:159
 gcmaes_crypt_by_sg+0x377/0x8a0 arch/x86/crypto/aesni-intel_glue.c:694

The buggy address belongs to the page:
page:ea0001e07000 refcount:0 mapcount:-128 mapping: 
index:0x1 pfn:0x781c0
flags: 0xfff000()
raw: 00fff000 ea0001e06808 ea0001c67008 
raw: 0001 0004 ff7f 
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 8880781bff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 8880781bff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>8880781c: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
   ^
 8880781c0080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 8880781c0100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
==


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

linux-next: manual merge of the kvm-arm tree with the arm64 tree

2021-04-12 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the kvm-arm tree got a conflict in:

  arch/arm64/include/asm/assembler.h

between commits:

  27248fe1abb2 ("arm64: assembler: remove conditional NEON yield macros")
  13150149aa6d ("arm64: fpsimd: run kernel mode NEON with softirqs disabled")

from the arm64 tree and commits:

  8f4de66e247b ("arm64: asm: Provide set_sctlr_el2 macro")
  755db23420a1 ("KVM: arm64: Generate final CTR_EL0 value when running in Protected mode")

from the kvm-arm tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc arch/arm64/include/asm/assembler.h
index ab569b0b45fc,34ddd8a0f3dd..
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@@ -15,7 -15,7 +15,8 @@@
  #include 
  
  #include 
 +#include 
+ #include 
  #include 
  #include 
  #include 
@@@ -701,25 -705,95 +714,33 @@@ USER(\label, ic ivau, \tmp2)
// inval
isb
  .endm
  
+ .macro set_sctlr_el1, reg
+   set_sctlr sctlr_el1, \reg
+ .endm
+ 
+ .macro set_sctlr_el2, reg
+   set_sctlr sctlr_el2, \reg
+ .endm
+ 
 -/*
 - * Check whether to yield to another runnable task from kernel mode NEON code
 - * (which runs with preemption disabled).
 - *
 - * if_will_cond_yield_neon
 - *// pre-yield patchup code
 - * do_cond_yield_neon
 - *// post-yield patchup code
 - * endif_yield_neon
 - *
 - * where  is optional, and marks the point where execution will resume
 - * after a yield has been performed. If omitted, execution resumes right after
 - * the endif_yield_neon invocation. Note that the entire sequence, including
 - * the provided patchup code, will be omitted from the image if
 - * CONFIG_PREEMPTION is not defined.
 - *
 - * As a convenience, in the case where no patchup code is required, the above
 - * sequence may be abbreviated to
 - *
 - * cond_yield_neon 
 - *
 - * Note that the patchup code does not support assembler directives that change
 - * the output section, any use of such directives is undefined.
 - *
 - * The yield itself consists of the following:
 - * - Check whether the preempt count is exactly 1 and a reschedule is also
 - *   needed. If so, calling of preempt_enable() in kernel_neon_end() will
 - *   trigger a reschedule. If it is not the case, yielding is pointless.
 - * - Disable and re-enable kernel mode NEON, and branch to the yield fixup
 - *   code.
 - *
 - * This macro sequence may clobber all CPU state that is not guaranteed by the
 - * AAPCS to be preserved across an ordinary function call.
 - */
 -
 -  .macro  cond_yield_neon, lbl
 -  if_will_cond_yield_neon
 -  do_cond_yield_neon
 -  endif_yield_neon\lbl
 -  .endm
 -
 -  .macro  if_will_cond_yield_neon
 -#ifdef CONFIG_PREEMPTION
 -  get_current_taskx0
 -  ldr x0, [x0, #TSK_TI_PREEMPT]
 -  sub x0, x0, #PREEMPT_DISABLE_OFFSET
 -  cbz x0, .Lyield_\@
 -  /* fall through to endif_yield_neon */
 -  .subsection 1
 -.Lyield_\@ :
 -#else
 -  .section".discard.cond_yield_neon", "ax"
 -#endif
 -  .endm
 -
 -  .macro  do_cond_yield_neon
 -  bl  kernel_neon_end
 -  bl  kernel_neon_begin
 -  .endm
 -
 -  .macro  endif_yield_neon, lbl
 -  .ifnb   \lbl
 -  b   \lbl
 -  .else
 -  b   .Lyield_out_\@
 -  .endif
 -  .previous
 -.Lyield_out_\@ :
 -  .endm
 -
/*
 -   * Check whether preempt-disabled code should yield as soon as it
 -   * is able. This is the case if re-enabling preemption a single
 -   * time results in a preempt count of zero, and the TIF_NEED_RESCHED
 -   * flag is set. (Note that the latter is stored negated in the
 -   * top word of the thread_info::preempt_count field)
 +   * Check whether preempt/bh-disabled asm code should yield as soon as
 +   * it is able. This is the case if we are currently running in task
 +   * context, and either a softirq is pending, or the TIF_NEED_RESCHED
 +   * flag is set and re-enabling preemption a single time would result in
 +   * a preempt count of zero. (Note that the TIF_NEED_RESCHED flag is
 +   * stored negated in the top word of the thread_info::preempt_count
 +   * field)
 */
 -  .macro  cond_yield, lbl:req, tmp:req
 -#ifdef CONFIG_PREEMPTION
 +  .macro  cond_yield, lbl:req, tmp:req, tmp2:req
get_current_task \tmp
ldr \tmp, [\tmp, #TSK_TI_PREEMPT]
 +  /*
 +   * If we are serving a softirq,

Re: [RFC] mm: activate access-more-than-once page via NUMA balancing

2021-04-12 Thread Huang, Ying

Yu Zhao  writes:

> On Fri, Mar 26, 2021 at 12:21 AM Huang, Ying  wrote:
>>
>> Mel Gorman  writes:
>>
>> > On Thu, Mar 25, 2021 at 12:33:45PM +0800, Huang, Ying wrote:
>> >> > I caution against this patch.
>> >> >
>> >> > It's non-deterministic for a number of reasons. As it requires NUMA
>> >> > balancing to be enabled, the pageout behaviour of a system changes when
>> >> > NUMA balancing is active. If this led to pages being artificially and
>> >> > inappropriately preserved, NUMA balancing could be disabled for the
>> >> > wrong reasons.  It only applies to pages that have no target node so
>> >> > memory policies affect which pages are activated differently. Similarly,
>> >> > NUMA balancing does not scan all VMAs and some pages may never trap a
>> >> > NUMA fault as a result. The timing of when an address space gets scanned
>> >> > is driven by the locality of pages and so the timing of page activation
>> >> > potentially becomes linked to whether pages are local or need to migrate
>> >> > (although not right now for this patch as it only affects pages with a
>> >> > target nid of NUMA_NO_NODE). In other words, changes in NUMA balancing
>> >> > that affect migration potentially affect the aging rate.  Similarly,
>> >> > the activate rate of a process with a single thread and multiple threads
>> >> > potentially have different activation rates.
>> >> >
>> >> > Finally, the NUMA balancing scan algorithm is sub-optimal. It potentially
>> >> > scans the entire address space even though only a small number of pages
>> >> > are scanned. This is particularly problematic when a process has a lot
>> >> > of threads because threads are redundantly scanning the same regions. If
>> >> > NUMA balancing ever introduced range tracking of faulted pages to limit
>> >> > how much scanning it has to do, it would inadvertently cause a change in
>> >> > page activation rate.
>> >> >
>> >> > NUMA balancing is about page locality, it should not get conflated with
>> >> > page aging.
>> >>
>> >> I understand your concerns about binding the NUMA balancing and page
>> >> reclaiming.  The requirement of the page locality and page aging is
>> >> different, so the policies need to be different.  This is the wrong part
>> >> of the patch.
>> >>
>> >> From another point of view, it's still possible to share some underlying
>> >> mechanisms (and code) between them.  That is, scanning the page tables
>> >> to make pages unaccessible and capture the page accesses via the page
>> >> fault.
>> >
>> > Potentially yes but not necessarily recommended for page aging. NUMA
>> > balancing has to be careful about the rate it scans pages to avoid
>> > excessive overhead so it's driven by locality. The scanning happens
>> > within a tasks context so during that time, the task is not executing
>> > its normal work and it incurs the overhead for faults. Generally, this
>> > is not too much overhead because pages get migrated locally, the scan
>> > rate drops and so does the overhead.
>> >
>> > However, if you want to drive page aging, that is constant so the rate
>> > could not be easily adapted in a way that would be deterministic.
>> >
>> >> Now these page accessing information is used for the page
>> >> locality.  Do you think it's a good idea to use these information for
>> >> the page aging too (but with a different policy as you pointed out)?
>> >>
>> >
>> > I'm not completely opposed to it but I think the overhead it would
>> > introduce could be severe. Worse, if a workload fits in memory and there
>> > is limited to no memory pressure, it's all overhead for no gain. Early
>> > generations of NUMA balancing had to find a balance to ensure the gains
>> > from locality exceeded the cost of measuring locality and doing the same
>> > for page aging in some ways is even more challenging.
>>
>> Yes.  I will think more about it from the overhead vs. gain point of
>> view.  Thanks a lot for your sharing on that.
>>
>> >> From yet another point of view :-), in current NUMA balancing
>> >> implementation, it's assumed that the node private pages can fit in the
>> >> accessing node.  But this may be not always true.  Is it a valid
>> >> optimization to migrate the hot private pages first?
>> >>
>> >
>> > I'm not sure how the hotness of pages could be ranked. At the time of a
>> > hinting fault, the page is by definition active now because it has been
>> > accessed. Prioritising what pages to migrate based on the number of faults
>> > that have been trapped would have to be stored somewhere.
>>
>> Yes.  We need to store some information about that.  In an old version
>> of the patchset which uses NUMA balancing to promote hot pages from the
>> PMEM to DRAM, we have designed a method to measure the hotness of the
>> pages.  The basic idea is as follows,
>>
>> - When the page table of a process is scanned, the latest N scanning
>>   address ranges and scan times are recorded in a ring buffer of
>>   mm_struct.
>>
>> - In hint page fault handler,

Re: [PATCH 4.19 00/66] 4.19.187-rc1 review

2021-04-12 Thread Naresh Kamboju

On Mon, 12 Apr 2021 at 14:13, Greg Kroah-Hartman
 wrote:
>
> This is the start of the stable review cycle for the 4.19.187 release.
> There are 66 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 14 Apr 2021 08:39:44 +.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> 
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.187-rc1.gz
> or in the git tree and branch at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-4.19.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Results from Linaro’s test farm.
No regressions on arm64, arm, x86_64, and i386.

Tested-by: Linux Kernel Functional Testing 

## Build
* kernel: 4.19.187-rc1
* git: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
* git branch: linux-4.19.y
* git commit: 85bc28045cdbb9576907965c761445aaece4f5ad
* git describe: v4.19.186-67-g85bc28045cdb
* test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-4.19.y/build/v4.19.186-67-g85bc28045cdb

## No regressions (compared to v4.19.185-19-g6aba908ea95f)

## No fixes (compared to v4.19.185-19-g6aba908ea95f)

## Test result summary
 total: 65010, pass: 52744, fail: 1575, skip: 10433, xfail: 258,

## Build Summary
* arm: 97 total, 96 passed, 1 failed
* arm64: 25 total, 24 passed, 1 failed
* dragonboard-410c: 1 total, 1 passed, 0 failed
* hi6220-hikey: 1 total, 1 passed, 0 failed
* i386: 15 total, 13 passed, 2 failed
* juno-r2: 1 total, 1 passed, 0 failed
* mips: 39 total, 39 passed, 0 failed
* s390: 9 total, 9 passed, 0 failed
* sparc: 9 total, 9 passed, 0 failed
* x15: 2 total, 1 passed, 1 failed
* x86: 1 total, 1 passed, 0 failed
* x86_64: 15 total, 14 passed, 1 failed

## Test suites summary
* fwts
* igt-gpu-tools
* install-android-platform-tools-r2600
* kselftest-
* kselftest-android
* kselftest-bpf
* kselftest-capabilities
* kselftest-cgroup
* kselftest-clone3
* kselftest-core
* kselftest-cpu-hotplug
* kselftest-cpufreq
* kselftest-efivarfs
* kselftest-filesystems
* kselftest-firmware
* kselftest-fpu
* kselftest-futex
* kselftest-gpio
* kselftest-intel_pstate
* kselftest-ipc
* kselftest-ir
* kselftest-kcmp
* kselftest-kexec
* kselftest-kvm
* kselftest-lib
* kselftest-livepatch
* kselftest-lkdtm
* kselftest-membarrier
* kselftest-memfd
* kselftest-memory-hotplug
* kselftest-mincore
* kselftest-mount
* kselftest-mqueue
* kselftest-net
* kselftest-netfilter
* kselftest-nsfs
* kselftest-openat2
* kselftest-pid_namespace
* kselftest-pidfd
* kselftest-proc
* kselftest-pstore
* kselftest-ptrace
* kselftest-rseq
* kselftest-rtc
* kselftest-seccomp
* kselftest-sigaltstack
* kselftest-size
* kselftest-splice
* kselftest-static_keys
* kselftest-sync
* kselftest-sysctl
* kselftest-tc-testing
* kselftest-timens
* kselftest-timers
* kselftest-tmpfs
* kselftest-tpm2
* kselftest-user
* kselftest-vm
* kselftest-vsyscall-mode-native-
* kselftest-vsyscall-mode-none-
* kselftest-x86
* kselftest-zram
* kvm-unit-tests
* libhugetlbfs
* linux-log-parser
* ltp-cap_bounds-tests
* ltp-commands-tests
* ltp-containers-tests
* ltp-controllers-tests
* ltp-cpuhotplug-tests
* ltp-crypto-tests
* ltp-cve-tests
* ltp-dio-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-mm-tests
* ltp-nptl-tests
* ltp-open-posix-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-tracing-tests
* network-basic-tests
* perf
* rcutorture
* ssuite
* v4l2-compliance

--
Linaro LKFT
https://lkft.linaro.org

Re: [PATCH][next] KEYS: trusted: Fix missing null return from kzalloc call

2021-04-12 Thread Sumit Garg

On Mon, 12 Apr 2021 at 22:34, Colin Ian King  wrote:
>
> On 12/04/2021 17:48, James Bottomley wrote:
> > On Mon, 2021-04-12 at 17:01 +0100, Colin King wrote:
> >> From: Colin Ian King 
> >>
> >> The kzalloc call can return null with the GFP_KERNEL flag so
> >> add a null check and exit via a new error exit label. Use the
> >> same exit error label for another error path too.
> >>
> >> Addresses-Coverity: ("Dereference null return value")
> >> Fixes: 830027e2cb55 ("KEYS: trusted: Add generic trusted keys
> >> framework")
> >> Signed-off-by: Colin Ian King 
> >> ---
> >>  security/keys/trusted-keys/trusted_core.c | 6 --
> >>  1 file changed, 4 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/security/keys/trusted-keys/trusted_core.c
> >> b/security/keys/trusted-keys/trusted_core.c
> >> index ec3a066a4b42..90774793f0b1 100644
> >> --- a/security/keys/trusted-keys/trusted_core.c
> >> +++ b/security/keys/trusted-keys/trusted_core.c
> >> @@ -116,11 +116,13 @@ static struct trusted_key_payload
> >> *trusted_payload_alloc(struct key *key)
> >>
> >>  ret = key_payload_reserve(key, sizeof(*p));
> >>  if (ret < 0)
> >> -return p;
> >> +goto err;
> >>  p = kzalloc(sizeof(*p), GFP_KERNEL);
> >> +if (!p)
> >> +goto err;
> >>
> >>  p->migratable = migratable;
> >> -
> >> +err:
> >>  return p;
> >
> > This is clearly a code migration bug in
> >
> > commit 251c85bd106099e6f388a89e88e12d14de2c9cda
> > Author: Sumit Garg 
> > Date:   Mon Mar 1 18:41:24 2021 +0530
> >
> > KEYS: trusted: Add generic trusted keys framework
> >
> > Which has for addition to trusted_core.c:
> >
> > +static struct trusted_key_payload *trusted_payload_alloc(struct key
> > *key)
> > +{
> > +   struct trusted_key_payload *p = NULL;
> > +   int ret;
> > +
> > +   ret = key_payload_reserve(key, sizeof(*p));
> > +   if (ret < 0)
> > +   return p;
> > +   p = kzalloc(sizeof(*p), GFP_KERNEL);
> > +
> > +   p->migratable = migratable;
> > +
> > +   return p;
> > +}
> >
> > And for trusted_tpm1.c:
> >
> > -static struct trusted_key_payload *trusted_payload_alloc(struct key
> > *key)
> > -{
> > -   struct trusted_key_payload *p = NULL;
> > -   int ret;
> > -
> > -   ret = key_payload_reserve(key, sizeof *p);
> > -   if (ret < 0)
> > -   return p;
> > -   p = kzalloc(sizeof *p, GFP_KERNEL);
> > -   if (p)
> > -   p->migratable = 1; /* migratable by default */
> > -   return p;
> > -}
> >
> > The trusted_tpm1.c code was correct and we got this bug introduced by
> > what should have been a simple cut and paste ... how did that happen?

It was a little more than just cut and paste: I generalized the
"migratable" flag so that it is provided by the corresponding trust
source's ops struct.

> > And therefore, how safe is the rest of the extraction into
> > trusted_core.c?
> >
>
> fortunately it gets caught by static analysis, but it does make me also
> concerned about what else has changed and how this gets through review.
>

I agree that extraction into trusted_core.c was a complex change but
this patch has been up for review for almost 2 years [1]. And
extensive testing can't catch this sort of bug as allocation wouldn't
normally fail.

[1] https://lwn.net/Articles/795416/
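
A minimal userspace sketch of how a rarely taken allocation-failure path can still be exercised, assuming a hypothetical `alloc_zeroed()` wrapper with a test hook (`fail_next_alloc`). The `struct payload` type and all helper names are illustrative stand-ins, not the trusted-keys code; only the NULL check and the single exit label mirror the fix discussed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct trusted_key_payload. */
struct payload {
	int migratable;
};

/* Test hook (illustrative only): when true, the next allocation fails. */
static bool fail_next_alloc;

/* Zeroing allocator that can be told to fail once. */
static void *alloc_zeroed(size_t size)
{
	if (fail_next_alloc) {
		fail_next_alloc = false;
		return NULL;
	}
	return calloc(1, size);
}

/* Returns NULL on allocation failure instead of dereferencing NULL. */
static struct payload *payload_alloc(int migratable)
{
	struct payload *p = alloc_zeroed(sizeof(*p));

	if (!p)
		goto err;

	p->migratable = migratable;
err:
	return p;
}
```

Setting `fail_next_alloc` before calling `payload_alloc()` forces the error path that ordinary functional testing would almost never reach.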

-Sumit

> > James
> >
> >
>

Re: [PATCH 0/1] Use of /sys/bus/pci/devices/…/index for non-SMBIOS platforms

2021-04-12 Thread Leon Romanovsky

On Mon, Apr 12, 2021 at 03:59:04PM +0200, Niklas Schnelle wrote:
> Hi Narendra, Hi All,
> 
> According to Documentation/ABI/testing/sysfs-bus-pci you are responsible
> for the index device attribute that is used by systemd to create network
> interface names.
> 
> Now we would like to reuse this attribute for firmware provided PCI
> device index numbers on the s390 architecture which doesn't have
> SMBIOS/DMI nor ACPI. All code changes are within our architecture
> specific code but I'd like to get some Acks for this reuse. I've sent an
> RFC version of this patch on 15th of March with the subject:
> 
>s390/pci: expose a PCI device's UID as its index
> 
> but got no response. Would it be okay to re-use this attribute for
> essentially the same purpose but with index numbers provided by
> a different platform mechanism? I think this would be cleaner than
> further proliferation of /sys/bus/pci/devices//xyz_index
> attributes and allows re-use of the existing userspace infrastructure.

I'm missing an explanation that this change is safe for systemd and
that they don't have some hard-coded assumption about the meaning of
the existing index on s390.

Thanks

[PATCH v2][next] scsi: aacraid: Replace one-element array with flexible-array member

2021-04-12 Thread Gustavo A. R. Silva

There is a regular need in the kernel to provide a way to declare having
a dynamically sized set of trailing elements in a structure. Kernel code
should always use “flexible array members”[1] for these cases. The older
style of one-element or zero-length arrays should no longer be used[2].

Refactor the code according to the use of a flexible-array member in
struct aac_raw_io2 instead of one-element array, and use the
struct_size() and flex_array_size() helpers.
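
A userspace sketch of the size arithmetic involved, assuming hypothetical `struct elem`, `struct old_request` and `struct request` types that only loosely resemble the driver's structures. The helpers approximate what struct_size() and flex_array_size() compute (including saturation to SIZE_MAX on overflow, which is an assumption about the kernel helpers stated here for illustration); they are not the kernel macros themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element type. */
struct elem {
	uint32_t addr_low;
	uint32_t addr_high;
	uint32_t length;
	uint32_t flags;
};

/* Old style: one-element trailing array. */
struct old_request {
	uint32_t count;
	struct elem sge[1];
};

/* New style: flexible array member, which adds nothing to sizeof. */
struct request {
	uint32_t count;
	struct elem sge[];
};

/* Old open-coded computation: subtract the element already counted. */
static size_t old_request_size(size_t n)
{
	return sizeof(struct old_request) + (n - 1) * sizeof(struct elem);
}

/* Header plus n trailing elements, saturating on overflow. */
static size_t request_size(size_t n)
{
	if (n > (SIZE_MAX - sizeof(struct request)) / sizeof(struct elem))
		return SIZE_MAX;
	return sizeof(struct request) + n * sizeof(struct elem);
}

/* Bytes occupied by n trailing elements only, saturating on overflow. */
static size_t request_array_size(size_t n)
{
	if (n > SIZE_MAX / sizeof(struct elem))
		return SIZE_MAX;
	return n * sizeof(struct elem);
}
```

For these particular types both computations give the same byte count for n >= 1, so the refactor changes how the size is expressed, not its value.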

Also, this helps with the ongoing efforts to enable -Warray-bounds by
fixing the following warnings:

drivers/scsi/aacraid/aachba.c: In function ‘aac_build_sgraw2’:
drivers/scsi/aacraid/aachba.c:3970:18: warning: array subscript 1 is above 
array bounds of ‘struct sge_ieee1212[1]’ [-Warray-bounds]
 3970 | if (rio2->sge[j].length % (i*PAGE_SIZE)) {
  | ~^~~
drivers/scsi/aacraid/aachba.c:3974:27: warning: array subscript 1 is above 
array bounds of ‘struct sge_ieee1212[1]’ [-Warray-bounds]
 3974 | nseg_new += (rio2->sge[j].length / (i*PAGE_SIZE));
  |  ~^~~
drivers/scsi/aacraid/aachba.c:4011:28: warning: array subscript 1 is above 
array bounds of ‘struct sge_ieee1212[1]’ [-Warray-bounds]
 4011 |   for (j = 0; j < rio2->sge[i].length / (pages * PAGE_SIZE); ++j) {
  |   ~^~~
drivers/scsi/aacraid/aachba.c:4012:24: warning: array subscript 1 is above 
array bounds of ‘struct sge_ieee1212[1]’ [-Warray-bounds]
 4012 |addr_low = rio2->sge[i].addrLow + j * pages * PAGE_SIZE;
  |   ~^~~
drivers/scsi/aacraid/aachba.c:4014:33: warning: array subscript 1 is above 
array bounds of ‘struct sge_ieee1212[1]’ [-Warray-bounds]
 4014 |sge[pos].addrHigh = rio2->sge[i].addrHigh;
  |~^~~
drivers/scsi/aacraid/aachba.c:4015:28: warning: array subscript 1 is above 
array bounds of ‘struct sge_ieee1212[1]’ [-Warray-bounds]
 4015 |if (addr_low < rio2->sge[i].addrLow)
  |   ~^~~

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.9/process/deprecated.html#zero-length-and-one-element-arrays

Link: https://github.com/KSPP/linux/issues/79
Link: https://github.com/KSPP/linux/issues/109
Build-tested-by: kernel test robot 
Link: https://lore.kernel.org/lkml/60414244.ur4%2fki+fbf1ohkzs%25...@intel.com/
Signed-off-by: Gustavo A. R. Silva 
---
Changes in v2:
 - Add code comment for clarification.

 drivers/scsi/aacraid/aachba.c  | 17 +++--
 drivers/scsi/aacraid/aacraid.h |  2 +-
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/aacraid/aachba.c b/drivers/scsi/aacraid/aachba.c
index 8e06604370c4..2816a15d5633 100644
--- a/drivers/scsi/aacraid/aachba.c
+++ b/drivers/scsi/aacraid/aachba.c
@@ -1235,8 +1235,8 @@ static int aac_read_raw_io(struct fib * fib, struct 
scsi_cmnd * cmd, u64 lba, u3
if (ret < 0)
return ret;
command = ContainerRawIo2;
-   fibsize = sizeof(struct aac_raw_io2) +
-   ((le32_to_cpu(readcmd2->sgeCnt)-1) * sizeof(struct 
sge_ieee1212));
+   fibsize = struct_size(readcmd2, sge,
+le32_to_cpu(readcmd2->sgeCnt));
} else {
struct aac_raw_io *readcmd;
readcmd = (struct aac_raw_io *) fib_data(fib);
@@ -1366,8 +1366,8 @@ static int aac_write_raw_io(struct fib * fib, struct 
scsi_cmnd * cmd, u64 lba, u
if (ret < 0)
return ret;
command = ContainerRawIo2;
-   fibsize = sizeof(struct aac_raw_io2) +
-   ((le32_to_cpu(writecmd2->sgeCnt)-1) * sizeof(struct 
sge_ieee1212));
+   fibsize = struct_size(writecmd2, sge,
+ le32_to_cpu(writecmd2->sgeCnt));
} else {
struct aac_raw_io *writecmd;
writecmd = (struct aac_raw_io *) fib_data(fib);
@@ -4003,7 +4003,7 @@ static int aac_convert_sgraw2(struct aac_raw_io2 *rio2, 
int pages, int nseg, int
if (aac_convert_sgl == 0)
return 0;
 
-   sge = kmalloc_array(nseg_new, sizeof(struct sge_ieee1212), GFP_ATOMIC);
+   sge = kmalloc_array(nseg_new, sizeof(*sge), GFP_ATOMIC);
if (sge == NULL)
return -ENOMEM;
 
@@ -4020,7 +4020,12 @@ static int aac_convert_sgraw2(struct aac_raw_io2 *rio2, 
int pages, int nseg, int
}
}
sge[pos] = rio2->sge[nseg-1];
-   memcpy(&rio2->sge[1], &sge[1], (nseg_new-1)*sizeof(struct 
sge_ieee1212));
+   /*
+* Notice that, in this case, flex_array_size() evaluates to
+* (nseg_new - 1) number of sge objects of type struct sge_ieee1212.
+*/
+   memcpy(&rio2->sge[1], &sge[1],
+  flex_array_size(rio2, sge, nseg_new - 1));
 
kfree(sge);
rio2->sgeCnt = cpu_to_le32(nseg_new);
diff --git

Re: [RESEND,v5,1/2] bio: limit bio max size

2021-04-12 Thread Christoph Hellwig

And more importantly please test with a file system that uses the
iomap direct I/O code (btrfs, gfs2, ext4, xfs, zonefs) as we should
never just work around a legacy codebase that should go away in the
block layer.

[PATCH v2 2/2] x86/tsc: skip tsc watchdog checking for qualified platforms

2021-04-12 Thread Feng Tang

There are cases where tsc clocksources are wrongly judged as unstable by
clocksource watchdogs like hpet, acpi_pm or 'refined-jiffies'. Since there
is hardly a generally reliable way to check the validity of a watchdog,
and to protect the innocent tsc, Thomas Gleixner proposed [1]:

"I'm inclined to lift that requirement when the CPU has:

1) X86_FEATURE_CONSTANT_TSC
2) X86_FEATURE_NONSTOP_TSC
3) X86_FEATURE_NONSTOP_TSC_S3
4) X86_FEATURE_TSC_ADJUST
5) At max. 4 sockets

 After two decades of horrors we're finally at a point where TSC seems
 to be halfway reliable and less abused by BIOS tinkerers. TSC_ADJUST
 was really key as we can now detect even small modifications reliably
 and the important point is that we can cure them as well (not pretty
 but better than all other options)."

As feature #3, X86_FEATURE_NONSTOP_TSC_S3, only exists on several generations
of Atom processors and is always coupled with X86_FEATURE_CONSTANT_TSC
and X86_FEATURE_NONSTOP_TSC, skip checking it. Also be more defensive
and use a maximum of 2 sockets.

The check is done inside tsc_init() before registering 'tsc-early' and
'tsc' clocksources, as there were cases where both of them had been
wrongly judged as unreliable.
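
The qualification rule described above can be sketched as a plain predicate. This is a userspace illustration, assuming a hypothetical `struct tsc_platform` snapshot in place of boot_cpu_has() and nr_online_nodes; the function name is also made up for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the inputs the check looks at. */
struct tsc_platform {
	bool constant_tsc;	/* stands for X86_FEATURE_CONSTANT_TSC */
	bool nonstop_tsc;	/* stands for X86_FEATURE_NONSTOP_TSC */
	bool tsc_adjust;	/* stands for X86_FEATURE_TSC_ADJUST */
	int nr_nodes;		/* stands for nr_online_nodes */
};

/* True when all three features are present on at most two nodes. */
static bool tsc_can_skip_watchdog(const struct tsc_platform *p)
{
	return p->constant_tsc && p->nonstop_tsc && p->tsc_adjust &&
	       p->nr_nodes <= 2;
}
```

A machine with all three features but four nodes would not qualify under this rule, and neither would a two-node machine lacking TSC_ADJUST.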

[1]. https://lore.kernel.org/lkml/87eekfk8bd@nanos.tec.linutronix.de/
Suggested-by: Thomas Gleixner 
Signed-off-by: Feng Tang 
---
Change log:

  v2:
* Directly skip watchdog check without messing flag
  'tsc_clocksource_reliable' (Thomas)

 arch/x86/kernel/tsc.c | 22 ++
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index f70dffc..bfd013b 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1177,6 +1177,12 @@ void mark_tsc_unstable(char *reason)
 
 EXPORT_SYMBOL_GPL(mark_tsc_unstable);
 
+static void __init tsc_skip_watchdog_verify(void)
+{
+   clocksource_tsc_early.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
+   clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
+}
+
 static void __init check_system_tsc_reliable(void)
 {
 #if defined(CONFIG_MGEODEGX1) || defined(CONFIG_MGEODE_LX) || 
defined(CONFIG_X86_GENERIC)
@@ -1193,6 +1199,17 @@ static void __init check_system_tsc_reliable(void)
 #endif
if (boot_cpu_has(X86_FEATURE_TSC_RELIABLE))
tsc_clocksource_reliable = 1;
+
+   /*
+* Ideally the socket number should be checked, but this is called
+* by tsc_init() which is in early boot phase and the socket numbers
+* may not be available. Use 'nr_online_nodes' as a fallback solution
+*/
+   if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
+   boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
+   boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
+   nr_online_nodes <= 2)
+   tsc_skip_watchdog_verify();
 }
 
 /*
@@ -1384,9 +1401,6 @@ static int __init init_tsc_clocksource(void)
if (tsc_unstable)
goto unreg;
 
-   if (tsc_clocksource_reliable || no_tsc_watchdog)
-   clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
-
if (boot_cpu_has(X86_FEATURE_NONSTOP_TSC_S3))
clocksource_tsc.flags |= CLOCK_SOURCE_SUSPEND_NONSTOP;
 
@@ -1524,7 +1538,7 @@ void __init tsc_init(void)
}
 
if (tsc_clocksource_reliable || no_tsc_watchdog)
-   clocksource_tsc_early.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
+   tsc_skip_watchdog_verify();
 
clocksource_register_khz(&clocksource_tsc_early, tsc_khz);
detect_art();
-- 
2.7.4

[PATCH v2 1/2] x86/tsc: add a timer to make sure tsc_adjust is always checked

2021-04-12 Thread Feng Tang

Normally the tsc_sync will get checked every time the system enters idle state,
but Thomas Gleixner mentioned there is still a caveat that a system may never
enter idle [1], either because it's too busy or configured purposely to not
enter idle. Set up a periodic timer to make sure the check is always on.
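
The timer callback in the patch re-arms itself on the next online CPU so that every CPU gets checked in turn. A userspace sketch of that round-robin selection, assuming a 64-bit bitmask in place of cpu_online_mask and hypothetical helper names (`mask_next`, `next_check_cpu`):

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 64	/* hypothetical fixed size for the sketch */

/* Return the next set bit strictly after 'cpu', or NR_CPUS if none. */
static int mask_next(int cpu, uint64_t online)
{
	for (int i = cpu + 1; i < NR_CPUS; i++)
		if (online & (UINT64_C(1) << i))
			return i;
	return NR_CPUS;
}

/* Pick the CPU that runs the next check, wrapping to the first one. */
static int next_check_cpu(int cur, uint64_t online)
{
	int next = mask_next(cur, online);

	if (next >= NR_CPUS)
		next = mask_next(-1, online);
	return next;
}
```

With CPUs 0, 1 and 3 online, the sequence starting from CPU 0 visits 1, then 3, then wraps back to 0.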

[1]. https://lore.kernel.org/lkml/875z286xtk@nanos.tec.linutronix.de/
Signed-off-by: Feng Tang 
---
Change log:
  
  v2:
 * skip timer setup when tsc_clocksource_reliable == 1 (Thomas)
 * refine comment and code format (Thomas) 

 arch/x86/kernel/tsc_sync.c | 39 +++
 1 file changed, 39 insertions(+)

diff --git a/arch/x86/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c
index 3d3c761..39f18fa 100644
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -30,6 +30,7 @@ struct tsc_adjust {
 };
 
 static DEFINE_PER_CPU(struct tsc_adjust, tsc_adjust);
+static struct timer_list tsc_sync_check_timer;
 
 /*
  * TSC's on different sockets may be reset asynchronously.
@@ -77,6 +78,44 @@ void tsc_verify_tsc_adjust(bool resume)
}
 }
 
+/*
+ * Normally the tsc_sync will be checked every time system enters idle state,
+ * but there is still caveat that a system won't enter idle, either because
+ * it's too busy or configured purposely to not enter idle.
+ *
+ * So setup a periodic timer to make sure the check is always on.
+ */
+
+#define SYNC_CHECK_INTERVAL (HZ * 600)
+
+static void tsc_sync_check_timer_fn(struct timer_list *unused)
+{
+   int next_cpu;
+
+   tsc_verify_tsc_adjust(false);
+
+   /* Run the check for all onlined CPUs in turn */
+   next_cpu = cpumask_next(raw_smp_processor_id(), cpu_online_mask);
+   if (next_cpu >= nr_cpu_ids)
+   next_cpu = cpumask_first(cpu_online_mask);
+
+   tsc_sync_check_timer.expires += SYNC_CHECK_INTERVAL;
+   add_timer_on(&tsc_sync_check_timer, next_cpu);
+}
+
+static int __init start_sync_check_timer(void)
+{
+   if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST) || tsc_clocksource_reliable)
+   return 0;
+
+   timer_setup(&tsc_sync_check_timer, tsc_sync_check_timer_fn, 0);
+   tsc_sync_check_timer.expires = jiffies + SYNC_CHECK_INTERVAL;
+   add_timer(&tsc_sync_check_timer);
+
+   return 0;
+}
+late_initcall(start_sync_check_timer);
+
 static void tsc_sanitize_first_cpu(struct tsc_adjust *cur, s64 bootval,
   unsigned int cpu, bool bootcpu)
 {
-- 
2.7.4

Re: [PATCH][next] KEYS: trusted: Fix missing null return from kzalloc call

2021-04-12 Thread Sumit Garg

On Mon, 12 Apr 2021 at 21:31, Colin King  wrote:
>
> From: Colin Ian King 
>
> The kzalloc call can return null with the GFP_KERNEL flag so
> add a null check and exit via a new error exit label. Use the
> same exit error label for another error path too.
>
> Addresses-Coverity: ("Dereference null return value")
> Fixes: 830027e2cb55 ("KEYS: trusted: Add generic trusted keys framework")
> Signed-off-by: Colin Ian King 
> ---
>  security/keys/trusted-keys/trusted_core.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>

Ah, it's my bad. Thanks for fixing this issue.

Reviewed-by: Sumit Garg 

-Sumit

> diff --git a/security/keys/trusted-keys/trusted_core.c 
> b/security/keys/trusted-keys/trusted_core.c
> index ec3a066a4b42..90774793f0b1 100644
> --- a/security/keys/trusted-keys/trusted_core.c
> +++ b/security/keys/trusted-keys/trusted_core.c
> @@ -116,11 +116,13 @@ static struct trusted_key_payload 
> *trusted_payload_alloc(struct key *key)
>
> ret = key_payload_reserve(key, sizeof(*p));
> if (ret < 0)
> -   return p;
> +   goto err;
> p = kzalloc(sizeof(*p), GFP_KERNEL);
> +   if (!p)
> +   goto err;
>
> p->migratable = migratable;
> -
> +err:
> return p;
>  }
>
> --
> 2.30.2
>

Re: [PATCH] ibmvfc: Fix invalid state machine BUG_ON

2021-04-12 Thread Martin K. Petersen



Tyrel,

> This fixes an issue hitting the BUG_ON in ibmvfc_do_work. When going
> through a host action of IBMVFC_HOST_ACTION_RESET, we change the
> action to IBMVFC_HOST_ACTION_TGT_DEL, then drop the host lock, and
> reset the CRQ, which changes the host state to IBMVFC_NO_CRQ.

[...]

Applied to 5.13/scsi-staging, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering

[PATCH v2 6/9] userfaultfd/selftests: create alias mappings in the shmem test

2021-04-12 Thread Axel Rasmussen

Previously, we just allocated two shm areas: area_src and area_dst. With
this commit, change this so we also allocate area_src_alias, and
area_dst_alias.

area_*_alias and area_* (respectively) point to the same underlying
physical pages, but are different VMAs. In a future commit in this
series, we'll leverage this setup to exercise minor fault handling
support for shmem, just like we do in the hugetlb_shared test.
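
A small userspace sketch of what "alias mappings" means here: two separate mmap() calls over the same file range produce different virtual addresses backed by the same pages. It assumes a temporary file from tmpfile() as a stand-in for the selftest's memfd, and the helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map the same file range twice. Returns 0 on success and stores the two
 * mappings in *a and *b; returns -1 on failure.
 */
static int map_with_alias(int fd, size_t len, char **a, char **b)
{
	*a = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (*a == MAP_FAILED)
		return -1;
	*b = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (*b == MAP_FAILED) {
		munmap(*a, len);
		return -1;
	}
	return 0;
}

/* Returns 1 if a write through one mapping is visible through the other. */
static int alias_shares_pages(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	FILE *f = tmpfile();
	char *a, *b;
	int ok;

	if (!f)
		return 0;
	if (ftruncate(fileno(f), (off_t)len) ||
	    map_with_alias(fileno(f), len, &a, &b)) {
		fclose(f);
		return 0;
	}

	strcpy(a, "hello");
	ok = (a != b) && strcmp(b, "hello") == 0;

	munmap(a, len);
	munmap(b, len);
	fclose(f);
	return ok;
}
```

Because the two mappings are distinct VMAs, page table entries can be present in one and absent in the other, which is the property the minor fault test relies on.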

Signed-off-by: Axel Rasmussen 
---
 tools/testing/selftests/vm/userfaultfd.c | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c 
b/tools/testing/selftests/vm/userfaultfd.c
index fc40831f818f..1f65c4ab7994 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -278,13 +278,29 @@ static void shmem_release_pages(char *rel_area)
 
 static void shmem_allocate_area(void **alloc_area)
 {
-   unsigned long offset =
-   alloc_area == (void **)&area_src ? 0 : nr_pages * page_size;
+   void *area_alias = NULL;
+   bool is_src = alloc_area == (void **)&area_src;
+   unsigned long offset = is_src ? 0 : nr_pages * page_size;
 
*alloc_area = mmap(NULL, nr_pages * page_size, PROT_READ | PROT_WRITE,
   MAP_SHARED, shm_fd, offset);
if (*alloc_area == MAP_FAILED)
err("mmap of memfd failed");
+
+   area_alias = mmap(NULL, nr_pages * page_size, PROT_READ | PROT_WRITE,
+ MAP_SHARED, shm_fd, offset);
+   if (area_alias == MAP_FAILED)
+   err("mmap of memfd alias failed");
+
+   if (is_src)
+   area_src_alias = area_alias;
+   else
+   area_dst_alias = area_alias;
+}
+
+static void shmem_alias_mapping(__u64 *start, size_t len, unsigned long offset)
+{
+   *start = (unsigned long)area_dst_alias + offset;
 }
 
 struct uffd_test_ops {
@@ -314,7 +330,7 @@ static struct uffd_test_ops shmem_uffd_test_ops = {
.expected_ioctls = SHMEM_EXPECTED_IOCTLS,
.allocate_area  = shmem_allocate_area,
.release_pages  = shmem_release_pages,
-   .alias_mapping = noop_alias_mapping,
+   .alias_mapping = shmem_alias_mapping,
 };
 
 static struct uffd_test_ops hugetlb_uffd_test_ops = {
-- 
2.31.1.295.g9ea45b61b8-goog

[PATCH v2 8/9] userfaultfd/selftests: exercise minor fault handling shmem support

2021-04-12 Thread Axel Rasmussen

Enable test_uffdio_minor for test_type == TEST_SHMEM, and modify the
test slightly to pass in / check for the right feature flags.

Signed-off-by: Axel Rasmussen 
---
 tools/testing/selftests/vm/userfaultfd.c | 29 
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c 
b/tools/testing/selftests/vm/userfaultfd.c
index 0ff01f437a39..0830f155e6c2 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -484,6 +484,7 @@ static void wp_range(int ufd, __u64 start, __u64 len, bool 
wp)
 static void continue_range(int ufd, __u64 start, __u64 len)
 {
struct uffdio_continue req;
+   int ret;
 
req.range.start = start;
req.range.len = len;
@@ -492,6 +493,17 @@ static void continue_range(int ufd, __u64 start, __u64 len)
if (ioctl(ufd, UFFDIO_CONTINUE, &req))
err("UFFDIO_CONTINUE failed for address 0x%" PRIx64,
(uint64_t)start);
+
+   /*
+* Error handling within the kernel for continue is subtly different
+* from copy or zeropage, so it may be a source of bugs. Trigger an
+* error (-EEXIST) on purpose, to verify doing so doesn't cause a BUG.
+*/
+   req.mapped = 0;
+   ret = ioctl(ufd, UFFDIO_CONTINUE, &req);
+   if (ret >= 0 || req.mapped != -EEXIST)
+   err("failed to exercise UFFDIO_CONTINUE error handling, ret=%d, 
mapped=%" PRId64,
+   ret, (int64_t) req.mapped);
 }
 
 static void *locking_thread(void *arg)
@@ -1196,7 +1208,7 @@ static int userfaultfd_minor_test(void)
void *expected_page;
char c;
struct uffd_stats stats = { 0 };
-   uint64_t features = UFFD_FEATURE_MINOR_HUGETLBFS;
+   uint64_t req_features, features_out;
 
if (!test_uffdio_minor)
return 0;
@@ -1204,10 +1216,18 @@ static int userfaultfd_minor_test(void)
printf("testing minor faults: ");
fflush(stdout);
 
-   if (uffd_test_ctx_clear() || uffd_test_ctx_init_ext(&features))
+   if (test_type == TEST_HUGETLB)
+   req_features = UFFD_FEATURE_MINOR_HUGETLBFS;
+   else if (test_type == TEST_SHMEM)
+   req_features = UFFD_FEATURE_MINOR_SHMEM;
+   else
+   return 1;
+
+   features_out = req_features;
+   if (uffd_test_ctx_clear() || uffd_test_ctx_init_ext(&features_out))
return 1;
-   /* If kernel reports the feature isn't supported, skip the test. */
-   if (!(features & UFFD_FEATURE_MINOR_HUGETLBFS)) {
+   /* If kernel reports required features aren't supported, skip test. */
+   if ((features_out & req_features) != req_features) {
printf("skipping test due to lack of feature support\n");
fflush(stdout);
return 0;
@@ -1442,6 +1462,7 @@ static void set_test_type(const char *type)
map_shared = true;
test_type = TEST_SHMEM;
uffd_test_ops = &shmem_uffd_test_ops;
+   test_uffdio_minor = true;
} else {
err("Unknown test type: %s", type);
}
-- 
2.31.1.295.g9ea45b61b8-goog

[PATCH v2 7/9] userfaultfd/selftests: reinitialize test context in each test

2021-04-12 Thread Axel Rasmussen

Currently, the context (fds, mmap-ed areas, etc.) is global. Each test
mutates this state in some way, in some cases really "clobbering it"
(e.g., the events test mremap-ing area_dst over the top of area_src, or
the minor faults tests overwriting the count_verify values in the test
areas). We run the tests in a particular order, each test is careful to
make the right assumptions about its starting state, etc.

But, this is fragile. It's better for a test's success or failure to not
depend on what some other prior test case did to the global state.

To that end, clear and reinitialize the test context at the start of
each test case, so whatever prior test cases did doesn't affect future
tests.
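
The clear-then-reinitialize pattern can be sketched in a few lines of userspace C. This assumes a much smaller hypothetical context (a single array of counters) than the selftest's real state, and all names below are illustrative.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical global test context, much smaller than the selftest's. */
static unsigned long long *counts;
static size_t nr_counts;

/* Release whatever a previous test case left behind. */
static void ctx_clear(void)
{
	free(counts);		/* free(NULL) is a no-op */
	counts = NULL;
	nr_counts = 0;
}

/* Build a known starting state: every counter is 1. */
static int ctx_init(size_t nr)
{
	counts = malloc(nr * sizeof(*counts));
	if (!counts)
		return 1;
	for (size_t i = 0; i < nr; i++)
		counts[i] = 1;
	nr_counts = nr;
	return 0;
}

/* Run one test case on a freshly initialized context. */
static int run_case(int (*test)(void), size_t nr)
{
	ctx_clear();
	if (ctx_init(nr))
		return 1;
	return test();
}

/* A case that clobbers shared state, and one that depends on it. */
static int clobbering_case(void) { counts[0] = 99; return 0; }
static int checking_case(void) { return counts[0] == 1 ? 0 : 1; }
```

Running `clobbering_case` before `checking_case` through `run_case()` leaves the second case unaffected, because it starts from a fresh context regardless of order.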

This is particularly relevant to this series because the events test's
mremap of area_dst screws up assumptions the minor fault test was
relying on. This wasn't a problem for hugetlb, as we don't mremap in
that case.

Signed-off-by: Axel Rasmussen 
---
 tools/testing/selftests/vm/userfaultfd.c | 221 +--
 1 file changed, 127 insertions(+), 94 deletions(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c 
b/tools/testing/selftests/vm/userfaultfd.c
index 1f65c4ab7994..0ff01f437a39 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -89,7 +89,8 @@ static int shm_fd;
 static int huge_fd;
 static char *huge_fd_off0;
 static unsigned long long *count_verify;
-static int uffd, uffd_flags, finished, *pipefd;
+static int uffd = -1;
+static int uffd_flags, finished, *pipefd;
 static char *area_src, *area_src_alias, *area_dst, *area_dst_alias;
 static char *zeropage;
 pthread_attr_t attr;
@@ -342,6 +343,121 @@ static struct uffd_test_ops hugetlb_uffd_test_ops = {
 
 static struct uffd_test_ops *uffd_test_ops;
 
+static int userfaultfd_open(uint64_t *features)
+{
+   struct uffdio_api uffdio_api;
+
+   uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
+   if (uffd < 0)
+   err("userfaultfd syscall not available in this kernel");
+   uffd_flags = fcntl(uffd, F_GETFD, NULL);
+
+   uffdio_api.api = UFFD_API;
+   uffdio_api.features = *features;
+   if (ioctl(uffd, UFFDIO_API, &uffdio_api))
+   err("UFFDIO_API failed.\nPlease make sure to "
+   "run with either root or ptrace capability.");
+   if (uffdio_api.api != UFFD_API)
+   err("UFFDIO_API error: %" PRIu64, (uint64_t)uffdio_api.api);
+
+   *features = uffdio_api.features;
+   return 0;
+}
+
+static int uffd_test_ctx_init_ext(uint64_t *features)
+{
+   unsigned long nr, cpu;
+
+   uffd_test_ops->allocate_area((void **)&area_src);
+   if (!area_src)
+   return 1;
+   uffd_test_ops->allocate_area((void **)&area_dst);
+   if (!area_dst)
+   return 1;
+
+   uffd_test_ops->release_pages(area_src);
+   uffd_test_ops->release_pages(area_dst);
+
+   if (userfaultfd_open(features))
+   return 1;
+
+   count_verify = malloc(nr_pages * sizeof(unsigned long long));
+   if (!count_verify)
+   err("count_verify");
+
+   for (nr = 0; nr < nr_pages; nr++) {
+   *area_mutex(area_src, nr) =
+   (pthread_mutex_t)PTHREAD_MUTEX_INITIALIZER;
+   count_verify[nr] = *area_count(area_src, nr) = 1;
+   /*
+* In the transition between 255 to 256, powerpc will
+* read out of order in my_bcmp and see both bytes as
+* zero, so leave a placeholder below always non-zero
+* after the count, to avoid my_bcmp to trigger false
+* positives.
+*/
+   *(area_count(area_src, nr) + 1) = 1;
+   }
+
+   pipefd = malloc(sizeof(int) * nr_cpus * 2);
+   if (!pipefd)
+   err("pipefd");
+   for (cpu = 0; cpu < nr_cpus; cpu++)
+   if (pipe2(&pipefd[cpu * 2], O_CLOEXEC | O_NONBLOCK))
+   err("pipe");
+
+   return 0;
+}
+
+static inline int uffd_test_ctx_init(uint64_t features)
+{
+   return uffd_test_ctx_init_ext(&features);
+}
+
+static inline int munmap_area(void **area)
+{
+   if (*area)
+   if (munmap(*area, nr_pages * page_size))
+   err("munmap");
+
+   *area = NULL;
+   return 0;
+}
+
+static int uffd_test_ctx_clear(void)
+{
+   int ret = 0;
+   size_t i;
+
+   if (pipefd) {
+   for (i = 0; i < nr_cpus * 2; ++i) {
+   if (close(pipefd[i]))
+   err("close pipefd");
+   }
+   free(pipefd);
+   pipefd = NULL;
+   }
+
+   if (count_verify) {
+   free(count_verify);
+   count_verify = NULL;
+   }
+
+   if (uffd != -1) {
+   if (close(uffd))
+   err("close uffd");
+   uffd = -1;
+   }
+
+   huge_fd_off0 = NULL;
+   ret |=

[PATCH v2 4/9] userfaultfd/shmem: support UFFDIO_CONTINUE for shmem

2021-04-12 Thread Axel Rasmussen

With this change, userspace can resolve a minor fault within a
shmem-backed area with a UFFDIO_CONTINUE ioctl. The semantics for this
match those for hugetlbfs - we look up the existing page in the page
cache, and install PTEs for it.

This commit introduces a new helper: mcopy_atomic_install_ptes.

Why handle UFFDIO_CONTINUE for shmem in mm/userfaultfd.c, instead of in
shmem.c? The existing userfault implementation only relies on shmem.c
for VM_SHARED VMAs. However, minor fault handling / CONTINUE work just
fine for !VM_SHARED VMAs as well. We'd prefer to handle CONTINUE for
shmem in one place, regardless of shared/private (to reduce code
duplication).

Why add a new mcopy_atomic_install_ptes helper? A problem we have with
continue is that shmem_mcopy_atomic_pte() and mcopy_atomic_pte() are
*close* to what we want, but not exactly. We do want to setup the PTEs
in a CONTINUE operation, but we don't want to e.g. allocate a new page,
charge it (e.g. to the shmem inode), manipulate various flags, etc. Also
we have the problem stated above: shmem_mcopy_atomic_pte() and
mcopy_atomic_pte() both handle one-half of the problem (shared /
private) continue cares about. So, introduce mcontinue_atomic_pte(), to
handle all of the shmem continue cases. Introduce the helper so it
doesn't duplicate code with mcopy_atomic_pte().

In a future commit, shmem_mcopy_atomic_pte() will also be modified to
use this new helper. However, since this is a bigger refactor, it seems
most clear to do it as a separate change.

Signed-off-by: Axel Rasmussen 
---
 mm/userfaultfd.c | 176 +++
 1 file changed, 131 insertions(+), 45 deletions(-)

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 23fa2583bbd1..8df0438f5d6a 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -48,6 +48,87 @@ struct vm_area_struct *find_dst_vma(struct mm_struct *dst_mm,
return dst_vma;
 }
 
+/*
+ * Install PTEs, to map dst_addr (within dst_vma) to page.
+ *
+ * This function handles MCOPY_ATOMIC_CONTINUE (which is always file-backed),
+ * whether or not dst_vma is VM_SHARED. It also handles the more general
+ * MCOPY_ATOMIC_NORMAL case, when dst_vma is *not* VM_SHARED (it may be file
+ * backed, or not).
+ *
+ * Note that MCOPY_ATOMIC_NORMAL for a VM_SHARED dst_vma is handled by
+ * shmem_mcopy_atomic_pte instead.
+ */
+static int mcopy_atomic_install_ptes(struct mm_struct *dst_mm, pmd_t *dst_pmd,
+struct vm_area_struct *dst_vma,
+unsigned long dst_addr, struct page *page,
+bool newly_allocated, bool wp_copy)
+{
+   int ret;
+   pte_t _dst_pte, *dst_pte;
+   int writable;
+   bool vm_shared = dst_vma->vm_flags & VM_SHARED;
+   spinlock_t *ptl;
+   struct inode *inode;
+   pgoff_t offset, max_off;
+
+   _dst_pte = mk_pte(page, dst_vma->vm_page_prot);
+   writable = dst_vma->vm_flags & VM_WRITE;
+   /* For private, non-anon we need CoW (don't write to page cache!) */
+   if (!vma_is_anonymous(dst_vma) && !vm_shared)
+   writable = 0;
+
+   if (writable || vma_is_anonymous(dst_vma))
+   _dst_pte = pte_mkdirty(_dst_pte);
+   if (writable) {
+   if (wp_copy)
+   _dst_pte = pte_mkuffd_wp(_dst_pte);
+   else
+   _dst_pte = pte_mkwrite(_dst_pte);
+   } else if (vm_shared) {
+   /*
+* Since we didn't pte_mkdirty(), mark the page dirty or it
+* could be freed from under us. We could do this
+* unconditionally, but doing it only if !writable is faster.
+*/
+   set_page_dirty(page);
+   }
+
+   dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl);
+
+   if (vma_is_shmem(dst_vma)) {
+   /* serialize against truncate with the page table lock */
+   inode = dst_vma->vm_file->f_inode;
+   offset = linear_page_index(dst_vma, dst_addr);
+   max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
+   ret = -EFAULT;
+   if (unlikely(offset >= max_off))
+   goto out_unlock;
+   }
+
+   ret = -EEXIST;
+   if (!pte_none(*dst_pte))
+   goto out_unlock;
+
+   inc_mm_counter(dst_mm, mm_counter(page));
+   if (vma_is_shmem(dst_vma))
+   page_add_file_rmap(page, false);
+   else
+   page_add_new_anon_rmap(page, dst_vma, dst_addr, false);
+
+   if (newly_allocated)
+   lru_cache_add_inactive_or_unevictable(page, dst_vma);
+
+   set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
+
+   /* No need to invalidate - it was non-present before */
+   update_mmu_cache(dst_vma, dst_addr, dst_pte);
+   ret = 0;
+out_unlock:
+   pte_unmap_unlock(dst_pte, ptl);
+   return ret;
+}
+
 static int
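
The flag decisions made at the top of mcopy_atomic_install_ptes() can be
summarized with a small userspace model. This is only a sketch: the struct and
function names (ex_pte_plan, ex_plan_pte) are hypothetical, nothing here touches
page tables, and it mirrors just the branch structure visible in the hunk above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what the helper would do to the PTE and the page. */
struct ex_pte_plan {
	bool pte_dirty;   /* pte_mkdirty() applied */
	bool pte_write;   /* pte_mkwrite() applied */
	bool pte_uffd_wp; /* pte_mkuffd_wp() applied */
	bool page_dirty;  /* set_page_dirty() called on the page */
};

/* Models only the flag logic; the argument names are assumptions. */
static struct ex_pte_plan ex_plan_pte(bool vm_write, bool vm_shared,
				      bool is_anon, bool wp_copy)
{
	struct ex_pte_plan plan = { false, false, false, false };
	bool writable = vm_write;

	/* Private, non-anon mappings need CoW, so never map them writable. */
	if (!is_anon && !vm_shared)
		writable = false;

	if (writable || is_anon)
		plan.pte_dirty = true;

	if (writable) {
		if (wp_copy)
			plan.pte_uffd_wp = true;
		else
			plan.pte_write = true;
	} else if (vm_shared) {
		/* The PTE was not dirtied, so dirty the page itself. */
		plan.page_dirty = true;
	}

	/* A PTE is never made both writable and uffd-wp by this logic. */
	assert(!(plan.pte_write && plan.pte_uffd_wp));
	return plan;
}
```

For example, ex_plan_pte(true, false, false, false) (a private file-backed VMA
with VM_WRITE) sets none of the four flags, which is the CoW case called out in
the code comment.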

[PATCH v2 2/9] userfaultfd/shmem: combine shmem_{mcopy_atomic,mfill_zeropage}_pte

2021-04-12 Thread Axel Rasmussen

Previously, we did a dance where we had one calling path in
userfaultfd.c (mfill_atomic_pte), but then we split it into two in
shmem_fs.h (shmem_{mcopy_atomic,mfill_zeropage}_pte), and then rejoined
into a single shared function in shmem.c (shmem_mfill_atomic_pte).

This is all a bit overly complex. Just call the single combined shmem
function directly, allowing us to clean up various branches,
boilerplate, etc.

While we're touching this function, two other small cleanup changes:
- offset is equivalent to pgoff, so we can get rid of offset entirely.
- Split two VM_BUG_ON cases into two statements. This means the line
  number reported when the BUG is hit specifies exactly which condition
  was true.

Reviewed-by: Peter Xu 
Signed-off-by: Axel Rasmussen 
---
 include/linux/shmem_fs.h | 15 +---
 mm/shmem.c   | 52 +---
 mm/userfaultfd.c | 10 +++-
 3 files changed, 25 insertions(+), 52 deletions(-)

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index d82b6f396588..919e36671fe6 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -122,21 +122,18 @@ static inline bool shmem_file(struct file *file)
 extern bool shmem_charge(struct inode *inode, long pages);
 extern void shmem_uncharge(struct inode *inode, long pages);
 
+#ifdef CONFIG_USERFAULTFD
 #ifdef CONFIG_SHMEM
 extern int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
  struct vm_area_struct *dst_vma,
  unsigned long dst_addr,
  unsigned long src_addr,
+ bool zeropage,
  struct page **pagep);
-extern int shmem_mfill_zeropage_pte(struct mm_struct *dst_mm,
-   pmd_t *dst_pmd,
-   struct vm_area_struct *dst_vma,
-   unsigned long dst_addr);
-#else
+#else /* !CONFIG_SHMEM */
 #define shmem_mcopy_atomic_pte(dst_mm, dst_pte, dst_vma, dst_addr, \
-  src_addr, pagep)({ BUG(); 0; })
-#define shmem_mfill_zeropage_pte(dst_mm, dst_pmd, dst_vma, \
-dst_addr)  ({ BUG(); 0; })
-#endif
+  src_addr, zeropage, pagep)   ({ BUG(); 0; })
+#endif /* CONFIG_SHMEM */
+#endif /* CONFIG_USERFAULTFD */
 
 #endif
diff --git a/mm/shmem.c b/mm/shmem.c
index 26c76b13ad23..b72c55aa07fc 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2354,13 +2354,14 @@ static struct inode *shmem_get_inode(struct super_block 
*sb, const struct inode
return inode;
 }
 
-static int shmem_mfill_atomic_pte(struct mm_struct *dst_mm,
- pmd_t *dst_pmd,
- struct vm_area_struct *dst_vma,
- unsigned long dst_addr,
- unsigned long src_addr,
- bool zeropage,
- struct page **pagep)
+#ifdef CONFIG_USERFAULTFD
+int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm,
+  pmd_t *dst_pmd,
+  struct vm_area_struct *dst_vma,
+  unsigned long dst_addr,
+  unsigned long src_addr,
+  bool zeropage,
+  struct page **pagep)
 {
struct inode *inode = file_inode(dst_vma->vm_file);
struct shmem_inode_info *info = SHMEM_I(inode);
@@ -2372,7 +2373,7 @@ static int shmem_mfill_atomic_pte(struct mm_struct 
*dst_mm,
struct page *page;
pte_t _dst_pte, *dst_pte;
int ret;
-   pgoff_t offset, max_off;
+   pgoff_t max_off;
 
ret = -ENOMEM;
if (!shmem_inode_acct_block(inode, 1))
@@ -2383,7 +2384,7 @@ static int shmem_mfill_atomic_pte(struct mm_struct 
*dst_mm,
if (!page)
goto out_unacct_blocks;
 
-   if (!zeropage) {/* mcopy_atomic */
+   if (!zeropage) {/* COPY */
page_kaddr = kmap_atomic(page);
ret = copy_from_user(page_kaddr,
 (const void __user *)src_addr,
@@ -2397,7 +2398,7 @@ static int shmem_mfill_atomic_pte(struct mm_struct 
*dst_mm,
/* don't free the page */
return -ENOENT;
}
-   } else {/* mfill_zeropage_atomic */
+   } else {/* ZEROPAGE */
clear_highpage(page);
}
} else {
@@ -2405,15 +2406,15 @@ static int shmem_mfill_atomic_pte(struct mm_struct 
*dst_mm,
*pagep = NULL;
}
 
-   VM_BUG_ON(PageLocked(page) || PageSwapBacked(page));
+   VM_BUG_ON(PageLocked(page));
+

[PATCH v2 5/9] userfaultfd/selftests: use memfd_create for shmem test type

2021-04-12 Thread Axel Rasmussen

This is a preparatory commit. In the future, we want to be able to set up
alias mappings for area_src and area_dst in the shmem test, like we do
in the hugetlb_shared test. With a VMA obtained via
mmap(MAP_ANONYMOUS | MAP_SHARED), it isn't clear how to do this.

So, mmap() with an fd, so we can create alias mappings. Use memfd_create
instead of actually passing in a tmpfs path like hugetlb does, since
it's more convenient / simpler to run, and works just as well.

Future commits will:

1. Set up the alias mappings.
2. Extend our tests to actually take advantage of this, to test new
   userfaultfd behavior being introduced in this series.

Also, a small fix in the area we're changing: when the hugetlb setup
fails in main(), pass in the right argv[] so we actually print out the
hugetlb file path.

Signed-off-by: Axel Rasmussen 
---
 tools/testing/selftests/vm/userfaultfd.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c 
b/tools/testing/selftests/vm/userfaultfd.c
index 6339aeaeeff8..fc40831f818f 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -85,6 +85,7 @@ static bool test_uffdio_wp = false;
 static bool test_uffdio_minor = false;
 
 static bool map_shared;
+static int shm_fd;
 static int huge_fd;
 static char *huge_fd_off0;
 static unsigned long long *count_verify;
@@ -277,8 +278,11 @@ static void shmem_release_pages(char *rel_area)
 
 static void shmem_allocate_area(void **alloc_area)
 {
+   unsigned long offset =
+   alloc_area == (void **)&area_src ? 0 : nr_pages * page_size;
+
*alloc_area = mmap(NULL, nr_pages * page_size, PROT_READ | PROT_WRITE,
-  MAP_ANONYMOUS | MAP_SHARED, -1, 0);
+  MAP_SHARED, shm_fd, offset);
if (*alloc_area == MAP_FAILED)
err("mmap of memfd failed");
 }
@@ -1448,6 +1452,16 @@ int main(int argc, char **argv)
err("Open of %s failed", argv[4]);
if (ftruncate(huge_fd, 0))
err("ftruncate %s to size 0 failed", argv[4]);
+   } else if (test_type == TEST_SHMEM) {
+   shm_fd = memfd_create(argv[0], 0);
+   if (shm_fd < 0)
+   err("memfd_create");
+   if (ftruncate(shm_fd, nr_pages * page_size * 2))
+   err("ftruncate");
+   if (fallocate(shm_fd,
+ FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0,
+ nr_pages * page_size * 2))
+   err("fallocate");
}
printf("nr_pages: %lu, nr_pages_per_cpu: %lu\n",
   nr_pages, nr_pages_per_cpu);
-- 
2.31.1.295.g9ea45b61b8-goog
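
As a rough illustration of why patch 5/9 above wants an fd-backed mapping: with
a memfd, the same file offset can be mapped twice, and a write through one
mapping is visible through the other. The sketch below is not taken from the
selftest; ex_memfd_create() and ex_check_alias() are hypothetical names, and the
system call is invoked through syscall(2) only so the sketch does not depend on
the libc wrapper being exposed (the selftest calls memfd_create() directly).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: issues the memfd_create system call directly. */
static int ex_memfd_create(const char *name, unsigned int flags)
{
	return (int)syscall(SYS_memfd_create, name, flags);
}

/*
 * Maps the same memfd range twice and checks that a write through the
 * first mapping is visible through the second (the alias).
 * Returns 0 on success, -1 on any error or mismatch.
 */
static int ex_check_alias(size_t len)
{
	char *area, *alias;
	int fd, ret = -1;

	fd = ex_memfd_create("ex-alias", 0);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)len))
		goto out_close;

	area = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (area == MAP_FAILED)
		goto out_close;
	alias = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (alias == MAP_FAILED)
		goto out_unmap_area;

	/* Write through one mapping, read back through the other. */
	memset(area, 0x5a, len);
	if (alias[0] == 0x5a && alias[len - 1] == 0x5a)
		ret = 0;

	munmap(alias, len);
out_unmap_area:
	munmap(area, len);
out_close:
	close(fd);
	return ret;
}
```

With mmap(MAP_ANONYMOUS | MAP_SHARED) there is no fd to pass to the second
mmap() call, which is the difficulty the commit message describes.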

[PATCH v2 9/9] userfaultfd/shmem: modify shmem_mcopy_atomic_pte to use install_ptes

2021-04-12 Thread Axel Rasmussen

In a previous commit, we added the mcopy_atomic_install_ptes() helper.
This helper does the job of setting up PTEs for an existing page, to map
it into a given VMA. It deals with both the anon and shmem cases, as
well as the shared and private cases.

In other words, shmem_mcopy_atomic_pte() duplicates a case the helper already
handles. So, expose the helper, and let shmem_mcopy_atomic_pte() use it
directly, to reduce code duplication.

This requires that we refactor shmem_mcopy_atomic_pte() a bit:

Instead of doing accounting (shmem_recalc_inode() et al) part-way
through the PTE setup, do it beforehand. This frees up
mcopy_atomic_install_ptes() from having to care about this accounting,
but it does mean we need to clean it up if we get a failure afterwards
(shmem_uncharge()).

We can *almost* use shmem_charge() to do this, reducing code
duplication. But, it does `inode->i_mapping->nrpages++`, which would
double-count since shmem_add_to_page_cache() also does this.

Signed-off-by: Axel Rasmussen 
---
 include/linux/userfaultfd_k.h |  5 
 mm/shmem.c| 52 +++
 mm/userfaultfd.c  | 25 -
 3 files changed, 27 insertions(+), 55 deletions(-)

diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 794d1538b8ba..3e20bfa9ef80 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -53,6 +53,11 @@ enum mcopy_atomic_mode {
MCOPY_ATOMIC_CONTINUE,
 };
 
+extern int mcopy_atomic_install_ptes(struct mm_struct *dst_mm, pmd_t *dst_pmd,
+struct vm_area_struct *dst_vma,
+unsigned long dst_addr, struct page *page,
+bool newly_allocated, bool wp_copy);
+
 extern ssize_t mcopy_atomic(struct mm_struct *dst_mm, unsigned long dst_start,
unsigned long src_start, unsigned long len,
bool *mmap_changing, __u64 mode);
diff --git a/mm/shmem.c b/mm/shmem.c
index 3f48cb5e8404..9b12298405a4 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2376,10 +2376,8 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm,
struct address_space *mapping = inode->i_mapping;
gfp_t gfp = mapping_gfp_mask(mapping);
pgoff_t pgoff = linear_page_index(dst_vma, dst_addr);
-   spinlock_t *ptl;
void *page_kaddr;
struct page *page;
-   pte_t _dst_pte, *dst_pte;
int ret;
pgoff_t max_off;
 
@@ -2389,8 +2387,10 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm,
 
if (!*pagep) {
page = shmem_alloc_page(gfp, info, pgoff);
-   if (!page)
-   goto out_unacct_blocks;
+   if (!page) {
+   shmem_inode_unacct_blocks(inode, 1);
+   goto out;
+   }
 
if (!zeropage) {/* COPY */
page_kaddr = kmap_atomic(page);
@@ -2430,59 +2430,27 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm,
if (ret)
goto out_release;
 
-   _dst_pte = mk_pte(page, dst_vma->vm_page_prot);
-   if (dst_vma->vm_flags & VM_WRITE)
-   _dst_pte = pte_mkwrite(pte_mkdirty(_dst_pte));
-   else {
-   /*
-* We don't set the pte dirty if the vma has no
-* VM_WRITE permission, so mark the page dirty or it
-* could be freed from under us. We could do it
-* unconditionally before unlock_page(), but doing it
-* only if VM_WRITE is not set is faster.
-*/
-   set_page_dirty(page);
-   }
-
-   dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl);
-
-   ret = -EFAULT;
-   max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
-   if (unlikely(pgoff >= max_off))
-   goto out_release_unlock;
-
-   ret = -EEXIST;
-   if (!pte_none(*dst_pte))
-   goto out_release_unlock;
-
-   lru_cache_add(page);
-
spin_lock_irq(&info->lock);
info->alloced++;
inode->i_blocks += BLOCKS_PER_PAGE;
shmem_recalc_inode(inode);
spin_unlock_irq(&info->lock);
 
-   inc_mm_counter(dst_mm, mm_counter_file(page));
-   page_add_file_rmap(page, false);
-   set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
+   ret = mcopy_atomic_install_ptes(dst_mm, dst_pmd, dst_vma, dst_addr,
+   page, true, false);
+   if (ret)
+   goto out_release_uncharge;
 
-   /* No need to invalidate - it was non-present before */
-   update_mmu_cache(dst_vma, dst_addr, dst_pte);
-   pte_unmap_unlock(dst_pte, ptl);
unlock_page(page);
ret = 0;
 out:
return ret;
-out_release_unlock:
-   pte_unmap_unlock(dst_pte, ptl);
-   ClearPageDirty(page);
+out_release_uncharge:
delete_from_page_cache(page);
+

[PATCH v2 3/9] userfaultfd/shmem: support minor fault registration for shmem

2021-04-12 Thread Axel Rasmussen

This patch allows shmem-backed VMAs to be registered for minor faults.
Minor faults are appropriately relayed to userspace in the fault path,
for VMAs with the relevant flag.

This commit doesn't hook up the UFFDIO_CONTINUE ioctl for shmem-backed
minor faults, though, so userspace doesn't yet have a way to resolve
such faults.

Signed-off-by: Axel Rasmussen 
---
 fs/userfaultfd.c |  6 +++---
 include/uapi/linux/userfaultfd.h |  7 ++-
 mm/memory.c  |  8 +---
 mm/shmem.c   | 10 +-
 4 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 14f92285d04f..9f3b8684cf3c 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1267,8 +1267,7 @@ static inline bool vma_can_userfault(struct 
vm_area_struct *vma,
}
 
if (vm_flags & VM_UFFD_MINOR) {
-   /* FIXME: Add minor fault interception for shmem. */
-   if (!is_vm_hugetlb_page(vma))
+   if (!(is_vm_hugetlb_page(vma) || vma_is_shmem(vma)))
return false;
}
 
@@ -1941,7 +1940,8 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
/* report all available features and ioctls to userland */
uffdio_api.features = UFFD_API_FEATURES;
 #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
-   uffdio_api.features &= ~UFFD_FEATURE_MINOR_HUGETLBFS;
+   uffdio_api.features &=
+   ~(UFFD_FEATURE_MINOR_HUGETLBFS | UFFD_FEATURE_MINOR_SHMEM);
 #endif
uffdio_api.ioctls = UFFD_API_IOCTLS;
ret = -EFAULT;
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index bafbeb1a2624..159a74e9564f 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -31,7 +31,8 @@
   UFFD_FEATURE_MISSING_SHMEM | \
   UFFD_FEATURE_SIGBUS |\
   UFFD_FEATURE_THREAD_ID | \
-  UFFD_FEATURE_MINOR_HUGETLBFS)
+  UFFD_FEATURE_MINOR_HUGETLBFS |   \
+  UFFD_FEATURE_MINOR_SHMEM)
 #define UFFD_API_IOCTLS\
((__u64)1 << _UFFDIO_REGISTER | \
 (__u64)1 << _UFFDIO_UNREGISTER |   \
@@ -185,6 +186,9 @@ struct uffdio_api {
 * UFFD_FEATURE_MINOR_HUGETLBFS indicates that minor faults
 * can be intercepted (via REGISTER_MODE_MINOR) for
 * hugetlbfs-backed pages.
+*
+* UFFD_FEATURE_MINOR_SHMEM indicates the same support as
+* UFFD_FEATURE_MINOR_HUGETLBFS, but for shmem-backed pages instead.
 */
 #define UFFD_FEATURE_PAGEFAULT_FLAG_WP (1<<0)
 #define UFFD_FEATURE_EVENT_FORK(1<<1)
@@ -196,6 +200,7 @@ struct uffdio_api {
 #define UFFD_FEATURE_SIGBUS(1<<7)
 #define UFFD_FEATURE_THREAD_ID (1<<8)
 #define UFFD_FEATURE_MINOR_HUGETLBFS   (1<<9)
+#define UFFD_FEATURE_MINOR_SHMEM   (1<<10)
__u64 features;
 
__u64 ioctls;
diff --git a/mm/memory.c b/mm/memory.c
index 4e358601c5d6..cc71a445c76c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3972,9 +3972,11 @@ static vm_fault_t do_read_fault(struct vm_fault *vmf)
 * something).
 */
if (vma->vm_ops->map_pages && fault_around_bytes >> PAGE_SHIFT > 1) {
-   ret = do_fault_around(vmf);
-   if (ret)
-   return ret;
+   if (likely(!userfaultfd_minor(vmf->vma))) {
+   ret = do_fault_around(vmf);
+   if (ret)
+   return ret;
+   }
}
 
ret = __do_fault(vmf);
diff --git a/mm/shmem.c b/mm/shmem.c
index b72c55aa07fc..3f48cb5e8404 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1785,7 +1785,7 @@ static int shmem_swapin_page(struct inode *inode, pgoff_t 
index,
  * vm. If we swap it in we mark it dirty since we also free the swap
  * entry since a page cannot live in both the swap and page cache.
  *
- * vmf and fault_type are only supplied by shmem_fault:
+ * vma, vmf, and fault_type are only supplied by shmem_fault:
  * otherwise they are NULL.
  */
 static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
@@ -1820,6 +1820,14 @@ static int shmem_getpage_gfp(struct inode *inode, 
pgoff_t index,
 
page = pagecache_get_page(mapping, index,
FGP_ENTRY | FGP_HEAD | FGP_LOCK, 0);
+
+   if (page && vma && userfaultfd_minor(vma)) {
+   unlock_page(page);
+   put_page(page);
+   *fault_type = handle_userfault(vmf, VM_UFFD_MINOR);
+   return 0;
+   }
+
if (xa_is_value(page)) {
error = shmem_swapin_page(inode, index, &page,
  sgp, gfp, vma, fault_type);
--
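
The feature reporting changed in userfaultfd_api() above can be modelled in
userspace as plain bit masking. This is a sketch under assumptions: the EX_
constants reuse the bit positions from the uapi hunk in this patch, while
ex_report_features() and ex_has_minor_shmem() are hypothetical helpers, not
kernel or libc APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions copied from the uapi hunk in this patch. */
#define EX_UFFD_FEATURE_MINOR_HUGETLBFS (UINT64_C(1) << 9)
#define EX_UFFD_FEATURE_MINOR_SHMEM     (UINT64_C(1) << 10)

/*
 * Hypothetical model of the masking done when the architecture lacks
 * CONFIG_HAVE_ARCH_USERFAULTFD_MINOR: both minor-fault feature bits
 * are cleared before the feature set is reported to userspace.
 */
static uint64_t ex_report_features(uint64_t all_features, bool arch_has_minor)
{
	if (!arch_has_minor)
		all_features &= ~(EX_UFFD_FEATURE_MINOR_HUGETLBFS |
				  EX_UFFD_FEATURE_MINOR_SHMEM);
	return all_features;
}

/* What a userspace caller might check before registering shmem minor faults. */
static bool ex_has_minor_shmem(uint64_t features)
{
	return (features & EX_UFFD_FEATURE_MINOR_SHMEM) != 0;
}
```

A caller would run a check like ex_has_minor_shmem() on the features field
returned by UFFDIO_API before relying on minor-fault registration for shmem.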

Re: [PATCH 5.4 000/111] 5.4.112-rc1 review

2021-04-12 Thread Naresh Kamboju

On Mon, 12 Apr 2021 at 14:16, Greg Kroah-Hartman
 wrote:
>
> This is the start of the stable review cycle for the 5.4.112 release.
> There are 111 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 14 Apr 2021 08:39:44 +.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> 
> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.112-rc1.gz
> or in the git tree and branch at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-5.4.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h


Results from Linaro’s test farm.
No regressions on arm64, arm, x86_64, and i386.

Tested-by: Linux Kernel Functional Testing 

## Build
* kernel: 5.4.112-rc1
* git: 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
* git branch: linux-5.4.y
* git commit: f9b2de2cddd4601c5d2f2947fc5cebb7dbecd266
* git describe: v5.4.111-112-gf9b2de2cddd4
* test details:
https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.4.y/build/v5.4.111-112-gf9b2de2cddd4

## No regressions (compared to v5.4.110-24-g9b00696cdc42)

## No fixes (compared to v5.4.110-24-g9b00696cdc42)


## Test result summary
 total: 66568, pass: 55229, fail: 884, skip: 10210, xfail: 245,

## Build Summary
* arc: 10 total, 10 passed, 0 failed
* arm: 190 total, 190 passed, 0 failed
* arm64: 25 total, 25 passed, 0 failed
* dragonboard-410c: 1 total, 1 passed, 0 failed
* hi6220-hikey: 1 total, 1 passed, 0 failed
* i386: 13 total, 13 passed, 0 failed
* juno-r2: 1 total, 1 passed, 0 failed
* mips: 45 total, 45 passed, 0 failed
* parisc: 9 total, 9 passed, 0 failed
* powerpc: 27 total, 27 passed, 0 failed
* riscv: 21 total, 21 passed, 0 failed
* s390: 9 total, 9 passed, 0 failed
* sh: 18 total, 18 passed, 0 failed
* sparc: 9 total, 9 passed, 0 failed
* x15: 2 total, 1 passed, 1 failed
* x86: 1 total, 1 passed, 0 failed
* x86_64: 25 total, 25 passed, 0 failed

## Test suites summary
* fwts
* igt-gpu-tools
* install-android-platform-tools-r2600
* kselftest-
* kselftest-android
* kselftest-bpf
* kselftest-capabilities
* kselftest-cgroup
* kselftest-clone3
* kselftest-core
* kselftest-cpu-hotplug
* kselftest-cpufreq
* kselftest-efivarfs
* kselftest-filesystems
* kselftest-firmware
* kselftest-fpu
* kselftest-futex
* kselftest-gpio
* kselftest-intel_pstate
* kselftest-ipc
* kselftest-ir
* kselftest-kcmp
* kselftest-kexec
* kselftest-kvm
* kselftest-lib
* kselftest-livepatch
* kselftest-lkdtm
* kselftest-membarrier
* kselftest-memfd
* kselftest-memory-hotplug
* kselftest-mincore
* kselftest-mount
* kselftest-mqueue
* kselftest-net
* kselftest-netfilter
* kselftest-nsfs
* kselftest-openat2
* kselftest-pid_namespace
* kselftest-pidfd
* kselftest-proc
* kselftest-pstore
* kselftest-ptrace
* kselftest-rseq
* kselftest-rtc
* kselftest-seccomp
* kselftest-sigaltstack
* kselftest-size
* kselftest-splice
* kselftest-static_keys
* kselftest-sync
* kselftest-sysctl
* kselftest-tc-testing
* kselftest-timens
* kselftest-timers
* kselftest-tmpfs
* kselftest-tpm2
* kselftest-user
* kselftest-vm
* kselftest-x86
* kselftest-zram
* kvm-unit-tests
* libhugetlbfs
* linux-log-parser
* ltp-cap_bounds-tests
* ltp-commands-tests
* ltp-containers-tests
* ltp-controllers-tests
* ltp-cpuhotplug-tests
* ltp-crypto-tests
* ltp-cve-tests
* ltp-dio-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-mm-tests
* ltp-nptl-tests
* ltp-open-posix-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-tracing-tests
* network-basic-tests
* perf
* rcutorture
* ssuite
* v4l2-compliance

--
Linaro LKFT
https://lkft.linaro.org

[PATCH v2 1/9] userfaultfd/hugetlbfs: avoid including userfaultfd_k.h in hugetlb.h

2021-04-12 Thread Axel Rasmussen

Minimizing header file inclusion is desirable. In this case, we can do
so just by forward declaring the enumeration our signature relies upon.

Reviewed-by: Peter Xu 
Signed-off-by: Axel Rasmussen 
---
 include/linux/hugetlb.h | 4 +++-
 mm/hugetlb.c| 1 +
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 09f1fd12a6fa..3f47650ab79b 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -11,7 +11,6 @@
 #include 
 #include 
 #include 
-#include 
 
 struct ctl_table;
 struct user_struct;
@@ -135,6 +134,8 @@ void hugetlb_show_meminfo(void);
 unsigned long hugetlb_total_pages(void);
 vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long address, unsigned int flags);
+
+enum mcopy_atomic_mode;
 #ifdef CONFIG_USERFAULTFD
 int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte,
struct vm_area_struct *dst_vma,
@@ -143,6 +144,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, 
pte_t *dst_pte,
enum mcopy_atomic_mode mode,
struct page **pagep);
 #endif /* CONFIG_USERFAULTFD */
+
 bool hugetlb_reserve_pages(struct inode *inode, long from, long to,
struct vm_area_struct *vma,
vm_flags_t vm_flags);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 54d81d5947ed..b1652e747318 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "internal.h"
 
 int hugetlb_max_hstate __read_mostly;
-- 
2.31.1.295.g9ea45b61b8-goog

[PATCH v2 0/9] userfaultfd: add minor fault handling for shmem

2021-04-12 Thread Axel Rasmussen

Base
====

This series is based on (and therefore should apply cleanly to) the tag
"v5.12-rc7-mmots-2021-04-11-20-49", additionally with Peter's selftest cleanup
series applied *first*:

https://lore.kernel.org/patchwork/cover/1412450/

Changelog
=========

v1->v2:
- Pick up Reviewed-by's.
- Don't swapin page when a minor fault occurs. Notice that it needs to be
  swapped in, and just immediately fire the minor fault. Let a future CONTINUE
  deal with swapping in the page. [Peter]
- Clarify comment about i_size checks in mm/userfaultfd.c. [Peter]
- Only forward declare once (out of #ifdef) in hugetlb.h. [Peter]

Changes since [2]:
- Squash the fixes ([2]) in with the original series ([1]). This makes reviewing
  easier, as we no longer have to sift through deltas undoing what we had done
  before. [Hugh, Peter]
- Modify shmem_mcopy_atomic_pte() to use the new mcopy_atomic_install_ptes()
  helper, reducing code duplication. [Hugh]
- Properly trigger handle_userfault() in the shmem_swapin_page() case. [Hugh]
- Use shmem_getpage() instead of find_lock_page() to look up the existing page
  for continue. This properly deals with swapped-out pages. [Hugh]
- Unconditionally pte_mkdirty() for anon memory (as before). [Peter]
- Don't include userfaultfd_k.h in either hugetlb.h or shmem_fs.h. [Hugh]
- Add comment for UFFD_FEATURE_MINOR_SHMEM (to match _HUGETLBFS). [Hugh]
- Fix some small cleanup issues (parens, reworded conditionals, reduced plumbing
  of some parameters, simplify labels/gotos, ...). [Hugh, Peter]

Overview
========

See the series which added minor faults for hugetlbfs [3] for a detailed
overview of minor fault handling in general. This series adds the same support
for shmem-backed areas.

This series is structured as follows:

- Commits 1 and 2 are cleanups.
- Commits 3 and 4 implement the new feature (minor fault handling for shmem).
- Commits 5, 6, 7, 8 update the userfaultfd selftest to exercise the feature.
- Commit 9 is one final cleanup, modifying an existing code path to re-use a new
  helper we've introduced. We rely on the selftest to show that this change
  doesn't break anything.

Use Case
========

In some cases it is useful to have VM memory backed by tmpfs instead of
hugetlbfs. So, this feature will be used to support the same VM live migration
use case described in my original series.

Additionally, Android folks (Lokesh Gidra ) hope to
optimize the Android Runtime garbage collector using this feature:

"The plan is to use userfaultfd for concurrently compacting the heap. With
this feature, the heap can be shared-mapped at another location where the
GC-thread(s) could continue the compaction operation without the need to
invoke userfault ioctl(UFFDIO_COPY) each time. OTOH, if and when Java threads
get faults on the heap, UFFDIO_CONTINUE can be used to resume execution.
Furthermore, this feature enables updating references in the 'non-moving'
portion of the heap efficiently. Without this feature, unnecessary page
copying (ioctl(UFFDIO_COPY)) would be required."

[1] https://lore.kernel.org/patchwork/cover/1388144/
[2] https://lore.kernel.org/patchwork/patch/1408161/
[3] 
https://lore.kernel.org/linux-fsdevel/20210301222728.176417-1-axelrasmus...@google.com/T/#t

Axel Rasmussen (9):
  userfaultfd/hugetlbfs: avoid including userfaultfd_k.h in hugetlb.h
  userfaultfd/shmem: combine shmem_{mcopy_atomic,mfill_zeropage}_pte
  userfaultfd/shmem: support minor fault registration for shmem
  userfaultfd/shmem: support UFFDIO_CONTINUE for shmem
  userfaultfd/selftests: use memfd_create for shmem test type
  userfaultfd/selftests: create alias mappings in the shmem test
  userfaultfd/selftests: reinitialize test context in each test
  userfaultfd/selftests: exercise minor fault handling shmem support
  userfaultfd/shmem: modify shmem_mcopy_atomic_pte to use install_ptes

 fs/userfaultfd.c |   6 +-
 include/linux/hugetlb.h  |   4 +-
 include/linux/shmem_fs.h |  15 +-
 include/linux/userfaultfd_k.h|   5 +
 include/uapi/linux/userfaultfd.h |   7 +-
 mm/hugetlb.c |   1 +
 mm/memory.c  |   8 +-
 mm/shmem.c   | 112 +++--
 mm/userfaultfd.c | 183 ++-
 tools/testing/selftests/vm/userfaultfd.c | 280 +++
 10 files changed, 377 insertions(+), 244 deletions(-)

--
2.31.1.295.g9ea45b61b8-goog

[PATCH v2 4/4] staging: media: intel-ipu3: remove space before tabs

2021-04-12 Thread Mitali Borkar

Removed unnecessary space before tabs to adhere to linux kernel coding
style.
Reported by checkpatch.

Signed-off-by: Mitali Borkar 
---
 
Changes from v1:- No changes.

 drivers/staging/media/ipu3/include/intel-ipu3.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/media/ipu3/include/intel-ipu3.h 
b/drivers/staging/media/ipu3/include/intel-ipu3.h
index 0451f8b7ba4f..340d97160bbb 100644
--- a/drivers/staging/media/ipu3/include/intel-ipu3.h
+++ b/drivers/staging/media/ipu3/include/intel-ipu3.h
@@ -631,7 +631,7 @@ struct ipu3_uapi_bnr_static_config_wb_gains_thr_config {
  * @cg:Gain coefficient for threshold calculation, [0, 31], default 8.
  * @ci:Intensity coefficient for threshold calculation. range [0, 0x1f]
  * default 6.
- * format: u3.2 (3 most significant bits represent whole number,
+ * format: u3.2 (3 most significant bits represent whole number,
  * 2 least significant bits represent the fractional part
  * with each count representing 0.25)
  * e.g. 6 in binary format is 00110, that translates to 1.5
-- 
2.30.2
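
The u3.2 fixed-point format described in the comment touched by this patch (3
integer bits, 2 fractional bits, each count worth 0.25) can be decoded with a
single division. A minimal sketch, assuming the raw value is the 5-bit field
from the comment; ex_u3_2_to_double() is a hypothetical helper and is not part
of the ipu3 header.

```c
#include <assert.h>
#include <stdint.h>

#define EX_U3_2_FRAC_BITS 2 /* two fractional bits, so one count is 0.25 */

/* Converts a raw u3.2 value in [0, 0x1f] to its real-number meaning. */
static double ex_u3_2_to_double(uint8_t raw)
{
	assert(raw <= 0x1f); /* only 5 bits are meaningful */
	return (double)raw / (1 << EX_U3_2_FRAC_BITS);
}
```

With this helper, ex_u3_2_to_double(6) gives 1.5, matching the "00110" example
in the comment.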

[PATCH v2 3/4] staging: media: intel-ipu3: reduce length of line

2021-04-12 Thread Mitali Borkar

Reduced the line length to under 80 characters to meet linux-kernel
coding style.

Signed-off-by: Mitali Borkar 
---

Changes from v1:- Reduced length of the line under 80 characters

 drivers/staging/media/ipu3/include/intel-ipu3.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/media/ipu3/include/intel-ipu3.h 
b/drivers/staging/media/ipu3/include/intel-ipu3.h
index 6a72c81d2b67..52dcc6cdcffc 100644
--- a/drivers/staging/media/ipu3/include/intel-ipu3.h
+++ b/drivers/staging/media/ipu3/include/intel-ipu3.h
@@ -247,7 +247,8 @@ struct ipu3_uapi_ae_ccm {
  */
 struct ipu3_uapi_ae_config {
struct ipu3_uapi_ae_grid_config grid_cfg __aligned(32);
-   struct ipu3_uapi_ae_weight_elem weights[IPU3_UAPI_AE_WEIGHTS] 
__aligned(32);
+   struct ipu3_uapi_ae_weight_elem weights[IPU3_UAPI_AE_WEIGHTS]
+   __aligned(32);
struct ipu3_uapi_ae_ccm ae_ccm __aligned(32);
 } __packed;
 
-- 
2.30.2

[PATCH v2 2/4] staging: media: intel-ipu3: reduce length of line

2021-04-12 Thread Mitali Borkar

Reduced length of line as it was exceeding 100 characters by removing
comments from same line and adding it to previous line. This makes code
neater, and meets linux kernel coding style.
Reported by checkpatch.

Signed-off-by: Mitali Borkar 
---
 
Changes from v1:- No changes.

 drivers/staging/media/ipu3/include/intel-ipu3.h | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/media/ipu3/include/intel-ipu3.h 
b/drivers/staging/media/ipu3/include/intel-ipu3.h
index 335522e7fc08..53f8e5dec8f5 100644
--- a/drivers/staging/media/ipu3/include/intel-ipu3.h
+++ b/drivers/staging/media/ipu3/include/intel-ipu3.h
@@ -10,8 +10,10 @@
 /* from /drivers/staging/media/ipu3/include/videodev2.h */
 
 /* Vendor specific - used for IPU3 camera sub-system */
-#define V4L2_META_FMT_IPU3_PARAMS  v4l2_fourcc('i', 'p', '3', 'p') /* IPU3 
processing parameters */
-#define V4L2_META_FMT_IPU3_STAT_3A v4l2_fourcc('i', 'p', '3', 's') /* IPU3 
3A statistics */
+/* IPU3 processing parameters */
+#define V4L2_META_FMT_IPU3_PARAMS  v4l2_fourcc('i', 'p', '3', 'p')
+/* IPU3 3A statistics */
+#define V4L2_META_FMT_IPU3_STAT_3A v4l2_fourcc('i', 'p', '3', 's')
 
 /* from include/uapi/linux/v4l2-controls.h */
 #define V4L2_CID_INTEL_IPU3_BASE   (V4L2_CID_USER_BASE + 0x10c0)
-- 
2.30.2

[PATCH v2 1/4] staging: media: intel-ipu3: remove unnecessary blank line

2021-04-12 Thread Mitali Borkar

Removed an unnecessary blank line to meet linux kernel coding style.
Reported by checkpatch.pl

Signed-off-by: Mitali Borkar 
---

Changes from v1:- No changes. 

 drivers/staging/media/ipu3/include/intel-ipu3.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/staging/media/ipu3/include/intel-ipu3.h 
b/drivers/staging/media/ipu3/include/intel-ipu3.h
index d95ca9ebfafb..335522e7fc08 100644
--- a/drivers/staging/media/ipu3/include/intel-ipu3.h
+++ b/drivers/staging/media/ipu3/include/intel-ipu3.h
@@ -75,7 +75,6 @@ struct ipu3_uapi_grid_config {
(IPU3_UAPI_AWB_MAX_SETS * \
 (IPU3_UAPI_AWB_SET_SIZE + IPU3_UAPI_AWB_SPARE_FOR_BUBBLES))
 
-
 /**
  * struct ipu3_uapi_awb_raw_buffer - AWB raw buffer
  *
-- 
2.30.2

[PATCH v2 0/4] staging: media: intel-ipu3: Cleanup patchset for style issues

2021-04-12 Thread Mitali Borkar

Changes from v1:-
Dropped patches 1/6 and 2/6 and compiled this as a patchset of 4
patches.
[PATCH 1/4]:- No changes.
[PATCH 2/4]:- No changes.
[PATCH 3/4]:- Reduced length of a line under 80 characters. This was
patch 5/6 previously.
[PATCH 4/4]:- No changes.

Mitali Borkar (4):
  staging: media: intel-ipu3: remove unnecessary blank line
  staging: media: intel-ipu3: reduce length of line
  staging: media: intel-ipu3: reduce length of line
  staging: media: intel-ipu3: remove space before tabs

 .../staging/media/ipu3/include/intel-ipu3.h| 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

-- 
2.30.2

Re: [PATCH v3 3/3] MAINTAINERS: Add Chris Packham as FREESCALE MPC I2C maintainer

2021-04-12 Thread Chris Packham


On 13/04/21 5:09 pm, Chris Packham wrote:
> Add Chris Packham as FREESCALE MPC I2C maintainer.
>
> Signed-off-by: Chris Packham 
Sorry for the duplicate. I had existing output from an earlier 
invocation of git format-patch lying around. "[PATCH v3 4/4] 
MAINTAINERS: ..." is the one I intended to send (although the content is 
the same).
> ---
>   MAINTAINERS | 7 +++
>   1 file changed, 7 insertions(+)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 56e9e4d777d8..3bc77ba8cd05 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -7135,6 +7135,13 @@ S: Maintained
>   F:  Documentation/devicetree/bindings/i2c/i2c-imx-lpi2c.yaml
>   F:  drivers/i2c/busses/i2c-imx-lpi2c.c
>   
> +FREESCALE MPC I2C DRIVER
> +M:   Chris Packham 
> +L:   linux-...@vger.kernel.org
> +S:   Maintained
> +F:   Documentation/devicetree/bindings/i2c/i2c-mpc.yaml
> +F:   drivers/i2c/busses/i2c-mpc.c
> +
>   FREESCALE QORIQ DPAA ETHERNET DRIVER
>   M:  Madalin Bucur 
>   L:  net...@vger.kernel.org

Re: [syzbot] upstream boot error: WARNING in __context_tracking_enter

2021-04-12 Thread Dmitry Vyukov

On Mon, Mar 22, 2021 at 6:22 PM Mark Rutland  wrote:
>
> Hi Russell,
>
> On Fri, Mar 19, 2021 at 10:10:43AM +, Russell King - ARM Linux admin 
> wrote:
> > On Fri, Mar 19, 2021 at 10:54:48AM +0100, Dmitry Vyukov wrote:
> > > On Fri, Mar 19, 2021 at 10:44 AM syzbot
> > >  wrote:
> > > > syzbot found the following issue on:
> > > >
> > > > HEAD commit:8b12a62a Merge tag 'drm-fixes-2021-03-19' of 
> > > > git://anongit..
> > > > git tree:   upstream
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=17e815aed0
> > > > kernel config:  
> > > > https://syzkaller.appspot.com/x/.config?x=cfeed364fc353c32
> > > > dashboard link: 
> > > > https://syzkaller.appspot.com/bug?extid=f09a12b2c77bfbbf51bd
> > > > userspace arch: arm
> > > >
> > > > IMPORTANT: if you fix the issue, please add the following tag to the 
> > > > commit:
> > > > Reported-by: syzbot+f09a12b2c77bfbbf5...@syzkaller.appspotmail.com
> > >
> > >
> > > +Mark, arm
> > > It did not get far with CONFIG_CONTEXT_TRACKING_FORCE (kernel doesn't 
> > > boot).
> >
> > It seems that the path:
> >
> > context_tracking_user_enter()
> > user_enter()
> > context_tracking_enter()
> > __context_tracking_enter()
> > vtime_user_enter()
> >
> > expects preemption to be disabled. It effectively is, because local
> > interrupts are disabled by context_tracking_enter().
> >
> > However, the requirement for preemption to be disabled is not
> > documented... so shrug. Maybe someone can say what the real requirements
> > are here.
>
> From dealing with this recently on arm64, this is a bit messy. To
> handle this robustly we need to do a few things in sequence, including
> using the *_irqoff() variants of the context_tracking_user_*()
> functions.
>
> I wrote down the constraints in commit:
>
>   23529049c6842382 ("arm64: entry: fix non-NMI user<->kernel transitions")
>
> For user->kernel transitions, the arch code needs the following sequence
> before invoking arbitrary kernel C code:
>
> lockdep_hardirqs_off(CALLER_ADDR0);
> user_exit_irqoff();
> trace_hardirqs_off_finish();
>
> For kernel->user transitions, the arch code needs the following sequence
> once it will no longer invoke arbitrary kernel C code, just before
> returning to userspace:
>
> trace_hardirqs_on_prepare();
> lockdep_hardirqs_on_prepare(CALLER_ADDR0);
> user_enter_irqoff();
> lockdep_hardirqs_on(CALLER_ADDR0);

Hi Russell,

Does Mark's comment make sense to you?
lockdep_assert_preemption_disabled() also checks "&&
this_cpu_read(hardirqs_enabled)", so is it that we also need hardirq's
disabled around user_enter/exit?
This issue currently prevents ARM boot on syzbot.

[PATCH v3 4/4] MAINTAINERS: Add Chris Packham as FREESCALE MPC I2C maintainer

2021-04-12 Thread Chris Packham

Add Chris Packham as FREESCALE MPC I2C maintainer.

Signed-off-by: Chris Packham 
---
 MAINTAINERS | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 56e9e4d777d8..3bc77ba8cd05 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7135,6 +7135,13 @@ S:   Maintained
 F: Documentation/devicetree/bindings/i2c/i2c-imx-lpi2c.yaml
 F: drivers/i2c/busses/i2c-imx-lpi2c.c
 
+FREESCALE MPC I2C DRIVER
+M: Chris Packham 
+L: linux-...@vger.kernel.org
+S: Maintained
+F: Documentation/devicetree/bindings/i2c/i2c-mpc.yaml
+F: drivers/i2c/busses/i2c-mpc.c
+
 FREESCALE QORIQ DPAA ETHERNET DRIVER
 M: Madalin Bucur 
 L: net...@vger.kernel.org
-- 
2.31.1

[PATCH v3 1/4] i2c: mpc: use device managed APIs

2021-04-12 Thread Chris Packham

Use device managed functions and clean up error handling.

Signed-off-by: Chris Packham 
Signed-off-by: Wolfram Sang 
---

Notes:
Changes in v3:
- Assuming 09aab7add7bf is reverted I've folded in the fix from Wei
  Yongjun[1] into the original patch. If Wei's patch is applied on top
  of what's already in i2c/for-next then this patch can be ignored.

[1] - 
https://lore.kernel.org/linux-i2c/20210412160026.194423-1-weiyongj...@huawei.com/

 drivers/i2c/busses/i2c-mpc.c | 52 +---
 1 file changed, 18 insertions(+), 34 deletions(-)

diff --git a/drivers/i2c/busses/i2c-mpc.c b/drivers/i2c/busses/i2c-mpc.c
index 5b746a898e8e..6e5614acebac 100644
--- a/drivers/i2c/busses/i2c-mpc.c
+++ b/drivers/i2c/busses/i2c-mpc.c
@@ -654,7 +654,6 @@ static int fsl_i2c_probe(struct platform_device *op)
u32 clock = MPC_I2C_CLOCK_LEGACY;
int result = 0;
int plen;
-   struct resource res;
struct clk *clk;
int err;
 
@@ -662,7 +661,7 @@ static int fsl_i2c_probe(struct platform_device *op)
if (!match)
return -EINVAL;
 
-   i2c = kzalloc(sizeof(*i2c), GFP_KERNEL);
+   i2c = devm_kzalloc(>dev, sizeof(*i2c), GFP_KERNEL);
if (!i2c)
return -ENOMEM;
 
@@ -670,24 +669,21 @@ static int fsl_i2c_probe(struct platform_device *op)
 
init_waitqueue_head(>queue);
 
-   i2c->base = of_iomap(op->dev.of_node, 0);
-   if (!i2c->base) {
+   i2c->base = devm_platform_ioremap_resource(op, 0);
+   if (IS_ERR(i2c->base)) {
dev_err(i2c->dev, "failed to map controller\n");
-   result = -ENOMEM;
-   goto fail_map;
+   return PTR_ERR(i2c->base);
}
 
-   i2c->irq = irq_of_parse_and_map(op->dev.of_node, 0);
-   if (i2c->irq < 0) {
-   result = i2c->irq;
-   goto fail_map;
-   }
+   i2c->irq = platform_get_irq(op, 0);
+   if (i2c->irq < 0)
+   return i2c->irq;
 
-   result = request_irq(i2c->irq, mpc_i2c_isr,
+   result = devm_request_irq(>dev, i2c->irq, mpc_i2c_isr,
IRQF_SHARED, "i2c-mpc", i2c);
if (result < 0) {
dev_err(i2c->dev, "failed to attach interrupt\n");
-   goto fail_request;
+   return result;
}
 
/*
@@ -699,7 +695,7 @@ static int fsl_i2c_probe(struct platform_device *op)
err = clk_prepare_enable(clk);
if (err) {
dev_err(>dev, "failed to enable clock\n");
-   goto fail_request;
+   return err;
} else {
i2c->clk_per = clk;
}
@@ -731,32 +727,26 @@ static int fsl_i2c_probe(struct platform_device *op)
}
dev_info(i2c->dev, "timeout %u us\n", mpc_ops.timeout * 100 / HZ);
 
-   platform_set_drvdata(op, i2c);
-
i2c->adap = mpc_ops;
-   of_address_to_resource(op->dev.of_node, 0, );
scnprintf(i2c->adap.name, sizeof(i2c->adap.name),
- "MPC adapter at 0x%llx", (unsigned long long)res.start);
-   i2c_set_adapdata(>adap, i2c);
+ "MPC adapter (%s)", of_node_full_name(op->dev.of_node));
i2c->adap.dev.parent = >dev;
+   i2c->adap.nr = op->id;
i2c->adap.dev.of_node = of_node_get(op->dev.of_node);
i2c->adap.bus_recovery_info = _i2c_recovery_info;
+   platform_set_drvdata(op, i2c);
+   i2c_set_adapdata(>adap, i2c);
 
-   result = i2c_add_adapter(>adap);
-   if (result < 0)
+   result = i2c_add_numbered_adapter(>adap);
+   if (result)
goto fail_add;
 
-   return result;
+   return 0;
 
  fail_add:
if (i2c->clk_per)
clk_disable_unprepare(i2c->clk_per);
-   free_irq(i2c->irq, i2c);
- fail_request:
-   irq_dispose_mapping(i2c->irq);
-   iounmap(i2c->base);
- fail_map:
-   kfree(i2c);
+
return result;
 };
 
@@ -769,12 +759,6 @@ static int fsl_i2c_remove(struct platform_device *op)
if (i2c->clk_per)
clk_disable_unprepare(i2c->clk_per);
 
-   if (i2c->irq)
-   free_irq(i2c->irq, i2c);
-
-   irq_dispose_mapping(i2c->irq);
-   iounmap(i2c->base);
-   kfree(i2c);
return 0;
 };
 
-- 
2.31.1
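
For readers less familiar with device managed ("devm") helpers, the idea
behind the simplification above can be sketched in plain userspace C. This is
only an illustration under stated assumptions: fake_dev, fake_devm_zalloc()
and fake_dev_release_all() are hypothetical names invented here, and this is
not how the kernel's devres code is implemented. Each managed resource
registers a release callback with its owner, and the owner runs the callbacks
in reverse order on teardown, so error paths no longer need per-resource
goto labels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical upper bound on managed resources per fake device. */
#define FAKE_DEV_MAX_ACTIONS 8

typedef void (*release_fn)(void *data);

struct fake_action {
	release_fn release;
	void *data;
};

/* Hypothetical userspace stand-in for a device that owns resources. */
struct fake_dev {
	struct fake_action actions[FAKE_DEV_MAX_ACTIONS];
	size_t nr_actions;
};

/* Register a cleanup action; returns 0 on success, -1 if the table is full. */
static int fake_devm_add_action(struct fake_dev *dev, release_fn release,
				void *data)
{
	assert(release != NULL);

	if (dev->nr_actions >= FAKE_DEV_MAX_ACTIONS)
		return -1;

	dev->actions[dev->nr_actions].release = release;
	dev->actions[dev->nr_actions].data = data;
	dev->nr_actions++;
	return 0;
}

/* Managed, zeroed allocation: freed automatically when the device goes away. */
static void *fake_devm_zalloc(struct fake_dev *dev, size_t size)
{
	void *p = calloc(1, size);

	if (!p)
		return NULL;

	if (fake_devm_add_action(dev, free, p)) {
		free(p);
		return NULL;
	}
	return p;
}

/* Run every registered action, newest first, as on unbind or probe failure. */
static void fake_dev_release_all(struct fake_dev *dev)
{
	while (dev->nr_actions > 0) {
		struct fake_action *a = &dev->actions[--dev->nr_actions];

		a->release(a->data);
	}
}
```

With this shape, a probe-like function can simply return an error code after
a failed step; whatever was registered before the failure is released by a
single call to fake_dev_release_all().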

[PATCH v3 3/4] i2c: mpc: Remove redundant NULL check

2021-04-12 Thread Chris Packham

In mpc_i2c_get_fdr_8xxx div is assigned as we iterate through the
mpc_i2c_dividers_8xxx array. By the time we exit the loop div will
either have the value that matches the requested speed or be pointing at
the last entry in mpc_i2c_dividers_8xxx. Checking for div being NULL
after the loop is redundant so remove the check.

Reported-by: Wolfram Sang 
Signed-off-by: Chris Packham 
---
 drivers/i2c/busses/i2c-mpc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/i2c/busses/i2c-mpc.c b/drivers/i2c/busses/i2c-mpc.c
index 9818f9f6a553..c30687483147 100644
--- a/drivers/i2c/busses/i2c-mpc.c
+++ b/drivers/i2c/busses/i2c-mpc.c
@@ -377,7 +377,7 @@ static int mpc_i2c_get_fdr_8xxx(struct device_node *node, 
u32 clock,
}
 
*real_clk = fsl_get_sys_freq() / prescaler / div->divider;
-   return div ? (int)div->fdr : -EINVAL;
+   return (int)div->fdr;
 }
 
 static void mpc_i2c_setup_8xxx(struct device_node *node,
-- 
2.31.1
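
The reasoning in the commit message can be checked with a small standalone
sketch. The table values and the names example_dividers and pick_divider()
are made up for illustration and are not taken from the driver; only the loop
shape follows the description above. Because the table is non-empty, the
pointer is assigned on the first iteration and ends up either at the first
entry that is large enough or at the last entry.

```c
#include <assert.h>
#include <stddef.h>

struct divider_entry {
	unsigned int divider;
	unsigned int fdr;
};

/* Illustrative values only; not the real mpc_i2c_dividers_8xxx table. */
static const struct divider_entry example_dividers[] = {
	{ 160, 0x0120 }, { 192, 0x0121 }, { 224, 0x0122 }, { 256, 0x0123 },
};

#define EXAMPLE_NR_DIVIDERS \
	(sizeof(example_dividers) / sizeof(example_dividers[0]))

/* Return the first entry whose divider is >= wanted, else the last entry. */
static const struct divider_entry *pick_divider(unsigned int wanted)
{
	const struct divider_entry *div = NULL;
	size_t i;

	for (i = 0; i < EXAMPLE_NR_DIVIDERS; i++) {
		div = &example_dividers[i];
		if (div->divider >= wanted)
			break;
	}

	/* The table is non-empty, so the loop body ran at least once. */
	assert(div != NULL);
	return div;
}
```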

[PATCH v3 0/4] i2c: mpc: Refactor to improve responsiveness

2021-04-12 Thread Chris Packham

This is an update to what is already in i2c/for-next. I've included "i2c: mpc:
use device managed APIs" which had some problems in the remove code path which
Wei Yongjun kindly pointed out with a fix. I've incorporated those changes into
this version in case the original is reverted.

I've tested on T2081 and P2041 based systems with a number of i2c and smbus
devices. Also this time I included a few iterations of module insert/remove
which would have caught the earlier errors.

Chris Packham (4):
  i2c: mpc: use device managed APIs
  i2c: mpc: Interrupt driven transfer
  i2c: mpc: Remove redundant NULL check
  MAINTAINERS: Add Chris Packham as FREESCALE MPC I2C maintainer

 MAINTAINERS  |   7 +
 drivers/i2c/busses/i2c-mpc.c | 488 +++
 2 files changed, 267 insertions(+), 228 deletions(-)

-- 
2.31.1

[PATCH v3 2/4] i2c: mpc: Interrupt driven transfer

2021-04-12 Thread Chris Packham

The fsl-i2c controller will generate an interrupt after every byte
transferred. Make use of this interrupt to drive a state machine which
allows the next part of a transfer to happen as soon as the interrupt is
received. This is particularly helpful with SMBus devices like the LM81,
which will time out if we take too long between bytes in a transfer.

Signed-off-by: Chris Packham 
---

Notes:
Changes in v3:
- use WARN/WARN_ON instead of BUG/BUG_ON
Changes in v2:
- add static_assert for state debug strings
- remove superfluous space
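
As a rough illustration of the approach described in the commit message, the
following userspace sketch advances a write transfer by one step per
simulated "byte transferred" interrupt. It is a simplification under stated
assumptions: the xfer_* names are hypothetical, there is no real hardware
access, and the states are only loosely modelled on the action enum added by
this patch (no restart or read handling).

```c
#include <assert.h>
#include <stddef.h>

enum xfer_action {
	XFER_ACTION_START,
	XFER_ACTION_WRITE,
	XFER_ACTION_STOP,
	XFER_ACTION_DONE,
};

struct xfer_state {
	enum xfer_action action;
	const unsigned char *buf;
	size_t len;
	size_t pos;
	unsigned int bytes_on_wire;	/* stand-in for the data register */
};

/* Arm the state machine for a write of len bytes. */
static void xfer_begin(struct xfer_state *st, const unsigned char *buf,
		       size_t len)
{
	assert(buf != NULL || len == 0);

	st->action = XFER_ACTION_START;
	st->buf = buf;
	st->len = len;
	st->pos = 0;
	st->bytes_on_wire = 0;
}

/* Called once per simulated interrupt; returns 1 once the transfer is done. */
static int xfer_on_interrupt(struct xfer_state *st)
{
	switch (st->action) {
	case XFER_ACTION_START:
		/* Address phase finished: send data, or stop if there is none. */
		st->action = st->len ? XFER_ACTION_WRITE : XFER_ACTION_STOP;
		break;
	case XFER_ACTION_WRITE:
		/* Queue the next byte immediately, without waking a sleeper. */
		st->bytes_on_wire++;
		st->pos++;
		if (st->pos == st->len)
			st->action = XFER_ACTION_STOP;
		break;
	case XFER_ACTION_STOP:
		st->action = XFER_ACTION_DONE;
		break;
	case XFER_ACTION_DONE:
		break;
	}

	return st->action == XFER_ACTION_DONE;
}
```

The point of the design is that the next step runs in the handler itself, so
the gap between bytes does not depend on how quickly a waiting task gets
scheduled.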

 drivers/i2c/busses/i2c-mpc.c | 434 +++
 1 file changed, 241 insertions(+), 193 deletions(-)

diff --git a/drivers/i2c/busses/i2c-mpc.c b/drivers/i2c/busses/i2c-mpc.c
index 6e5614acebac..9818f9f6a553 100644
--- a/drivers/i2c/busses/i2c-mpc.c
+++ b/drivers/i2c/busses/i2c-mpc.c
@@ -1,16 +1,11 @@
+// SPDX-License-Identifier: GPL-2.0
 /*
- * (C) Copyright 2003-2004
- * Humboldt Solutions Ltd, adr...@humboldt.co.uk.
-
  * This is a combined i2c adapter and algorithm driver for the
  * MPC107/Tsi107 PowerPC northbridge and processors that include
  * the same I2C unit (8240, 8245, 85xx).
  *
- * Release 0.8
- *
- * This file is licensed under the terms of the GNU General Public
- * License version 2. This program is licensed "as is" without any
- * warranty of any kind, whether express or implied.
+ * Copyright (C) 2003-2004 Humboldt Solutions Ltd, adr...@humboldt.co.uk
+ * Copyright (C) 2021 Allied Telesis Labs
  */
 
 #include 
@@ -58,11 +53,36 @@
 #define CSR_MIF  0x02
 #define CSR_RXAK 0x01
 
+enum mpc_i2c_action {
+   MPC_I2C_ACTION_INVALID = 0,
+   MPC_I2C_ACTION_START,
+   MPC_I2C_ACTION_RESTART,
+   MPC_I2C_ACTION_READ_BEGIN,
+   MPC_I2C_ACTION_READ_BYTE,
+   MPC_I2C_ACTION_WRITE,
+   MPC_I2C_ACTION_STOP,
+
+   __MPC_I2C_ACTION_CNT
+};
+
+static char *action_str[] = {
+   "invalid",
+   "start",
+   "restart",
+   "read begin",
+   "read",
+   "write",
+   "stop",
+};
+
+static_assert(ARRAY_SIZE(action_str) == __MPC_I2C_ACTION_CNT);
+
 struct mpc_i2c {
struct device *dev;
void __iomem *base;
u32 interrupt;
-   wait_queue_head_t queue;
+   wait_queue_head_t waitq;
+   spinlock_t lock;
struct i2c_adapter adap;
int irq;
u32 real_clk;
@@ -70,6 +90,16 @@ struct mpc_i2c {
u8 fdr, dfsrr;
 #endif
struct clk *clk_per;
+   u32 cntl_bits;
+   enum mpc_i2c_action action;
+   struct i2c_msg *msgs;
+   int num_msgs;
+   int curr_msg;
+   u32 byte_posn;
+   u32 block;
+   int rc;
+   int expect_rxack;
+
 };
 
 struct mpc_i2c_divider {
@@ -86,19 +116,6 @@ static inline void writeccr(struct mpc_i2c *i2c, u32 x)
writeb(x, i2c->base + MPC_I2C_CR);
 }
 
-static irqreturn_t mpc_i2c_isr(int irq, void *dev_id)
-{
-   struct mpc_i2c *i2c = dev_id;
-   if (readb(i2c->base + MPC_I2C_SR) & CSR_MIF) {
-   /* Read again to allow register to stabilise */
-   i2c->interrupt = readb(i2c->base + MPC_I2C_SR);
-   writeb(0, i2c->base + MPC_I2C_SR);
-   wake_up(>queue);
-   return IRQ_HANDLED;
-   }
-   return IRQ_NONE;
-}
-
 /* Sometimes 9th clock pulse isn't generated, and slave doesn't release
  * the bus, because it wants to send ACK.
  * Following sequence of enabling/disabling and sending start/stop generates
@@ -121,45 +138,6 @@ static void mpc_i2c_fixup(struct mpc_i2c *i2c)
}
 }
 
-static int i2c_wait(struct mpc_i2c *i2c, unsigned timeout, int writing)
-{
-   u32 cmd_err;
-   int result;
-
-   result = wait_event_timeout(i2c->queue,
-   (i2c->interrupt & CSR_MIF), timeout);
-
-   if (unlikely(!(i2c->interrupt & CSR_MIF))) {
-   dev_dbg(i2c->dev, "wait timeout\n");
-   writeccr(i2c, 0);
-   result = -ETIMEDOUT;
-   }
-
-   cmd_err = i2c->interrupt;
-   i2c->interrupt = 0;
-
-   if (result < 0)
-   return result;
-
-   if (!(cmd_err & CSR_MCF)) {
-   dev_dbg(i2c->dev, "unfinished\n");
-   return -EIO;
-   }
-
-   if (cmd_err & CSR_MAL) {
-   dev_dbg(i2c->dev, "MAL\n");
-   return -EAGAIN;
-   }
-
-   if (writing && (cmd_err & CSR_RXAK)) {
-   dev_dbg(i2c->dev, "No RXAK\n");
-   /* generate stop */
-   writeccr(i2c, CCR_MEN);
-   return -ENXIO;
-   }
-   return 0;
-}
-
 #if defined(CONFIG_PPC_MPC52xx) || defined(CONFIG_PPC_MPC512x)
 static const struct mpc_i2c_divider mpc_i2c_dividers_52xx[] = {
{20, 0x20}, {22, 0x21}, {24, 0x22}, {26, 0x23},
@@ -434,168 +412,209 @@ static void mpc_i2c_setup_8xxx(struct device_node *node,
 }
 #endif /* CONFIG_FSL_SOC */
 
-static void mpc_i2c_start(struct mpc_i2c *i2c)
+static void mpc_i2c_finish(struct mpc_i2c *i2c, int rc)
 {
-

[PATCH v3 3/3] MAINTAINERS: Add Chris Packham as FREESCALE MPC I2C maintainer

2021-04-12 Thread Chris Packham

Add Chris Packham as FREESCALE MPC I2C maintainer.

Signed-off-by: Chris Packham 
---
 MAINTAINERS | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 56e9e4d777d8..3bc77ba8cd05 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7135,6 +7135,13 @@ S:   Maintained
 F: Documentation/devicetree/bindings/i2c/i2c-imx-lpi2c.yaml
 F: drivers/i2c/busses/i2c-imx-lpi2c.c
 
+FREESCALE MPC I2C DRIVER
+M: Chris Packham 
+L: linux-...@vger.kernel.org
+S: Maintained
+F: Documentation/devicetree/bindings/i2c/i2c-mpc.yaml
+F: drivers/i2c/busses/i2c-mpc.c
+
 FREESCALE QORIQ DPAA ETHERNET DRIVER
 M: Madalin Bucur 
 L: net...@vger.kernel.org
-- 
2.31.1

Re: [PATCH 0/3] scsi: mptfusion: Clear the warnings indicating that the variable is not used

2021-04-12 Thread Martin K. Petersen



Zhen,

> Zhen Lei (3):
>   scsi: mptfusion: Remove unused local variable 'time_count'
>   scsi: mptfusion: Remove unused local variable 'port'
>   scsi: mptfusion: Fix error return code of mptctl_hp_hostinfo()

I applied patches 1+2. I hesitate to make functional changes to such an
old driver.

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: [PATCH net-next v3 1/1] net: stmmac: Add support for external trigger timestamping

2021-04-12 Thread Wong Vee Khee

On Sun, Apr 11, 2021 at 08:10:55AM -0700, Richard Cochran wrote:
> On Sun, Apr 11, 2021 at 10:40:28AM +0800, Wong Vee Khee wrote:
> > diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c 
> > b/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c
> > index 60566598d644..60e17fd24aba 100644
> > --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c
> > +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c
> > @@ -296,6 +296,13 @@ static int intel_crosststamp(ktime_t *device,
> >  
> > intel_priv = priv->plat->bsp_priv;
> >  
> > +   /* Both internal crosstimestamping and external triggered event
> > +* timestamping cannot be run concurrently.
> > +*/
> > +   if (priv->plat->ext_snapshot_en)
> > +   return -EBUSY;
> > +
> > +   mutex_lock(>aux_ts_lock);
> 
> Lock, then ...
> 
> > /* Enable Internal snapshot trigger */
> > acr_value = readl(ptpaddr + PTP_ACR);
> > acr_value &= ~PTP_ACR_MASK;
> > @@ -321,6 +328,7 @@ static int intel_crosststamp(ktime_t *device,
> > acr_value = readl(ptpaddr + PTP_ACR);
> > acr_value |= PTP_ACR_ATSFC;
> > writel(acr_value, ptpaddr + PTP_ACR);
> > +   mutex_unlock(>aux_ts_lock);
> 
> unlock, then ...
>   
> > /* Trigger Internal snapshot signal
> >  * Create a rising edge by just toggle the GPO1 to low
> > @@ -355,6 +363,8 @@ static int intel_crosststamp(ktime_t *device,
> > *system = convert_art_to_tsc(art_time);
> > }
> >  
> > +   /* Release the mutex */
> > +   mutex_unlock(>aux_ts_lock);
> 
> unlock again?  Huh?
>

Nice catch. Will fix in v4.
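
For illustration only, one common way to keep lock and unlock paired is a
single exit label, sketched below in userspace C. This is not necessarily
what v4 will look like. fake_mutex and do_snapshot() are hypothetical
stand-ins, and the fake lock simply asserts on a double lock or double
unlock so that an unbalanced path is caught right away.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical lock that only tracks whether it is currently held. */
struct fake_mutex {
	int held;
};

static void fake_mutex_lock(struct fake_mutex *m)
{
	assert(!m->held);
	m->held = 1;
}

static void fake_mutex_unlock(struct fake_mutex *m)
{
	assert(m->held);	/* trips on an "unlock again" path */
	m->held = 0;
}

/* Every path that takes the lock leaves through the same unlock. */
static int do_snapshot(struct fake_mutex *lock, int ext_snapshot_en,
		       int hw_times_out)
{
	int ret = 0;

	/* Checked before locking, so this early return needs no unlock. */
	if (ext_snapshot_en)
		return -EBUSY;

	fake_mutex_lock(lock);

	if (hw_times_out) {
		ret = -ETIMEDOUT;
		goto out_unlock;
	}

	/* ... the register accesses would go here ... */

out_unlock:
	fake_mutex_unlock(lock);
	return ret;
}
```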
 
> > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h 
> > b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> > index c49debb62b05..abadcd8cdc41 100644
> > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
> > @@ -239,6 +239,9 @@ struct stmmac_priv {
> > int use_riwt;
> > int irq_wake;
> > spinlock_t ptp_lock;
> > +   /* Mutex lock for Auxiliary Snapshots */
> > +   struct mutex aux_ts_lock;
> 
> In the comment, please be specific about which data are protected.
> For example:
> 
>   /* Protects auxiliary snapshot registers from concurrent access. */
> 
> > @@ -163,6 +166,43 @@ static void get_ptptime(void __iomem *ptpaddr, u64 
> > *ptp_time)
> > *ptp_time = ns;
> >  }
> >  
> > +static void timestamp_interrupt(struct stmmac_priv *priv)
> > +{
> > +   struct ptp_clock_event event;
> > +   unsigned long flags;
> > +   u32 num_snapshot;
> > +   u32 ts_status;
> > +   u32 tsync_int;
> 
> Please group same types together (u32) in a one-line list.
> 
> > +   u64 ptp_time;
> > +   int i;
> > +
> > +   tsync_int = readl(priv->ioaddr + GMAC_INT_STATUS) & GMAC_INT_TSIE;
> > +
> > +   if (!tsync_int)
> > +   return;
> > +
> > +   /* Read timestamp status to clear interrupt from either external
> > +* timestamp or start/end of PPS.
> > +*/
> > +   ts_status = readl(priv->ioaddr + GMAC_TIMESTAMP_STATUS);
> 
> Reading this register has a side effect of clearing status?  If so,
> doesn't it need protection against concurrent access?
> 
> The function, intel_crosststamp() also reads this bit.
>

The following check is introduced in intel_crosststamp() to avoid this:

/* Both internal crosstimestamping and external triggered event
 * timestamping cannot be run concurrently.
 */
 if (priv->plat->ext_snapshot_en)
return -EBUSY;

 
> > +   if (!priv->plat->ext_snapshot_en)
> > +   return;
> 
> Doesn't this test come too late?  Setting ts_status just cleared the
> bit used by the other code path.
>

As per Synopsys's design, all bits except Bits[27:25] get cleared when
this register is read.
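
To make the read-to-clear behaviour concrete, here is a small userspace
model. It is a sketch under assumptions: fake_reg and the EX_* macros are
hypothetical, the preserved bits [27:25] are taken from the statement above,
and the position of the snapshot count field (bits [29:25]) is assumed for
the example rather than taken from the databook.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed for illustration: a 5-bit snapshot count at bits [29:25]. */
#define EX_ATSNS_SHIFT		25
#define EX_ATSNS_MASK		(UINT32_C(0x1f) << EX_ATSNS_SHIFT)

/* Bits [27:25] survive a read, per the statement above. */
#define EX_PRESERVED_MASK	(UINT32_C(0x7) << 25)

/* Hypothetical model of a status register with read-to-clear bits. */
struct fake_reg {
	uint32_t value;
};

/* Return the current value, then clear everything except preserved bits. */
static uint32_t fake_read_to_clear(struct fake_reg *reg)
{
	uint32_t v;

	assert(reg != NULL);
	v = reg->value;
	reg->value &= EX_PRESERVED_MASK;
	return v;
}

/* Extract the snapshot count from a previously read status word. */
static unsigned int ex_num_snapshots(uint32_t ts_status)
{
	return (ts_status & EX_ATSNS_MASK) >> EX_ATSNS_SHIFT;
}
```

In this model a second reader no longer sees the low status bits that the
first read returned, which is the kind of interaction between two code paths
being asked about in this thread.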

> > +   num_snapshot = (ts_status & GMAC_TIMESTAMP_ATSNS_MASK) >>
> > +  GMAC_TIMESTAMP_ATSNS_SHIFT;
> > +
> > +   for (i = 0; i < num_snapshot; i++) {
> > +   spin_lock_irqsave(>ptp_lock, flags);
> > +   get_ptptime(priv->ptpaddr, _time);
> > +   spin_unlock_irqrestore(>ptp_lock, flags);
> > +   event.type = PTP_CLOCK_EXTTS;
> > +   event.index = 0;
> > +   event.timestamp = ptp_time;
> > +   ptp_clock_event(priv->ptp_clock, );
> > +   }
> > +}
> > +
> 
> > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c 
> > b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c
> > index b164ae22e35f..d668c33a0746 100644
> > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c
> > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c
> > @@ -135,9 +135,13 @@ static int stmmac_enable(struct ptp_clock_info *ptp,
> >  {
> > struct stmmac_priv *priv =
> > container_of(ptp, struct stmmac_priv, ptp_clock_ops);
> > +   void __iomem *ptpaddr = priv->ptpaddr;
> > +   void __iomem *ioaddr = priv->hw->pcsr;
> > struct stmmac_pps_cfg *cfg;
> > int ret = -EOPNOTSUPP;
> > unsigned long flags;
> > +   u32 intr_value;
> > +   u32 acr_value;
> 
> Please group same types together.
>

Will fix in v4.
 
>

Re: [PATCH 5.10 000/188] 5.10.30-rc1 review

2021-04-12 Thread Naresh Kamboju

On Mon, 12 Apr 2021 at 14:23, Greg Kroah-Hartman
 wrote:
>
> This is the start of the stable review cycle for the 5.10.30 release.
> There are 188 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 14 Apr 2021 08:39:44 +.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> 
> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.30-rc1.gz
> or in the git tree and branch at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-5.10.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h



Results from Linaro’s test farm.
No regressions on arm64, arm, x86_64, and i386.

Tested-by: Linux Kernel Functional Testing 

## Build
* kernel: 5.10.30-rc1
* git: 
['https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git',
'https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc']
* git branch: linux-5.10.y
* git commit: 8ac4b1deedaa507b5d0f46316e7f32004dd99cd1
* git describe: v5.10.29-189-g8ac4b1deedaa
* test details:
https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.10.y/build/v5.10.29-189-g8ac4b1deedaa

## No regressions (compared to v5.10.28-42-g18f507c37f33)

## No fixes (compared to v5.10.28-42-g18f507c37f33)

## Test result summary
total: 75890, pass: 63301, fail: 2174, skip: 10129, xfail: 286

## Build Summary
* arc: 10 total, 10 passed, 0 failed
* arm: 192 total, 192 passed, 0 failed
* arm64: 26 total, 26 passed, 0 failed
* dragonboard-410c: 1 total, 1 passed, 0 failed
* hi6220-hikey: 1 total, 1 passed, 0 failed
* i386: 26 total, 25 passed, 1 failed
* juno-r2: 1 total, 1 passed, 0 failed
* mips: 45 total, 45 passed, 0 failed
* parisc: 9 total, 9 passed, 0 failed
* powerpc: 27 total, 27 passed, 0 failed
* riscv: 21 total, 21 passed, 0 failed
* s390: 18 total, 18 passed, 0 failed
* sh: 18 total, 18 passed, 0 failed
* sparc: 9 total, 9 passed, 0 failed
* x15: 2 total, 1 passed, 1 failed
* x86: 1 total, 1 passed, 0 failed
* x86_64: 26 total, 26 passed, 0 failed

## Test suites summary
* fwts
* igt-gpu-tools
* install-android-platform-tools-r2600
* kselftest-
* kselftest-android
* kselftest-bpf
* kselftest-capabilities
* kselftest-cgroup
* kselftest-clone3
* kselftest-core
* kselftest-cpu-hotplug
* kselftest-cpufreq
* kselftest-efivarfs
* kselftest-filesystems
* kselftest-firmware
* kselftest-fpu
* kselftest-futex
* kselftest-gpio
* kselftest-intel_pstate
* kselftest-ipc
* kselftest-ir
* kselftest-kcmp
* kselftest-kexec
* kselftest-kvm
* kselftest-lib
* kselftest-livepatch
* kselftest-lkdtm
* kselftest-membarrier
* kselftest-memfd
* kselftest-memory-hotplug
* kselftest-mincore
* kselftest-mount
* kselftest-mqueue
* kselftest-net
* kselftest-netfilter
* kselftest-nsfs
* kselftest-openat2
* kselftest-pid_namespace
* kselftest-pidfd
* kselftest-proc
* kselftest-pstore
* kselftest-ptrace
* kselftest-rseq
* kselftest-rtc
* kselftest-seccomp
* kselftest-sigaltstack
* kselftest-size
* kselftest-splice
* kselftest-static_keys
* kselftest-sync
* kselftest-sysctl
* kselftest-tc-testing
* kselftest-timens
* kselftest-timers
* kselftest-tmpfs
* kselftest-tpm2
* kselftest-user
* kselftest-vm
* kselftest-vsyscall-mode-native-
* kselftest-vsyscall-mode-none-
* kselftest-x86
* kselftest-zram
* kunit
* kvm-unit-tests
* libhugetlbfs
* linux-log-parser
* ltp-cap_bounds-tests
* ltp-commands-tests
* ltp-containers-tests
* ltp-controllers-tests
* ltp-cpuhotplug-tests
* ltp-crypto-tests
* ltp-cve-tests
* ltp-dio-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-mm-tests
* ltp-nptl-tests
* ltp-open-posix-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-tracing-tests
* network-basic-tests
* perf
* rcutorture
* ssuite
* v4l2-compliance

--
Linaro LKFT
https://lkft.linaro.org

RE: [PATCH v14 12/12] dmaengine: imx-sdma: add terminated list for freed descriptor in worker

2021-04-12 Thread Robin Gong

On 2021/04/12 17:39, Vinod Koul wrote:
> On 07-04-21, 23:30, Robin Gong wrote:
> > Add terminated list for keeping descriptor so that it could be freed
> > in worker without any potential involving next descriptor raised up
> > before this descriptor freed, because vchan_get_all_descriptors get
> > all descriptors including the last terminated descriptor and the next
> > descriptor, hence, the next descriptor may be freed unexpectedly when
> > it's done in worker without this patch.
> > https://www.spinics.net/lists/dmaengine/msg23367.html
> 
> Sound like you should implement .device_synchronize() and do the actual
> work there..?
Yes, I believe there is no issue here if dmaengine_terminate_sync() is always
called, since flush_work(>terminate_worker) is already in .device_synchronize()
of the sdma driver. Unfortunately, dmaengine_terminate_all() has to be used
instead in some non-atomic cases like ALSA.

Re: [PATCH] kunit: add unit test for filtering suites by names

2021-04-12 Thread David Gow

On Tue, Apr 13, 2021 at 8:08 AM Daniel Latypov  wrote:
>
> This adds unit tests for kunit_filter_subsuite() and
> kunit_filter_suites().
>
> Note: what the executor means by "subsuite" is the array of suites
> corresponding to each test file.
>
> This patch lightly refactors executor.c to avoid the use of global
> variables to make it testable.
> It also includes a clever `kfree_at_end()` helper that makes this test
> easier to write than it otherwise would have been.
>
> Tested by running just the new tests using itself
> $ ./tools/testing/kunit/kunit.py run '*exec*'
>
> Signed-off-by: Daniel Latypov 

I really like this test, thanks.

A few small notes below, including what I think is a missing
kfree_at_end() call.

Assuming that one issue is fixed (or I'm mistaken):
Reviewed-by: David Gow 

-- David
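
As an aside, the benefit of passing the glob in as a parameter rather than
reading a global can be shown with a tiny userspace analogue. This is a
sketch under assumptions: filter_names() is a hypothetical helper invented
here, and POSIX fnmatch() stands in for the kernel's glob matching, so the
matching rules may differ from what KUnit actually accepts.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/*
 * Copy into out[] the entries of names[] that match glob and return how
 * many were kept. The caller provides out[] with room for n pointers.
 */
static size_t filter_names(const char *const *names, size_t n,
			   const char *glob, const char **out)
{
	size_t i, kept = 0;

	assert(glob != NULL);

	for (i = 0; i < n; i++) {
		if (fnmatch(glob, names[i], 0) == 0)
			out[kept++] = names[i];
	}
	return kept;
}
```

Since the function depends only on its arguments, a test can call it with
any input and compare the returned pointers directly against the originals,
not just the names.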

> ---
>  lib/kunit/executor.c  |  26 
>  lib/kunit/executor_test.c | 132 ++
>  2 files changed, 147 insertions(+), 11 deletions(-)
>  create mode 100644 lib/kunit/executor_test.c
>
> diff --git a/lib/kunit/executor.c b/lib/kunit/executor.c
> index 15832ed44668..96a4ae983786 100644
> --- a/lib/kunit/executor.c
> +++ b/lib/kunit/executor.c
> @@ -19,7 +19,7 @@ MODULE_PARM_DESC(filter_glob,
> "Filter which KUnit test suites run at boot-time, e.g. 
> list*");
>
>  static struct kunit_suite * const *
> -kunit_filter_subsuite(struct kunit_suite * const * const subsuite)
> +kunit_filter_subsuite(struct kunit_suite * const * const subsuite, const 
> char *filter_glob)
>  {
> int i, n = 0;
> struct kunit_suite **filtered;
> @@ -52,19 +52,14 @@ struct suite_set {
> struct kunit_suite * const * const *end;
>  };
>
> -static struct suite_set kunit_filter_suites(void)
> +static struct suite_set kunit_filter_suites(const struct suite_set 
> *suite_set,
> +   const char *filter_glob)
>  {
> int i;
> struct kunit_suite * const **copy, * const *filtered_subsuite;
> struct suite_set filtered;
>
> -   const size_t max = __kunit_suites_end - __kunit_suites_start;
> -
> -   if (!filter_glob) {
> -   filtered.start = __kunit_suites_start;
> -   filtered.end = __kunit_suites_end;
> -   return filtered;
> -   }
> +   const size_t max = suite_set->end - suite_set->start;
>
> copy = kmalloc_array(max, sizeof(*filtered.start), GFP_KERNEL);
> filtered.start = copy;
> @@ -74,7 +69,7 @@ static struct suite_set kunit_filter_suites(void)
> }
>
> for (i = 0; i < max; ++i) {
> -   filtered_subsuite = 
> kunit_filter_subsuite(__kunit_suites_start[i]);
> +   filtered_subsuite = 
> kunit_filter_subsuite(suite_set->start[i], filter_glob);
> if (filtered_subsuite)
> *copy++ = filtered_subsuite;
> }
> @@ -98,8 +93,13 @@ static void kunit_print_tap_header(struct suite_set 
> *suite_set)
>  int kunit_run_all_tests(void)
>  {
> struct kunit_suite * const * const *suites;
> +   struct suite_set suite_set = {
> +   .start = __kunit_suites_start,
> +   .end = __kunit_suites_end,
> +   };
>
> -   struct suite_set suite_set = kunit_filter_suites();
> +   if (filter_glob)
> +   suite_set = kunit_filter_suites(_set, filter_glob);
>
> kunit_print_tap_header(_set);
>
> @@ -115,4 +115,8 @@ int kunit_run_all_tests(void)
> return 0;
>  }
>
> +#if IS_BUILTIN(CONFIG_KUNIT_TEST)
> +#include "executor_test.c"
> +#endif
> +
>  #endif /* IS_BUILTIN(CONFIG_KUNIT) */
> diff --git a/lib/kunit/executor_test.c b/lib/kunit/executor_test.c
> new file mode 100644
> index ..8e925395beeb
> --- /dev/null
> +++ b/lib/kunit/executor_test.c
> @@ -0,0 +1,132 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * KUnit test for the KUnit executor.
> + *
> + * Copyright (C) 2021, Google LLC.
> + * Author: Daniel Latypov 
> + */
> +
> +#include 
> +
> +static void kfree_at_end(struct kunit *test, const void *to_free);
> +static struct kunit_suite *alloc_fake_suite(struct kunit *test,
> +   const char *suite_name);
> +
> +static void filter_subsuite_test(struct kunit *test)
> +{
> +   struct kunit_suite *subsuite[3] = {NULL, NULL, NULL};
> +   struct kunit_suite * const *filtered;
> +
> +   subsuite[0] = alloc_fake_suite(test, "suite1");
> +   subsuite[1] = alloc_fake_suite(test, "suite2");
> +
> +   /* Want: suite1, suite2, NULL -> suite2, NULL */
> +   filtered = kunit_filter_subsuite(subsuite, "suite2*");
> +   KUNIT_ASSERT_NOT_ERR_OR_NULL(test, filtered);
> +   kfree_at_end(test, filtered);
> +
> +   KUNIT_ASSERT_NOT_ERR_OR_NULL(test, filtered[0]);
> +   KUNIT_EXPECT_STREQ(test, (const char *)filtered[0]->name, "suite2");

Is it worth testing that filtered[0] == subsuite[1], not just the
name? (I suspect it doesn't

Re: [RESEND,v5,1/2] bio: limit bio max size

2021-04-12 Thread Changheun Lee

> On Sun, Apr 11, 2021 at 10:13:01PM +, Damien Le Moal wrote:
> > On 2021/04/09 23:47, Bart Van Assche wrote:
> > > On 4/7/21 3:27 AM, Damien Le Moal wrote:
> > >> On 2021/04/07 18:46, Changheun Lee wrote:
> > >>> I'll prepare new patch as you recommend. It will be added setting of
> > >>> limit_bio_size automatically when queue max sectors is determined.
> > >>
> > >> Please do that in the driver for the HW that benefits from it. Do not do 
> > >> this
> > >> for all block devices.
> > > 
> > > Hmm ... is it ever useful to build a bio with a size that exceeds 
> > > max_hw_sectors when submitting a bio directly to a block device, or in 
> > > other words, if no stacked block driver sits between the submitter and 
> > > the block device? Am I perhaps missing something?
> > 
> > Device performance wise, the benefits are certainly not obvious to me 
> > either.
> > But for very fast block devices, I think the CPU overhead of building more
> > smaller BIOs may be significant compared to splitting a large BIO into 
> > multiple
> > requests. Though it may be good to revisit this with some benchmark numbers.
> 
> This patch tries to address issue[1] in do_direct_IO() in which
> Changheun observed that other operations takes time between adding page
> to bio.
> 
> However, do_direct_IO() just does following except for adding bio and
> submitting bio:
> 
> - retrieves pages at batch(pin 64 pages each time from VM) and 
> 
> - retrieve block mapping(get_more_blocks), which is still done usually
> very less times for 32MB; for new mapping, clean_bdev_aliases() may
> take a bit time.
> 
> If there isn't system memory pressure, pin 64 pages won't be slow, but
> get_more_blocks() may take a bit time.
> 
> Changheun, can you check if multiple get_more_blocks() is called for 
> submitting
> 32MB in your test?

It was called only about once.

> 
> In my 32MB sync dio f2fs test on x86_64 VM, one buffer_head mapping can
> hold 32MB, but it is one freshly new f2fs.
> 
> I'd suggest to understand the issue completely before figuring out one
> solution.

Thank you for your advice. I'll analyze your point further later. :)
But I think this is different from finding where most of the time is spent in
do_direct_IO(). I think the excessive looping should be controlled.
8,192 loops in do_direct_IO() - for 32MB - to submit one bio is too many
on a 4KB page system. I want to apply an optional solution to avoid the
excessive looping caused by multipage bvec.
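
The 8,192 figure is just the number of 4KB pages in 32MB. A minimal sketch of
that arithmetic follows; pages_needed() is a hypothetical helper and is not
part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Number of pages needed to cover io_bytes, rounding up. */
static size_t pages_needed(size_t io_bytes, size_t page_size)
{
	assert(page_size != 0);
	return (io_bytes + page_size - 1) / page_size;
}
```

For a 32MB request, pages_needed(32 * 1024 * 1024, 4096) gives 8192, while
the same request with 64KB pages would need 512.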

Thanks,

Changheun Lee

Re: [PATCH][next] scsi: aacraid: Replace one-element array with flexible-array member

2021-04-12 Thread Martin K. Petersen



Hi Kees/Gustavo!

>> @@ -4020,7 +4020,8 @@ static int aac_convert_sgraw2(struct aac_raw_io2 
>> *rio2, int pages, int nseg, int
>>  }
>>  }
>>  sge[pos] = rio2->sge[nseg-1];
>> -memcpy(&rio2->sge[1], &sge[1], (nseg_new-1)*sizeof(struct sge_ieee1212));
>> +memcpy(&rio2->sge[1], &sge[1],
>> +   flex_array_size(rio2, sge, nseg_new - 1));
>
> This was hard to validate, 

... which is why I didn't apply this patch. I don't like changes which
make the reader have to jump through hoops to figure out what the code
actually does. I find the original much easier to understand.

Silencing analyzer warnings shouldn't be done at the expense of human
readers. If it is imperative to switch to flex_array_size() to quiesce
checker warnings, please add a comment in the code explaining that the
size evaluates to nseg_new-1 sge_ieee1212 structs.
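
To illustrate the equivalence being discussed, here is a rough user-space sketch. The struct layouts are simplified stand-ins, and FLEX_ARRAY_SIZE_SKETCH is a hypothetical approximation of the kernel's flex_array_size() that omits its overflow checking:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins; the field layout is illustrative only. */
struct sge_ieee1212 {
	uint32_t addr_low;
	uint32_t addr_high;
	uint32_t length;
	uint32_t flags;
};

struct aac_raw_io2 {
	uint32_t sge_count;           /* hypothetical header field */
	struct sge_ieee1212 sge[];    /* flexible-array member */
};

/* Approximation of flex_array_size(): element count times element size. */
#define FLEX_ARRAY_SIZE_SKETCH(p, member, count) \
	((size_t)(count) * sizeof(*(p)->member))

/* Length computed the way the original memcpy() call spelled it out. */
static size_t copy_len_old(int nseg_new)
{
	return (nseg_new - 1) * sizeof(struct sge_ieee1212);
}

/* Length computed through the helper; p is only used inside sizeof. */
static size_t copy_len_new(const struct aac_raw_io2 *rio2, int nseg_new)
{
	return FLEX_ARRAY_SIZE_SKETCH(rio2, sge, nseg_new - 1);
}
```

Both functions return the same byte count for nseg_new >= 1, which is the fact a code comment next to the flex_array_size() call would need to state.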

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: [PATCH 5.11 000/210] 5.11.14-rc1 review

2021-04-12 Thread Naresh Kamboju

On Mon, 12 Apr 2021 at 14:32, Greg Kroah-Hartman
 wrote:
>
> This is the start of the stable review cycle for the 5.11.14 release.
> There are 210 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed, 14 Apr 2021 08:39:44 +.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> 
> https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.11.14-rc1.gz
> or in the git tree and branch at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-5.11.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Results from Linaro’s test farm.
No regressions on arm64, arm, x86_64, and i386.

Tested-by: Linux Kernel Functional Testing 

## Build
* kernel: 5.11.14-rc1
* git: 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
* git branch: linux-5.11.y
* git commit: 7ce240e32fd44eb0ababbd16236a00ca7b7d005e
* git describe: v5.11.13-211-g7ce240e32fd4
* test details:
https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.11.y/build/v5.11.13-211-g7ce240e32fd4

## No regressions (compared to v5.11.12-46-gab8c60637a48)

## No fixes (compared to v5.11.12-46-gab8c60637a48)

## Test result summary
 total: 74763, pass: 62936, fail: 1599, skip: 9939, xfail: 289,

## Build Summary
* arc: 10 total, 10 passed, 0 failed
* arm: 192 total, 192 passed, 0 failed
* arm64: 26 total, 26 passed, 0 failed
* dragonboard-410c: 1 total, 1 passed, 0 failed
* hi6220-hikey: 1 total, 1 passed, 0 failed
* i386: 26 total, 25 passed, 1 failed
* juno-r2: 1 total, 1 passed, 0 failed
* mips: 45 total, 45 passed, 0 failed
* parisc: 9 total, 9 passed, 0 failed
* powerpc: 27 total, 27 passed, 0 failed
* riscv: 21 total, 21 passed, 0 failed
* s390: 18 total, 18 passed, 0 failed
* sh: 18 total, 18 passed, 0 failed
* sparc: 9 total, 9 passed, 0 failed
* x15: 1 total, 0 passed, 1 failed
* x86: 1 total, 1 passed, 0 failed
* x86_64: 26 total, 26 passed, 0 failed

## Test suites summary
* fwts
* igt-gpu-tools
* install-android-platform-tools-r2600
* kselftest-
* kselftest-android
* kselftest-bpf
* kselftest-capabilities
* kselftest-cgroup
* kselftest-clone3
* kselftest-core
* kselftest-cpu-hotplug
* kselftest-cpufreq
* kselftest-efivarfs
* kselftest-filesystems
* kselftest-firmware
* kselftest-fpu
* kselftest-futex
* kselftest-gpio
* kselftest-intel_pstate
* kselftest-ipc
* kselftest-ir
* kselftest-kcmp
* kselftest-kexec
* kselftest-kvm
* kselftest-lib
* kselftest-livepatch
* kselftest-lkdtm
* kselftest-membarrier
* kselftest-memfd
* kselftest-memory-hotplug
* kselftest-mincore
* kselftest-mount
* kselftest-mqueue
* kselftest-net
* kselftest-netfilter
* kselftest-nsfs
* kselftest-openat2
* kselftest-pid_namespace
* kselftest-pidfd
* kselftest-proc
* kselftest-pstore
* kselftest-ptrace
* kselftest-rseq
* kselftest-rtc
* kselftest-seccomp
* kselftest-sigaltstack
* kselftest-size
* kselftest-splice
* kselftest-static_keys
* kselftest-sync
* kselftest-sysctl
* kselftest-tc-testing
* kselftest-timens
* kselftest-timers
* kselftest-tmpfs
* kselftest-tpm2
* kselftest-user
* kselftest-vm
* kselftest-vsyscall-mode-native-
* kselftest-vsyscall-mode-none-
* kselftest-x86
* kselftest-zram
* kunit
* kvm-unit-tests
* libhugetlbfs
* linux-log-parser
* ltp-cap_bounds-tests
* ltp-commands-tests
* ltp-containers-tests
* ltp-controllers-tests
* ltp-cpuhotplug-tests
* ltp-crypto-tests
* ltp-cve-tests
* ltp-dio-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-mm-tests
* ltp-nptl-tests
* ltp-open-posix-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-tracing-tests
* network-basic-tests
* perf
* rcutorture
* ssuite
* v4l2-compliance

--
Linaro LKFT
https://lkft.linaro.org

[PATCH v2 2/2] perf/core: Support reading group events with shared cgroups

2021-04-12 Thread Namhyung Kim

This enables reading event group's counter values together with a
PERF_EVENT_IOC_READ_CGROUP command like we do in the regular read().
Users should give a correct size of buffer to be read which includes
the total buffer size and the cgroup id.

Acked-by: Song Liu 
Signed-off-by: Namhyung Kim 
---
 kernel/events/core.c | 120 +--
 1 file changed, 117 insertions(+), 3 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 0c6b3848a61f..d483b4b42fe2 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2232,13 +2232,24 @@ static void perf_add_cgrp_node_list(struct perf_event 
*event,
 {
struct list_head *cgrp_ctx_list = this_cpu_ptr(_ctx_list);
struct perf_cgroup *cgrp = perf_cgroup_from_task(current, ctx);
+   struct perf_event *sibling;
bool is_first;
 
lockdep_assert_irqs_disabled();
lockdep_assert_held(&ctx->lock);
 
+   /* only group leader can be added directly */
+   if (event->group_leader != event)
+   return;
+
+   if (!event_has_cgroup_node(event))
+   return;
+
is_first = list_empty(&ctx->cgrp_node_list);
+
list_add_tail(&event->cgrp_node_entry, &ctx->cgrp_node_list);
+   for_each_sibling_event(sibling, event)
+   list_add_tail(&sibling->cgrp_node_entry, &ctx->cgrp_node_list);
 
if (is_first)
list_add_tail(&ctx->cgrp_ctx_entry, cgrp_ctx_list);
@@ -2250,15 +2261,25 @@ static void perf_del_cgrp_node_list(struct perf_event 
*event,
struct perf_event_context *ctx)
 {
struct perf_cgroup *cgrp = perf_cgroup_from_task(current, ctx);
+   struct perf_event *sibling;
 
lockdep_assert_irqs_disabled();
lockdep_assert_held(&ctx->lock);
 
+   /* only group leader can be deleted directly */
+   if (event->group_leader != event)
+   return;
+
+   if (!event_has_cgroup_node(event))
+   return;
+
update_cgroup_node(event, cgrp->css.cgroup);
/* to refresh delta when it's enabled */
event->cgrp_node_count = 0;
 
list_del(&event->cgrp_node_entry);
+   for_each_sibling_event(sibling, event)
+   list_del(&sibling->cgrp_node_entry);
 
if (list_empty(&ctx->cgrp_node_list))
list_del(&ctx->cgrp_ctx_entry);
@@ -2333,7 +2354,7 @@ static int perf_event_attach_cgroup_node(struct 
perf_event *event, u64 nr_cgrps,
 
raw_spin_unlock_irqrestore(>lock, flags);
 
-   if (is_first && enabled)
+   if (is_first && enabled && event->group_leader == event)
event_function_call(event, perf_attach_cgroup_node, NULL);
 
return 0;
@@ -2370,8 +2391,8 @@ static void __perf_read_cgroup_node(struct perf_event 
*event)
}
 }
 
-static int perf_event_read_cgroup_node(struct perf_event *event, u64 read_size,
-  u64 cgrp_id, char __user *buf)
+static int perf_event_read_cgrp_node_one(struct perf_event *event, u64 cgrp_id,
+char __user *buf)
 {
struct perf_cgroup_node *cgrp;
struct perf_event_context *ctx = event->ctx;
@@ -2406,6 +2427,92 @@ static int perf_event_read_cgroup_node(struct perf_event 
*event, u64 read_size,
 
return n * sizeof(u64);
 }
+
+static int perf_event_read_cgrp_node_sibling(struct perf_event *event,
+u64 read_format, u64 cgrp_id,
+u64 *values)
+{
+   struct perf_cgroup_node *cgrp;
+   int n = 0;
+
+   cgrp = find_cgroup_node(event, cgrp_id);
+   if (cgrp == NULL)
+   return (read_format & PERF_FORMAT_ID) ? 2 : 1;
+
+   values[n++] = cgrp->count;
+   if (read_format & PERF_FORMAT_ID)
+   values[n++] = primary_event_id(event);
+   return n;
+}
+
+static int perf_event_read_cgrp_node_group(struct perf_event *event, u64 
cgrp_id,
+  char __user *buf)
+{
+   struct perf_cgroup_node *cgrp;
+   struct perf_event_context *ctx = event->ctx;
+   struct perf_event *sibling;
+   u64 read_format = event->attr.read_format;
+   unsigned long flags;
+   u64 *values;
+   int n = 1;
+   int ret;
+
+   values = kzalloc(event->read_size, GFP_KERNEL);
+   if (!values)
+   return -ENOMEM;
+
+   values[0] = 1 + event->nr_siblings;
+
+   /* update event count and times (possibly run on other cpu) */
+   (void)perf_event_read(event, true);
+
+   raw_spin_lock_irqsave(&ctx->lock, flags);
+
+   cgrp = find_cgroup_node(event, cgrp_id);
+   if (cgrp == NULL) {
+   raw_spin_unlock_irqrestore(&ctx->lock, flags);
+   kfree(values);
+   return -ENOENT;
+   }
+
+   if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED)
+   values[n++] = cgrp->time_enabled;
+   if (read_format & PERF_FORMAT_TOTAL_TIME_RUNNING)
+

aarch64-linux-ld: Unexpected GOT/PLT entries detected!

2021-04-12 Thread kernel test robot

Hi Kees,

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   89698becf06d341a700913c3d89ce2a914af69a2
commit: be2881824ae9eb92a35b094f734f9ca7339ddf6d arm64/build: Assert for 
unwanted sections
date:   7 months ago
config: arm64-randconfig-r004-20210413 (attached as .config)
compiler: aarch64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=be2881824ae9eb92a35b094f734f9ca7339ddf6d
git remote add linus 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git fetch --no-tags linus master
git checkout be2881824ae9eb92a35b094f734f9ca7339ddf6d
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross 
ARCH=arm64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

>> aarch64-linux-ld: Unexpected GOT/PLT entries detected!
>> aarch64-linux-ld: Unexpected run-time procedure linkages detected!
   aarch64-linux-ld: drivers/gpu/drm/pl111/pl111_versatile.o: in function 
`pl111_versatile_init':
   pl111_versatile.c:(.text+0x41c): undefined reference to 
`devm_regmap_init_vexpress_config'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip

[PATCH v2 0/2] perf core: Sharing events with multiple cgroups

2021-04-12 Thread Namhyung Kim

Hello,

This work is to make perf stat more scalable with a lot of cgroups.

Changes in v2)
 * use cacheline_aligned macro instead of the padding
 * enclose the cgroup node list initialization
 * add more comments
 * add Acked-by from Song Liu


Currently we need to open a separate perf_event to count an event in a
cgroup.  For a big machine, this requires lots of events like

  256 cpu x 8 events x 200 cgroups = 409600 events

This is very wasteful and not scalable.  In this case, the perf stat
actually counts exactly same events for each cgroup.  I think we can
just use a single event to measure all cgroups running on that cpu.
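
As a quick check of the numbers above, a small sketch comparing the two approaches. The helper names are made up for this illustration, and the shared case assumes one event per cpu per counter regardless of the number of cgroups:

```c
#include <assert.h>
#include <stdint.h>

/* Events needed when a single cpu event serves every cgroup on that cpu. */
static uint64_t events_shared(uint64_t cpus, uint64_t events)
{
	/* Guard against overflow in the multiplication. */
	assert(events == 0 || cpus <= UINT64_MAX / events);
	return cpus * events;
}

/* Events needed when each cgroup opens its own perf_event per cpu. */
static uint64_t events_per_cgroup(uint64_t cpus, uint64_t events,
				  uint64_t cgroups)
{
	uint64_t per_cgroup = events_shared(cpus, events);

	assert(cgroups == 0 || per_cgroup <= UINT64_MAX / cgroups);
	return per_cgroup * cgroups;
}
```

For 256 cpus, 8 events and 200 cgroups this gives 409600 separately opened events versus 2048 shared ones.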

So I added new ioctl commands to add per-cgroup counters to an
existing perf_event and to read the per-cgroup counters from the
event.  The per-cgroup counters are updated during the context switch
if tasks' cgroups are different (and no need to change the HW PMU).
It keeps the counters in a hash table with cgroup id as a key.

With this change, average processing time of my internal test workload
which runs tasks in a different cgroup and communicates by pipes
dropped from 11.3 usec to 5.8 usec.

Thanks,
Namhyung


Namhyung Kim (2):
  perf/core: Share an event with multiple cgroups
  perf/core: Support reading group events with shared cgroups

 include/linux/perf_event.h  |  22 ++
 include/uapi/linux/perf_event.h |   2 +
 kernel/events/core.c| 591 ++--
 3 files changed, 588 insertions(+), 27 deletions(-)

-- 
2.31.1.295.g9ea45b61b8-goog

[PATCH v2 1/2] perf/core: Share an event with multiple cgroups

2021-04-12 Thread Namhyung Kim

As we can run many jobs (in container) on a big machine, we want to
measure each job's performance during the run.  To do that, the
perf_event can be associated to a cgroup to measure it only.

However such cgroup events need to be opened separately and it causes
significant overhead in event multiplexing during the context switch
as well as resource consumption like in file descriptors and memory
footprint.

As a cgroup event is basically a cpu event, we can share a single cpu
event for multiple cgroups.  All we need is a separate counter (and
two timing variables) for each cgroup.  I added a hash table to map
from cgroup id to the attached cgroups.

With this change, the cpu event needs to calculate a delta of event
counter values when the cgroups of current and the next task are
different.  And it attributes the delta to the current task's cgroup.
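
The delta accounting described above can be sketched in user space roughly as follows. This is only an illustration under simplifying assumptions (a chained hash table keyed by cgroup id, a single counter, no timing fields, no locking); the names cgrp_node, shared_event and on_switch are hypothetical and do not match the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NODE_HASH_BITS 4
#define NODE_HASH_SIZE (1u << NODE_HASH_BITS)

/* Per-cgroup counter, loosely modelled on struct perf_cgroup_node. */
struct cgrp_node {
	struct cgrp_node *next;   /* hash chain */
	uint64_t id;              /* cgroup id used as the hash key */
	uint64_t count;           /* events attributed to this cgroup */
};

/* One shared cpu event with its per-cgroup nodes. */
struct shared_event {
	struct cgrp_node *hash[NODE_HASH_SIZE];
	uint64_t prev_count;      /* counter snapshot taken at the last switch */
};

static struct cgrp_node *find_node(struct shared_event *ev, uint64_t id)
{
	struct cgrp_node *n = ev->hash[id % NODE_HASH_SIZE];

	while (n && n->id != id)
		n = n->next;
	return n;
}

static void attach_node(struct shared_event *ev, struct cgrp_node *n,
			uint64_t id)
{
	n->id = id;
	n->count = 0;
	n->next = ev->hash[id % NODE_HASH_SIZE];
	ev->hash[id % NODE_HASH_SIZE] = n;
}

/*
 * Called at a context switch with the current raw counter value.
 * The delta since the last snapshot goes to the outgoing task's cgroup,
 * and only when the two cgroups differ.
 */
static void on_switch(struct shared_event *ev, uint64_t hw_count,
		      uint64_t cur_id, uint64_t next_id)
{
	struct cgrp_node *n;

	if (cur_id == next_id)
		return;

	n = find_node(ev, cur_id);
	if (n)
		n->count += hw_count - ev->prev_count;
	ev->prev_count = hw_count;
}
```

When the outgoing cgroup has no attached node, the delta is simply dropped and the snapshot is refreshed, so the next attached cgroup starts from a clean baseline.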

This patch adds two new ioctl commands to perf_event for light-weight
cgroup event counting (i.e. perf stat).

 * PERF_EVENT_IOC_ATTACH_CGROUP - it takes a buffer consisting of a
 64-bit array to attach the given cgroups.  The first element is the
 number of cgroups in the buffer, and the rest is a list of cgroup
 ids whose cgroup info is added to the given event.

 * PERF_EVENT_IOC_READ_CGROUP - it takes a buffer consisting of a 64-bit
 array to get the event counter values.  The first element is the size
 of the array in bytes, and the second element is the cgroup id to
 read.  The rest is used to save the counter value and timings.

This attaches all cgroups in a single syscall and I didn't add the
DETACH command deliberately to make the implementation simple.  The
attached cgroup nodes would be deleted when the file descriptor of the
perf_event is closed.
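
For illustration, the two buffer layouts could be prepared in user space roughly like this. This is a sketch only: the helper names are hypothetical, the ioctl commands exist only in this proposed patch, and it assumes the size passed to READ covers the whole array including the two header elements:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * ATTACH layout: buf[0] = number of cgroups, buf[1..nr] = cgroup ids.
 * Returns the number of u64 elements used, or 0 if buf is too small.
 */
static size_t fill_attach_buf(uint64_t *buf, size_t cap,
			      const uint64_t *ids, size_t nr)
{
	size_t i;

	if (cap < nr + 1)
		return 0;

	buf[0] = nr;
	for (i = 0; i < nr; i++)
		buf[1 + i] = ids[i];
	return nr + 1;
}

/*
 * READ layout: buf[0] = array size in bytes, buf[1] = cgroup id,
 * buf[2..] = space for the counter value and timings (zeroed here).
 * Returns the number of u64 elements used, or 0 if buf is too small.
 */
static size_t fill_read_buf(uint64_t *buf, size_t cap, uint64_t cgrp_id,
			    size_t nr_values)
{
	size_t total = 2 + nr_values;
	size_t i;

	if (cap < total)
		return 0;

	buf[0] = total * sizeof(uint64_t);
	buf[1] = cgrp_id;
	for (i = 2; i < total; i++)
		buf[i] = 0;
	return total;
}
```

The filled buffers would then be passed as the ioctl argument for PERF_EVENT_IOC_ATTACH_CGROUP and PERF_EVENT_IOC_READ_CGROUP respectively, on a kernel carrying this series.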

Cc: Tejun Heo 
Acked-by: Song Liu 
Signed-off-by: Namhyung Kim 
---
 include/linux/perf_event.h  |  22 ++
 include/uapi/linux/perf_event.h |   2 +
 kernel/events/core.c| 477 ++--
 3 files changed, 474 insertions(+), 27 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 3f7f89ea5e51..4b03cbadf4a0 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -771,6 +771,19 @@ struct perf_event {
 
 #ifdef CONFIG_CGROUP_PERF
struct perf_cgroup  *cgrp; /* cgroup event is attach to */
+
+   /* to share an event for multiple cgroups */
+   struct hlist_head   *cgrp_node_hash;
+   struct perf_cgroup_node *cgrp_node_entries;
+   int nr_cgrp_nodes;
+   int cgrp_node_hash_bits;
+
+   struct list_headcgrp_node_entry;
+
+   /* snapshot of previous reading (for perf_cgroup_node below) */
+   u64 cgrp_node_count;
+   u64 cgrp_node_time_enabled;
+   u64 cgrp_node_time_running;
 #endif
 
 #ifdef CONFIG_SECURITY
@@ -780,6 +793,13 @@ struct perf_event {
 #endif /* CONFIG_PERF_EVENTS */
 };
 
+struct perf_cgroup_node {
+   struct hlist_node   node;
+   u64 id;
+   u64 count;
+   u64 time_enabled;
+   u64 time_running;
+} cacheline_aligned;
 
 struct perf_event_groups {
struct rb_root  tree;
@@ -843,6 +863,8 @@ struct perf_event_context {
int pin_count;
 #ifdef CONFIG_CGROUP_PERF
int nr_cgroups;  /* cgroup evts */
+   struct list_headcgrp_node_list;
+   struct list_headcgrp_ctx_entry;
 #endif
void*task_ctx_data; /* pmu specific data */
struct rcu_head rcu_head;
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index ad15e40d7f5d..06bc7ab13616 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -479,6 +479,8 @@ struct perf_event_query_bpf {
 #define PERF_EVENT_IOC_PAUSE_OUTPUT_IOW('$', 9, __u32)
 #define PERF_EVENT_IOC_QUERY_BPF   _IOWR('$', 10, struct 
perf_event_query_bpf *)
 #define PERF_EVENT_IOC_MODIFY_ATTRIBUTES   _IOW('$', 11, struct 
perf_event_attr *)
+#define PERF_EVENT_IOC_ATTACH_CGROUP   _IOW('$', 12, __u64 *)
+#define PERF_EVENT_IOC_READ_CGROUP _IOWR('$', 13, __u64 *)
 
 enum perf_event_ioc_flags {
PERF_IOC_FLAG_GROUP = 1U << 0,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index f07943183041..0c6b3848a61f 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -379,6 +379,7 @@ enum event_type_t {
  * perf_cgroup_events: >0 per-cpu cgroup events exist on this cpu
  */
 
+static void perf_sched_enable(void);
 static void perf_sched_delayed(struct work_struct *work);
 DEFINE_STATIC_KEY_FALSE(perf_sched_events);

Re: [PATCH 4/9] userfaultfd/shmem: support UFFDIO_CONTINUE for shmem

2021-04-12 Thread Axel Rasmussen

On Mon, Apr 12, 2021 at 4:17 PM Peter Xu  wrote:
>
> On Thu, Apr 08, 2021 at 04:43:22PM -0700, Axel Rasmussen wrote:
> > +/*
> > + * Install PTEs, to map dst_addr (within dst_vma) to page.
> > + *
> > + * This function handles MCOPY_ATOMIC_CONTINUE (which is always 
> > file-backed),
> > + * whether or not dst_vma is VM_SHARED. It also handles the more general
> > + * MCOPY_ATOMIC_NORMAL case, when dst_vma is *not* VM_SHARED (it may be 
> > file
> > + * backed, or not).
> > + *
> > + * Note that MCOPY_ATOMIC_NORMAL for a VM_SHARED dst_vma is handled by
> > + * shmem_mcopy_atomic_pte instead.
> > + */
> > +static int mcopy_atomic_install_ptes(struct mm_struct *dst_mm, pmd_t 
> > *dst_pmd,
> > +  struct vm_area_struct *dst_vma,
> > +  unsigned long dst_addr, struct page 
> > *page,
> > +  bool newly_allocated, bool wp_copy)
> > +{
> > + int ret;
> > + pte_t _dst_pte, *dst_pte;
> > + int writable;
> > + bool vm_shared = dst_vma->vm_flags & VM_SHARED;
> > + spinlock_t *ptl;
> > + struct inode *inode;
> > + pgoff_t offset, max_off;
> > +
> > + _dst_pte = mk_pte(page, dst_vma->vm_page_prot);
> > + writable = dst_vma->vm_flags & VM_WRITE;
> > + /* For private, non-anon we need CoW (don't write to page cache!) */
> > + if (!vma_is_anonymous(dst_vma) && !vm_shared)
> > + writable = 0;
> > +
> > + if (writable || vma_is_anonymous(dst_vma))
> > + _dst_pte = pte_mkdirty(_dst_pte);
> > + if (writable) {
> > + if (wp_copy)
> > + _dst_pte = pte_mkuffd_wp(_dst_pte);
> > + else
> > + _dst_pte = pte_mkwrite(_dst_pte);
> > + } else if (vm_shared) {
> > + /*
> > +  * Since we didn't pte_mkdirty(), mark the page dirty or it
> > +  * could be freed from under us. We could do this
> > +  * unconditionally, but doing it only if !writable is faster.
> > +  */
> > + set_page_dirty(page);
> > + }
> > +
> > + dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl);
> > +
> > + if (vma_is_shmem(dst_vma)) {
> > + /* The shmem MAP_PRIVATE case requires checking the i_size */
>
> When you start to use this function in the last patch it'll be needed too even
> if MAP_SHARED?
>
> How about directly state the reason of doing this ("serialize against truncate
> with the PT lock") instead of commenting about "who will need it"?
>
> > + inode = dst_vma->vm_file->f_inode;
> > + offset = linear_page_index(dst_vma, dst_addr);
> > + max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
> > + ret = -EFAULT;
> > + if (unlikely(offset >= max_off))
> > + goto out_unlock;
> > + }
>
> [...]
>
> > +/* Handles UFFDIO_CONTINUE for all shmem VMAs (shared or private). */
> > +static int mcontinue_atomic_pte(struct mm_struct *dst_mm,
> > + pmd_t *dst_pmd,
> > + struct vm_area_struct *dst_vma,
> > + unsigned long dst_addr,
> > + bool wp_copy)
> > +{
> > + struct inode *inode = file_inode(dst_vma->vm_file);
> > + pgoff_t pgoff = linear_page_index(dst_vma, dst_addr);
> > + struct page *page;
> > + int ret;
> > +
> > + ret = shmem_getpage(inode, pgoff, &page, SGP_READ);
>
> SGP_READ looks right, as we don't want page allocation.  However I noticed
> there's very slight difference when the page was just fallocated:
>
> /* fallocated page? */
> if (page && !PageUptodate(page)) {
> if (sgp != SGP_READ)
> goto clear;
> unlock_page(page);
> put_page(page);
> page = NULL;
> hindex = index;
> }
>
> I think it won't happen for your case since the page should be uptodate 
> already
> (the other thread should check and modify the page before CONTINUE), but still
> raise this up, since if the page was allocated it smells better to still
> install the fallocated page (do we need to clear the page and SetUptodate)?

Sorry for the somewhat rambling thought process:

My first thought is, I don't really know what PageUptodate means for
shmem pages. If I understand correctly, normally we say PageUptodate()
if the in memory data is more recent or equivalent to the on-disk
data. But, shmem pages are entirely in memory - they are file backed
in name only, in some sense.

fallocate() does all sorts of things so the comment to me seems a bit
ambiguous, but it seems the implication is that we're worried
specifically about the case where the shmem page was recently
allocated with fallocate(mode=0)? In that case, do we use
!PageUptodate() to denote that the page has been allocated, but its
contents are undefined?

I

[gustavoars-linux:testing/warray-bounds] BUILD SUCCESS WITH WARNING 8f00c4d955f8c343277181b46fac418101c521bf

2021-04-12 Thread kernel test robot

tree/branch: 
https://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux.git 
testing/warray-bounds
branch HEAD: 8f00c4d955f8c343277181b46fac418101c521bf  ixgbe: Fix out-bounds 
warning in ixgbe_host_interface_command()

possible Warning in current branch:

arch/x86/include/asm/string_32.h:182:25: warning: '__builtin_memcpy' offset 
[18, 23] from the object at 'sig' is out of the bounds of referenced subobject 
'daddr' with type 'u8[6]' {aka 'unsigned char[6]'} at offset 11 [-Warray-bounds]
arch/x86/include/asm/string_32.h:182:25: warning: '__builtin_memcpy' offset 
[25, 95] from the object at 'sig' is out of the bounds of referenced subobject 
'beacon_period' with type 'short unsigned int' at offset 22 [-Warray-bounds]
drivers/ide/ide-ioctls.c:213:2: warning: 'memcpy' offset [3, 7] from the object 
at 'cmd' is out of the bounds of referenced subobject 'feature' with type 
'unsigned char' at offset 1 [-Warray-bounds]
drivers/iommu/intel/svm.c:1215:4: warning: 'memcpy' offset [25, 32] from the 
object at 'desc' is out of the bounds of referenced subobject 'qw2' with type 
'long long unsigned int' at offset 16 [-Warray-bounds]
include/linux/fortify-string.h:20:29: warning: '__builtin_memcpy' offset [21, 
80] from the object at 'init' is out of the bounds of referenced subobject 
'chipset' with type 'int' at offset 16 [-Warray-bounds]

Warning ids grouped by kconfigs:

gcc_recent_errors
|-- i386-randconfig-c021-20210412
|   |-- 
arch-x86-include-asm-string_32.h:warning:__builtin_memcpy-offset-from-the-object-at-sig-is-out-of-the-bounds-of-referenced-subobject-beacon_period-with-type-short-unsigned-int-at-offset
|   `-- 
arch-x86-include-asm-string_32.h:warning:__builtin_memcpy-offset-from-the-object-at-sig-is-out-of-the-bounds-of-referenced-subobject-daddr-with-type-u8-aka-unsigned-char-at-offset
|-- ia64-allmodconfig
|   `-- 
drivers-ide-ide-ioctls.c:warning:memcpy-offset-from-the-object-at-cmd-is-out-of-the-bounds-of-referenced-subobject-feature-with-type-unsigned-char-at-offset
|-- ia64-allyesconfig
|   `-- 
drivers-ide-ide-ioctls.c:warning:memcpy-offset-from-the-object-at-cmd-is-out-of-the-bounds-of-referenced-subobject-feature-with-type-unsigned-char-at-offset
|-- parisc-allyesconfig
|   `-- 
drivers-ide-ide-ioctls.c:warning:memcpy-offset-from-the-object-at-cmd-is-out-of-the-bounds-of-referenced-subobject-feature-with-type-unsigned-char-at-offset
|-- x86_64-randconfig-a016-20210412
|   |-- 
drivers-ide-ide-ioctls.c:warning:memcpy-offset-from-the-object-at-cmd-is-out-of-the-bounds-of-referenced-subobject-feature-with-type-unsigned-char-at-offset
|   `-- 
drivers-iommu-intel-svm.c:warning:memcpy-offset-from-the-object-at-desc-is-out-of-the-bounds-of-referenced-subobject-qw2-with-type-long-long-unsigned-int-at-offset
|-- x86_64-randconfig-c002-20210412
|   `-- 
drivers-iommu-intel-svm.c:warning:memcpy-offset-from-the-object-at-desc-is-out-of-the-bounds-of-referenced-subobject-qw2-with-type-long-long-unsigned-int-at-offset
|-- x86_64-randconfig-m001-20210412
|   `-- 
include-linux-fortify-string.h:warning:__builtin_memcpy-offset-from-the-object-at-init-is-out-of-the-bounds-of-referenced-subobject-chipset-with-type-int-at-offset
`-- x86_64-randconfig-s022-20210412
`-- 
drivers-ide-ide-ioctls.c:warning:memcpy-offset-from-the-object-at-cmd-is-out-of-the-bounds-of-referenced-subobject-feature-with-type-unsigned-char-at-offset

elapsed time: 720m

configs tested: 97
configs skipped: 2

gcc tested configs:
arm defconfig
arm64allyesconfig
arm64   defconfig
arm  allyesconfig
arm  allmodconfig
x86_64   allyesconfig
powerpc  pcm030_defconfig
arm mv78xx0_defconfig
arm  exynos_defconfig
arm   multi_v4t_defconfig
mips   ip22_defconfig
arm  colibri_pxa300_defconfig
powerpcge_imp3a_defconfig
sh espt_defconfig
armtrizeps4_defconfig
powerpc powernv_defconfig
sh  r7785rp_defconfig
powerpc  arches_defconfig
powerpc  makalu_defconfig
armzeus_defconfig
sh   se7750_defconfig
arm   omap2plus_defconfig
sh  rsk7203_defconfig
sh  urquell_defconfig
arm  ixp4xx_defconfig
ia64 allmodconfig
ia64defconfig
ia64 allyesconfig
m68k allmodconfig
m68kdefconfig
m68k allyesconfig
nds32   defconfig
nios2allyesconfig
csky

Re: [PATCH][next] scsi: mpt3sas: Fix out-of-bounds warnings in _ctl_addnl_diag_query

2021-04-12 Thread Martin K. Petersen



Gustavo,

> Fix the following out-of-bounds warnings by embedding existing struct
> htb_rel_query into struct mpt3_addnl_diag_query, instead of
> duplicating its members:

Applied to 5.13/scsi-staging, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: [PATCH bpf-next v2] libbpf: clarify flags in ringbuf helpers

2021-04-12 Thread patchwork-bot+netdevbpf

Hello:

This patch was applied to bpf/bpf-next.git (refs/heads/master):

On Mon, 12 Apr 2021 16:24:32 -0300 you wrote:
> In 'bpf_ringbuf_reserve()' we require the flag to '0' at the moment.
> 
> For 'bpf_ringbuf_{discard,submit,output}' a flag of '0' might send a
> notification to the process if needed.
> 
> Signed-off-by: Pedro Tammela 
> 
> [...]

Here is the summary with links:
  - [bpf-next,v2] libbpf: clarify flags in ringbuf helpers
https://git.kernel.org/bpf/bpf-next/c/5c507329000e

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html

Re: [PATCH bpf-next v2] libbpf: clarify flags in ringbuf helpers

2021-04-12 Thread Andrii Nakryiko

On Mon, Apr 12, 2021 at 12:25 PM Pedro Tammela  wrote:
>
> In 'bpf_ringbuf_reserve()' we require the flag to '0' at the moment.
>
> For 'bpf_ringbuf_{discard,submit,output}' a flag of '0' might send a
> notification to the process if needed.
>
> Signed-off-by: Pedro Tammela 
> ---

Great, thanks! Applied to bpf-next.

>  include/uapi/linux/bpf.h   | 16 
>  tools/include/uapi/linux/bpf.h | 16 
>  2 files changed, 32 insertions(+)
>

[...]

Re: [PATCH] csky: fix syscache.c fallthrough warning

2021-04-12 Thread Guo Ren

Acked-by: Guo Ren 

The fallthrough is intended for BCACHE, but it also makes ICACHE more
expensive. I'll fix it up later.

}
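
A small stand-alone sketch of the behaviour under discussion. It assumes ICACHE and BCACHE share the first case label (the diff context does not show this), models the flushes as bit flags, and uses illustrative enum values rather than the csky uapi constants. flush_split() is one possible shape of a later fix-up, not the actual change:

```c
#include <assert.h>

enum cache_type { ICACHE, DCACHE, BCACHE };   /* illustrative values */

#define FLUSHED_I 0x1u   /* stands in for flush_icache_mm_range() */
#define FLUSHED_D 0x2u   /* stands in for dcache_wb_range() */

/* Use the compiler attribute where available, as the kernel macro does. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0)
#endif

/* Annotated behaviour: ICACHE also pays for the dcache writeback. */
static unsigned int flush_with_fallthrough(enum cache_type t)
{
	unsigned int done = 0;

	switch (t) {
	case ICACHE:
	case BCACHE:
		done |= FLUSHED_I;
		fallthrough;
	case DCACHE:
		done |= FLUSHED_D;
		break;
	}
	return done;
}

/* Hypothetical variant: only BCACHE falls through into the dcache work. */
static unsigned int flush_split(enum cache_type t)
{
	unsigned int done = 0;

	switch (t) {
	case ICACHE:
		done |= FLUSHED_I;
		break;
	case BCACHE:
		done |= FLUSHED_I;
		fallthrough;
	case DCACHE:
		done |= FLUSHED_D;
		break;
	}
	return done;
}
```

In the first function an ICACHE request performs both operations; in the second it performs only the icache flush, while BCACHE still does both.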

On Mon, Apr 12, 2021 at 12:41 AM Randy Dunlap  wrote:
>
> This case of the switch statement falls through to the following case.
> This appears to be on purpose, so declare it as OK.
>
> ../arch/csky/mm/syscache.c: In function '__do_sys_cacheflush':
> ../arch/csky/mm/syscache.c:17:3: warning: this statement may fall through 
> [-Wimplicit-fallthrough=]
>17 |   flush_icache_mm_range(current->mm,
>   |   ^~
>18 | (unsigned long)addr,
>   | 
>19 | (unsigned long)addr + bytes);
>   | 
> ../arch/csky/mm/syscache.c:20:2: note: here
>20 |  case DCACHE:
>   |  ^~~~
>
> Fixes: 997153b9a75c ("csky: Add flush_icache_mm to defer flush icache all")
> Signed-off-by: Randy Dunlap 
> Cc: Guo Ren 
> Cc: linux-c...@vger.kernel.org
> Cc: Arnd Bergmann 
> ---
> @Guo, should this be a "break" instead of fallthrough?
>
>  arch/csky/mm/syscache.c |1 +
>  1 file changed, 1 insertion(+)
>
> --- linux-next-20210409.orig/arch/csky/mm/syscache.c
> +++ linux-next-20210409/arch/csky/mm/syscache.c
> @@ -17,6 +17,7 @@ SYSCALL_DEFINE3(cacheflush,
> flush_icache_mm_range(current->mm,
> (unsigned long)addr,
> (unsigned long)addr + bytes);
> +   fallthrough;
> case DCACHE:
> dcache_wb_range((unsigned long)addr,
> (unsigned long)addr + bytes);



--
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/

Re: [RFC v4 net-next 1/4] net: phy: add MediaTek PHY driver

2021-04-12 Thread DENG Qingfang

On Mon, Apr 12, 2021 at 11:08:36PM +0800, DENG Qingfang wrote:
> On Mon, Apr 12, 2021 at 07:04:49AM +, René van Dorst wrote:
> > Hi Qingfang,
> > > +static void mtk_phy_config_init(struct phy_device *phydev)
> > > +{
> > > + /* Disable EEE */
> > > + phy_write_mmd(phydev, MDIO_MMD_AN, MDIO_AN_EEE_ADV, 0);
> > 
> > For my EEE patch I changed this line to:
> > 
> > genphy_config_eee_advert(phydev);
> > 
> > So PHY EEE part is setup properly at boot, instead enable it manual via
> > ethtool.
> > This function also takes the DTS parameters "eee-broken-" in to account
> > while
> > setting-up the PHY.
> 
> Thanks, I'm now testing with it.

Hi Rene,

Within 12 hours, I got some spontaneous link down/ups when EEE is enabled:

[16334.236233] mt7530 mdio-bus:1f wan: Link is Down
[16334.241340] br-lan: port 3(wan) entered disabled state
[16337.355988] mt7530 mdio-bus:1f wan: Link is Up - 1Gbps/Full - flow control 
rx/tx
[16337.363468] br-lan: port 3(wan) entered blocking state
[16337.368638] br-lan: port 3(wan) entered forwarding state

The cable is a 30m Cat.6 and never has such issues when EEE is disabled.
Perhaps WAKEUP_TIME_1000/100 or some PHY registers need to be fine-tuned,
but for now I think it should be disabled by default.

> 
> > 
> > > +
> > > + /* Enable HW auto downshift */
> > > + phy_modify_paged(phydev, MTK_PHY_PAGE_EXTENDED, 0x14, 0, BIT(4));

[PATCH v2] scsi: qlogicpti: remove unneeded semicolon

2021-04-12 Thread Yang Li

Eliminate the following coccicheck warning:
./drivers/scsi/qlogicpti.c:1153:3-4: Unneeded semicolon

Reported-by: Abaci Robot 
Signed-off-by: Yang Li 
---

Change in v2:
--One patch per driver

 drivers/scsi/qlogicpti.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/qlogicpti.c b/drivers/scsi/qlogicpti.c
index d84e218..3da58263 100644
--- a/drivers/scsi/qlogicpti.c
+++ b/drivers/scsi/qlogicpti.c
@@ -1150,7 +1150,7 @@ static struct scsi_cmnd *qlogicpti_intr_handler(struct 
qlogicpti *qpti)
case COMMAND_ERROR:
case COMMAND_PARAM_ERROR:
break;
-   };
+   }
sbus_writew(0, qpti->qregs + SBUS_SEMAPHORE);
}
 
-- 
1.8.3.1

Re: [PATCH v3 02/10] riscv: add __init section marker to some functions

2021-04-12 Thread Anup Patel

On Mon, Apr 12, 2021 at 9:47 PM Jisheng Zhang  wrote:
>
> From: Jisheng Zhang 
>
> They are not needed after booting, so mark them as __init to move them
> to the __init section.
>
> Signed-off-by: Jisheng Zhang 

Looks good to me.

Reviewed-by: Anup Patel 

Regards,
Anup

> ---
>  arch/riscv/kernel/cpufeature.c | 2 +-
>  arch/riscv/kernel/traps.c  | 2 +-
>  arch/riscv/mm/init.c   | 4 ++--
>  arch/riscv/mm/kasan_init.c | 6 +++---
>  arch/riscv/mm/ptdump.c | 2 +-
>  5 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> index ac202f44a670..e4741e1f0add 100644
> --- a/arch/riscv/kernel/cpufeature.c
> +++ b/arch/riscv/kernel/cpufeature.c
> @@ -59,7 +59,7 @@ bool __riscv_isa_extension_available(const unsigned long *isa_bitmap, int bit)
>  }
>  EXPORT_SYMBOL_GPL(__riscv_isa_extension_available);
>
> -void riscv_fill_hwcap(void)
> +void __init riscv_fill_hwcap(void)
>  {
> struct device_node *node;
> const char *isa;
> diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> index 0879b5df11b9..041f4b44262e 100644
> --- a/arch/riscv/kernel/traps.c
> +++ b/arch/riscv/kernel/traps.c
> @@ -196,6 +196,6 @@ int is_valid_bugaddr(unsigned long pc)
>  #endif /* CONFIG_GENERIC_BUG */
>
>  /* stvec & scratch is already set from head.S */
> -void trap_init(void)
> +void __init trap_init(void)
>  {
>  }
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index dbeaa4144e4d..ecd485662b07 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -70,7 +70,7 @@ static inline void print_mlm(char *name, unsigned long b, unsigned long t)
>   (((t) - (b)) >> 20));
>  }
>
> -static void print_vm_layout(void)
> +static void __init print_vm_layout(void)
>  {
> pr_notice("Virtual kernel memory layout:\n");
> print_mlk("fixmap", (unsigned long)FIXADDR_START,
> @@ -553,7 +553,7 @@ static inline void setup_vm_final(void)
>  #endif /* CONFIG_MMU */
>
>  #ifdef CONFIG_STRICT_KERNEL_RWX
> -void protect_kernel_text_data(void)
> +void __init protect_kernel_text_data(void)
>  {
> unsigned long text_start = (unsigned long)_start;
> unsigned long init_text_start = (unsigned long)__init_text_begin;
> diff --git a/arch/riscv/mm/kasan_init.c b/arch/riscv/mm/kasan_init.c
> index ec0029097251..e459290d2629 100644
> --- a/arch/riscv/mm/kasan_init.c
> +++ b/arch/riscv/mm/kasan_init.c
> @@ -48,7 +48,7 @@ asmlinkage void __init kasan_early_init(void)
> local_flush_tlb_all();
>  }
>
> -static void kasan_populate_pte(pmd_t *pmd, unsigned long vaddr, unsigned long end)
> +static void __init kasan_populate_pte(pmd_t *pmd, unsigned long vaddr, unsigned long end)
>  {
> phys_addr_t phys_addr;
> pte_t *ptep, *base_pte;
> @@ -70,7 +70,7 @@ static void kasan_populate_pte(pmd_t *pmd, unsigned long vaddr, unsigned long en
> set_pmd(pmd, pfn_pmd(PFN_DOWN(__pa(base_pte)), PAGE_TABLE));
>  }
>
> -static void kasan_populate_pmd(pgd_t *pgd, unsigned long vaddr, unsigned long end)
> +static void __init kasan_populate_pmd(pgd_t *pgd, unsigned long vaddr, unsigned long end)
>  {
> phys_addr_t phys_addr;
> pmd_t *pmdp, *base_pmd;
> @@ -105,7 +105,7 @@ static void kasan_populate_pmd(pgd_t *pgd, unsigned long vaddr, unsigned long en
> set_pgd(pgd, pfn_pgd(PFN_DOWN(__pa(base_pmd)), PAGE_TABLE));
>  }
>
> -static void kasan_populate_pgd(unsigned long vaddr, unsigned long end)
> +static void __init kasan_populate_pgd(unsigned long vaddr, unsigned long end)
>  {
> phys_addr_t phys_addr;
> pgd_t *pgdp = pgd_offset_k(vaddr);
> diff --git a/arch/riscv/mm/ptdump.c b/arch/riscv/mm/ptdump.c
> index ace74dec7492..3b7b6e4d025e 100644
> --- a/arch/riscv/mm/ptdump.c
> +++ b/arch/riscv/mm/ptdump.c
> @@ -331,7 +331,7 @@ static int ptdump_show(struct seq_file *m, void *v)
>
>  DEFINE_SHOW_ATTRIBUTE(ptdump);
>
> -static int ptdump_init(void)
> +static int __init ptdump_init(void)
>  {
> unsigned int i, j;
>
> --
> 2.31.0
>
>

Re: [PATCH v3 00/10] riscv: improve self-protection

2021-04-12 Thread Anup Patel

On Mon, Apr 12, 2021 at 9:46 PM Jisheng Zhang  wrote:
>
> From: Jisheng Zhang 
>
> patch1 removes the unnecessary setup_zero_page()
> patch2 is a trivial improvement patch to move some functions to the .init
> section
>
> The following patches then improve self-protection by:
>
> Marking some variables __ro_after_init
> Constifying some variables
> Enabling ARCH_HAS_STRICT_MODULE_RWX
>
> Hi Anup,
>
> I kept the __init modification to trap_init(). I will cook a trivial
> series to provide a __weak but NULL trap_init() implementation in
> init/main.c and then remove the NULL implementations from all archs.

Yes, it makes sense to do this as a separate series.

Regards,
Anup

>
> Thanks
>
> Since v2:
>   - collect Reviewed-by tag
>   - add one patch to remove unnecessary setup_zero_page()
>
> Since v1:
>   - no need to move bpf_jit_alloc_exec() and bpf_jit_free_exec() to core
> because RV32 uses the default module_alloc() for jit code which also
> meets W^X after patch8
>   - fix a build error caused by local debug code clean up
>
>
> Jisheng Zhang (10):
>   riscv: mm: Remove setup_zero_page()
>   riscv: add __init section marker to some functions
>   riscv: Mark some global variables __ro_after_init
>   riscv: Constify sys_call_table
>   riscv: Constify sbi_ipi_ops
>   riscv: kprobes: Implement alloc_insn_page()
>   riscv: bpf: Write protect JIT code
>   riscv: bpf: Avoid breaking W^X on RV64
>   riscv: module: Create module allocations without exec permissions
>   riscv: Set ARCH_HAS_STRICT_MODULE_RWX if MMU
>
>  arch/riscv/Kconfig |  1 +
>  arch/riscv/include/asm/smp.h   |  4 ++--
>  arch/riscv/include/asm/syscall.h   |  2 +-
>  arch/riscv/kernel/cpufeature.c |  2 +-
>  arch/riscv/kernel/module.c | 10 --
>  arch/riscv/kernel/probes/kprobes.c |  8 
>  arch/riscv/kernel/sbi.c| 10 +-
>  arch/riscv/kernel/smp.c|  6 +++---
>  arch/riscv/kernel/syscall_table.c  |  2 +-
>  arch/riscv/kernel/time.c   |  2 +-
>  arch/riscv/kernel/traps.c  |  2 +-
>  arch/riscv/kernel/vdso.c   |  4 ++--
>  arch/riscv/mm/init.c   | 16 +---
>  arch/riscv/mm/kasan_init.c |  6 +++---
>  arch/riscv/mm/ptdump.c |  2 +-
>  arch/riscv/net/bpf_jit_comp64.c|  2 +-
>  arch/riscv/net/bpf_jit_core.c  |  1 +
>  17 files changed, 45 insertions(+), 35 deletions(-)
>
> --
> 2.31.0
>
>

[PATCH v2] scsi: pmcraid: remove unneeded semicolon

2021-04-12 Thread Yang Li

Eliminate the following coccicheck warning:
./drivers/scsi/pmcraid.c:5090:2-3: Unneeded semicolon

Reported-by: Abaci Robot 
Signed-off-by: Yang Li 
---

Change in v2:
--One patch per driver

 drivers/scsi/pmcraid.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/pmcraid.c b/drivers/scsi/pmcraid.c
index 834556e..44e9709 100644
--- a/drivers/scsi/pmcraid.c
+++ b/drivers/scsi/pmcraid.c
@@ -5087,7 +5087,7 @@ static int pmcraid_init_instance(struct pci_dev *pdev, struct Scsi_Host *host,
mapped_pci_addr + chip_cfg->ioa_host_mask_clr;
pint_regs->global_interrupt_mask_reg =
mapped_pci_addr + chip_cfg->global_intr_mask;
-   };
+   }
 
pinstance->ioa_reset_attempts = 0;
init_waitqueue_head(&pinstance->reset_wait_q);
-- 
1.8.3.1

Re: [PATCH v4 5/7] cpufreq: qcom-hw: Implement CPRh aware OSM programming

2021-04-12 Thread Viresh Kumar

On 12-04-21, 15:01, Taniya Das wrote:
> Technically the HW we are trying to program here differs in terms of
> clocking, the LUT definitions and many more. It will definitely make
> debugging much more troublesome if we try to accommodate multiple versions of
> CPUFREQ-HW in the same code.
> 
> Thus to keep it simple, easy to read, debug, the suggestion is to keep it
> with "v1" tag as the OSM version we are trying to put here is from OSM1.0.

That is a valid point, and it is always the case with so many drivers. What
I am concerned about is how much code is common across versions. If it
is 5-70%, or more, then we should definitely share it: arrange to have
callbacks or ops per version and call them in a generic fashion instead
of writing a new driver. This is what's done across
drivers/frameworks, etc.

-- 
viresh
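
As an illustration of the "callbacks or ops per version" arrangement
suggested above, here is a minimal userspace sketch. Every name in it
(`osm_version_ops`, `lut_row_to_khz`, `max_freq_khz`) and both LUT
encodings are hypothetical and are not taken from the qcom-cpufreq-hw
driver; the point is only that version-specific behavior sits behind a
small ops table while the generic code is shared.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-version hooks: only what differs between versions. */
struct osm_version_ops {
	unsigned int (*lut_row_to_khz)(unsigned int raw);
};

/* Assumed encoding for the sketch: v1 stores a multiplier of 19200 kHz. */
static unsigned int v1_lut_row_to_khz(unsigned int raw)
{
	return raw * 19200;
}

/* Assumed encoding for the sketch: v2 stores the frequency in MHz. */
static unsigned int v2_lut_row_to_khz(unsigned int raw)
{
	return raw * 1000;
}

static const struct osm_version_ops osm_v1_ops = {
	.lut_row_to_khz = v1_lut_row_to_khz,
};

static const struct osm_version_ops osm_v2_ops = {
	.lut_row_to_khz = v2_lut_row_to_khz,
};

/* Generic code shared by all versions: walks the table, calls the hook. */
static unsigned int max_freq_khz(const struct osm_version_ops *ops,
				 const unsigned int *lut, size_t n)
{
	unsigned int max = 0;
	size_t i;

	assert(ops && ops->lut_row_to_khz);
	for (i = 0; i < n; i++) {
		unsigned int khz = ops->lut_row_to_khz(lut[i]);

		if (khz > max)
			max = khz;
	}
	return max;
}
```

A real driver would typically pick the ops table from its match data at
probe time, so the shared paths never need to test the version directly.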

[PATCH] fpga: xilinx-pr-decoupler: remove useless function

2021-04-12 Thread Jiapeng Chong

Fix the following gcc warning:

drivers/fpga/xilinx-pr-decoupler.c:32:19: warning: unused function
'xlnx_pr_decouple_read' [-Wunused-function].

Reported-by: Abaci Robot 
Signed-off-by: Jiapeng Chong 
---
 drivers/fpga/xilinx-pr-decoupler.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/drivers/fpga/xilinx-pr-decoupler.c b/drivers/fpga/xilinx-pr-decoupler.c
index 7d69af2..f407cb2 100644
--- a/drivers/fpga/xilinx-pr-decoupler.c
+++ b/drivers/fpga/xilinx-pr-decoupler.c
@@ -29,12 +29,6 @@ static inline void xlnx_pr_decoupler_write(struct xlnx_pr_decoupler_data *d,
writel(val, d->io_base + offset);
 }
 
-static inline u32 xlnx_pr_decouple_read(const struct xlnx_pr_decoupler_data *d,
-   u32 offset)
-{
-   return readl(d->io_base + offset);
-}
-
 static int xlnx_pr_decoupler_enable_set(struct fpga_bridge *bridge, bool enable)
 {
int err;
-- 
1.8.3.1

Re: [PATCH v3 01/10] riscv: mm: Remove setup_zero_page()

2021-04-12 Thread Anup Patel

On Mon, Apr 12, 2021 at 9:47 PM Jisheng Zhang  wrote:
>
> From: Jisheng Zhang 
>
> The empty_zero_page sits in the .bss..page_aligned section, so it is
> cleared to zero when bss is cleared; we don't need to clear it again.
>
> Signed-off-by: Jisheng Zhang 

Looks good to me.

Reviewed-by: Anup Patel 

Regards,
Anup
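
The reasoning in the commit message relies on a general C property:
objects with static storage duration and no initializer start out
zeroed, and toolchains typically place them in .bss. A small userspace
sketch of that property, assuming a hypothetical 4096-byte page and
hypothetical names (this is not the riscv code, where bss is cleared
during boot as the commit message describes):

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size for this sketch */

/* No initializer: static storage duration, so it starts out all zero. */
static unsigned char zero_page[SKETCH_PAGE_SIZE];

/* Return 1 if every byte in the buffer is zero, 0 otherwise. */
static int page_is_zero(const unsigned char *page, size_t len)
{
	size_t i;

	assert(page);
	for (i = 0; i < len; i++)
		if (page[i])
			return 0;
	return 1;
}
```

Calling `page_is_zero(zero_page, SKETCH_PAGE_SIZE)` before anything
writes to the buffer returns 1 without any explicit memset(), which is
the redundancy the patch removes.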

> ---
>  arch/riscv/mm/init.c | 6 --
>  1 file changed, 6 deletions(-)
>
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index 7f5036fbee8c..dbeaa4144e4d 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -57,11 +57,6 @@ static void __init zone_sizes_init(void)
> free_area_init(max_zone_pfns);
>  }
>
> -static void setup_zero_page(void)
> -{
> -   memset((void *)empty_zero_page, 0, PAGE_SIZE);
> -}
> -
>  #if defined(CONFIG_MMU) && defined(CONFIG_DEBUG_VM)
>  static inline void print_mlk(char *name, unsigned long b, unsigned long t)
>  {
> @@ -589,7 +584,6 @@ void mark_rodata_ro(void)
>  void __init paging_init(void)
>  {
> setup_vm_final();
> -   setup_zero_page();
>  }
>
>  void __init misc_mem_init(void)
> --
> 2.31.0
>
>

Re: [PATCH 2/2] drm/msm/dp: do not re initialize of audio_comp

2021-04-12 Thread Stephen Boyd

Quoting Kuogee Hsieh (2021-04-12 10:03:23)
> At dp_display_disable(), do not re initialize audio_comp if
> hdp_state == ST_DISCONNECT_PENDING (unplug event) to avoid
> race condition which cause 5 second timeout expired.

More details please.

> Also
> add abort mechanism to reduce time spinning at dp_aux_transfer()
> during dpcd read if type-c connection had been broken.

Please split this to a different patch.

> 
> Signed-off-by: Kuogee Hsieh 
> ---
>  drivers/gpu/drm/msm/dp/dp_aux.c | 18 ++
>  drivers/gpu/drm/msm/dp/dp_aux.h |  1 +
>  drivers/gpu/drm/msm/dp/dp_display.c | 16 
>  drivers/gpu/drm/msm/dp/dp_link.c| 20 +++-
>  4 files changed, 46 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/dp/dp_aux.c b/drivers/gpu/drm/msm/dp/dp_aux.c
> index 7c22bfe..e5ece8c 100644
> --- a/drivers/gpu/drm/msm/dp/dp_aux.c
> +++ b/drivers/gpu/drm/msm/dp/dp_aux.c
> @@ -28,6 +28,7 @@ struct dp_aux_private {
> u32 offset;
> u32 segment;
> u32 isr;
> +   atomic_t aborted;

Why is it an atomic?

>  
> struct drm_dp_aux dp_aux;
>  };
> @@ -343,6 +344,11 @@ static ssize_t dp_aux_transfer(struct drm_dp_aux *dp_aux,
>  
> mutex_lock(&aux->mutex);
>  
> +   if (atomic_read(&aux->aborted)) {
> +   ret = -ETIMEDOUT;
> +   goto unlock_exit;
> +   }
> +

Cool, it's checked inside a mutex.

> aux->native = msg->request & (DP_AUX_NATIVE_WRITE & DP_AUX_NATIVE_READ);
>  
> /* Ignore address only message */
> @@ -533,3 +539,15 @@ void dp_aux_put(struct drm_dp_aux *dp_aux)
>  
> devm_kfree(aux->dev, aux);
>  }
> +
> +void dp_aux_abort(struct drm_dp_aux *dp_aux, bool abort)
> +{
> +   struct dp_aux_private *aux;
> +
> +   if (!dp_aux)
> +   return;
> +
> +   aux = container_of(dp_aux, struct dp_aux_private, dp_aux);
> +
> +   atomic_set(&aux->aborted, abort);
> +}
> diff --git a/drivers/gpu/drm/msm/dp/dp_display.c b/drivers/gpu/drm/msm/dp/dp_display.c
> index 4992a049..8960333 100644
> --- a/drivers/gpu/drm/msm/dp/dp_display.c
> +++ b/drivers/gpu/drm/msm/dp/dp_display.c
> @@ -898,8 +898,10 @@ static int dp_display_disable(struct dp_display_private *dp, u32 data)
> /* wait only if audio was enabled */
> if (dp_display->audio_enabled) {
> /* signal the disconnect event */
> -   reinit_completion(&dp->audio_comp);
> -   dp_display_handle_plugged_change(dp_display, false);
> +   if (dp->hpd_state != ST_DISCONNECT_PENDING) {
> +   reinit_completion(&dp->audio_comp);
> +   dp_display_handle_plugged_change(dp_display, false);
> +   }
> if (!wait_for_completion_timeout(&dp->audio_comp,
> HZ * 5))
> DRM_ERROR("audio comp timeout\n");

This hunk is the first part of the patch and should be split away to one
for itself, with appropriate Fixes tag and a proper explanation.

> @@ -1137,20 +1139,26 @@ static irqreturn_t dp_display_irq_handler(int irq, void *dev_id)
> /* hpd related interrupts */
> if (hpd_isr_status & DP_DP_HPD_PLUG_INT_MASK ||
> hpd_isr_status & DP_DP_HPD_REPLUG_INT_MASK) {
> +   dp_aux_abort(dp->aux, false);
> dp_add_event(dp, EV_HPD_PLUG_INT, 0, 0);
> }
>  
> if (hpd_isr_status & DP_DP_IRQ_HPD_INT_MASK) {
> /* stop sentinel connect pending checking */
> +   dp_aux_abort(dp->aux, false);
> dp_del_event(dp, EV_CONNECT_PENDING_TIMEOUT);
> dp_add_event(dp, EV_IRQ_HPD_INT, 0, 0);
> }
>  
> -   if (hpd_isr_status & DP_DP_HPD_REPLUG_INT_MASK)
> +   if (hpd_isr_status & DP_DP_HPD_REPLUG_INT_MASK) {
> +   dp_aux_abort(dp->aux, false);
> dp_add_event(dp, EV_HPD_REPLUG_INT, 0, 0);
> +   }
>  
> -   if (hpd_isr_status & DP_DP_HPD_UNPLUG_INT_MASK)
> +   if (hpd_isr_status & DP_DP_HPD_UNPLUG_INT_MASK) {
> +   dp_aux_abort(dp->aux, true);

Ok, so it seems that we want to stop trying aux transfers if the unplug
irq comes in? That's a pretty big sledge hammer to stop a transfer in
the middle of progress. Why doesn't the hardware time out and stop or the
dpcd reads in this DP driver fail and start bailing out when the cable
is disconnected? Having to inject that synthetically is not great. Is
there some sort of AUX channel "status" bit that can be read from the
aux registers in the DP hardware to see if the connection was lost?

> dp_add_event(dp, EV_HPD_UNPLUG_INT, 0, 0);
> +   }
> }
>  
> /* DP controller isr */
> diff --git a/drivers/gpu/drm/msm/dp/dp_link.c 
>

Re: [PATCH 5.11 000/210] 5.11.14-rc1 review

2021-04-12 Thread Guenter Roeck

On Mon, Apr 12, 2021 at 10:38:25AM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.11.14 release.
> There are 210 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Wed, 14 Apr 2021 08:39:44 +.
> Anything received after that time might be too late.
> 

Build results:
total: 155 pass: 155 fail: 0
Qemu test results:
total: 460 pass: 459 fail: 1
Failed tests:
sh:rts7751r2dplus_defconfig:ata:net,virtio-net:rootfs

udhcpc fails to get an IP address over virtio-net. I reported the same
problem against mainline. This is a spurious problem; the test succeeds
in roughly every other run. It is unknown at this time whether the cause
is the patch that introduced the problem (commit 0f6925b3e8da ("virtio_net:
Do not pull payload in skb->head")), the sh4 kernel code, qemu, or the
sh4 compiler (though I tried several compiler versions).

I see that this patch is now in pretty much all kernels, so I may report
this on and off until the underlying problem has been found and fixed.
Until then, I guess we'll have to live with it.

Tested-by: Guenter Roeck 

Guenter

Re: Candidate Linux ABI for Intel AMX and hypothetical new related features

2021-04-12 Thread Willy Tarreau

On Mon, Apr 12, 2021 at 07:46:06PM -0400, Len Brown wrote:
> On Mon, Apr 12, 2021 at 11:21 AM Andy Lutomirski  wrote:
> 
> > AMX: Multiplying a 4x4 matrix probably looks *great* in a
> > microbenchmark.  Do it once and you permanently allocate 8kB (is that
> > even a constant?  can it grow in newer parts?), potentially hurts all
> > future context switches, and does who-knows-what to Turbo licenses and
> > such.
> 
> Intel expects that AMX will be extremely valuable to key workloads.
> It is true that you may never run that kind of workload on the machine
> in front of you,
> and so you have every right to be doubtful about the value of AMX.
> 
> The AMX architectural state size is not expected to change.
> Rather, if a "new AMX" has a different state size, it is expected to
> use a new feature bit, different from AMX.
> 
> The AMX context switch buffer is allocated only if and when a task
> touches AMX registers.
> 
> Yes, there will be data transfer to and from that buffer when three
> things all happen.
> 1. the data is valid
> 2. hardware interrupts the application
> 3. the kernel decides to context switch.

As a userspace developer of a proxy, my code is extremely sensitive to
syscall cost and works in environments where 1 million interrupts/s is
not uncommon. Additionally the data I process are small HTTP headers
and I already had to reimplement my own byte-level memcmp because the
overhead of some libc to decide what variant to use to compare 5 bytes
was higher than the time to iterate over them.
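
To illustrate the kind of byte-level comparison meant here, a minimal
sketch follows. The name `bytecmp` is hypothetical and this is not the
actual proxy code; it only shows that for a handful of bytes a plain
loop has no dispatch step before the comparison starts.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Compare n bytes one at a time. Returns 0 when the ranges are equal,
 * otherwise the difference between the first pair of mismatching bytes
 * (taken as unsigned char).
 */
static inline int bytecmp(const void *a, const void *b, size_t n)
{
	const unsigned char *pa = a;
	const unsigned char *pb = b;

	assert(a && b);
	while (n--) {
		if (*pa != *pb)
			return *pa - *pb;
		pa++;
		pb++;
	}
	return 0;
}
```

Whether this actually beats a given libc memcmp() depends on the libc,
the compiler and the input sizes, so it should be measured rather than
assumed.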

So I'm among those userspace developers who grumble each time new
technology is automatically adopted by the compiler and libs, because
that tends to make me figure out what the impact is and how to work around
it. I have no idea what AMX could bring me but reading this above makes
me think that it has a great potential of significantly hurting the
performance if one lib decides to occasionally make use of it. It would
possibly be similar if a lib decided to use AVX-512 to copy data and if
it resulted in the CPU quickly reaching its TDP and starting to throttle
like crazy :-/

Thus I think that the first thing to think about before introducing
possibly cost-sensitive optimizations is: how do I allow
user-space to easily disable them for a task, and how do I allow an
admin to easily disable them system-wide. "echo !foobar > cpuinfo"
could be a nice way to mask a flag system-wide for example. prctl()
would be nice for a task (as long as it's not too late already).

Maybe the API should be surrounded by __amx_begin() / __amx_end(), with
the calls having undefined behavior outside of these. These calls would
set a flag somewhere asking to extend the stacks, or __amx_begin() could
even point itself to the specific stack to be used. This way it could
possibly allow some userspace libraries to use it for small stuff
without definitely impacting the rest of the process.

> At the risk of stating the obvious...
> Intel's view is that libraries that deliver the most value from the
> hardware are a "good thing",
> and that anything preventing libraries from getting the most value
> from the hardware is a "bad thing":-)

As a developer I have a different view. Anything that requires building
against different libraries depending on the system is a real hassle,
and I want to focus on the same code to run everywhere. I'm fine with
some #ifdef in the code if I know that a specific part must run as
fast as possible, and even some runtime detection at various points
but do not want to have to deal with extra dependencies that further
increase the test matrix and combinations in bug reports.

Just my two cents,
Willy

Re: [PATCH v3] f2fs: fix to keep isolation of atomic write

2021-04-12 Thread Chao Yu


On 2021/4/13 11:27, Jaegeuk Kim wrote:

On 04/12, Chao Yu wrote:

As Yi Chen reported, there is a potential race case described as below:

Thread A                                Thread B
- f2fs_ioc_start_atomic_write
                                        - mkwrite
                                         - set_page_dirty
                                          - f2fs_set_page_private(page, 0)
 - set_inode_flag(FI_ATOMIC_FILE)
                                        - mkwrite same page
                                         - set_page_dirty
                                          - f2fs_register_inmem_page
                                           - f2fs_set_page_private(ATOMIC_WRITTEN_PAGE)
                                             failed due to PagePrivate flag has been set
                                           - list_add_tail
                                        - truncate_inode_pages
                                         - f2fs_invalidate_page
                                          - clear page private but w/o remove it from
                                            inmem_list
                                         - set page->mapping to NULL
- f2fs_ioc_commit_atomic_write
 - __f2fs_commit_inmem_pages
   - __revoke_inmem_pages
    - f2fs_put_page panic as page->mapping is NULL

The root cause is that we failed to keep atomic writes isolated in the
start_atomic_write vs. mkwrite case. Let start_atomic_write hold the
i_mmap_sem lock to avoid this issue.


My only concern is performance regression. Could you please verify the numbers?


Do you have a specific test script?

IIRC, the scenario you mean is multiple threads writing/mmapping the same db, right?

Thanks,





Reported-by: Yi Chen 
Signed-off-by: Chao Yu 
---
v3:
- rebase to last dev branch
- update commit message because this patch fixes a different racing issue
of atomic write
  fs/f2fs/file.c| 3 +++
  fs/f2fs/segment.c | 6 ++
  2 files changed, 9 insertions(+)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index d697c8900fa7..6284b2f4a60b 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -2054,6 +2054,7 @@ static int f2fs_ioc_start_atomic_write(struct file *filp)
goto out;
  
  	down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);

+   down_write(&F2FS_I(inode)->i_mmap_sem);
  
  	/*

 * Should wait end_io to count F2FS_WB_CP_DATA correctly by
@@ -2064,6 +2065,7 @@ static int f2fs_ioc_start_atomic_write(struct file *filp)
  inode->i_ino, get_dirty_pages(inode));
ret = filemap_write_and_wait_range(inode->i_mapping, 0, LLONG_MAX);
if (ret) {
+   up_write(&F2FS_I(inode)->i_mmap_sem);
up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
goto out;
}
@@ -2077,6 +2079,7 @@ static int f2fs_ioc_start_atomic_write(struct file *filp)
/* add inode in inmem_list first and set atomic_file */
set_inode_flag(inode, FI_ATOMIC_FILE);
clear_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST);
+   up_write(&F2FS_I(inode)->i_mmap_sem);
up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
  
  	f2fs_update_time(F2FS_I_SB(inode), REQ_TIME);

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 0cb1ca88d4aa..78c8342f52fd 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -325,6 +325,7 @@ void f2fs_drop_inmem_pages(struct inode *inode)
struct f2fs_inode_info *fi = F2FS_I(inode);
  
  	do {

+   down_write(&F2FS_I(inode)->i_mmap_sem);
mutex_lock(&fi->inmem_lock);
if (list_empty(&fi->inmem_pages)) {
fi->i_gc_failures[GC_FAILURE_ATOMIC] = 0;
@@ -339,11 +340,13 @@ void f2fs_drop_inmem_pages(struct inode *inode)
spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
  
  			mutex_unlock(&fi->inmem_lock);

+   up_write(&F2FS_I(inode)->i_mmap_sem);
break;
}
__revoke_inmem_pages(inode, &fi->inmem_pages,
true, false, true);
mutex_unlock(&fi->inmem_lock);
+   up_write(&F2FS_I(inode)->i_mmap_sem);
} while (1);
  }
  
@@ -468,6 +471,7 @@ int f2fs_commit_inmem_pages(struct inode *inode)

f2fs_balance_fs(sbi, true);
  
  	down_write(&fi->i_gc_rwsem[WRITE]);

+   down_write(&F2FS_I(inode)->i_mmap_sem);
  
  	f2fs_lock_op(sbi);

set_inode_flag(inode, FI_ATOMIC_COMMIT);
@@ -479,6 +483,8 @@ int f2fs_commit_inmem_pages(struct inode *inode)
clear_inode_flag(inode, FI_ATOMIC_COMMIT);
  
  	f2fs_unlock_op(sbi);

+
+   up_write(&F2FS_I(inode)->i_mmap_sem);
up_write(&fi->i_gc_rwsem[WRITE]);
  
  	return err;

--
2.29.2

.

[PATCH v2] scsi: ipr: remove unneeded semicolon

2021-04-12 Thread Yang Li

Eliminate the following coccicheck warning:
./drivers/scsi/ipr.h:1979:2-3: Unneeded semicolon

Reported-by: Abaci Robot 
Signed-off-by: Yang Li 
---

Change in v2:
--One patch per driver

 drivers/scsi/ipr.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/ipr.h b/drivers/scsi/ipr.h
index 783ee03..6c29113 100644
--- a/drivers/scsi/ipr.h
+++ b/drivers/scsi/ipr.h
@@ -1976,7 +1976,7 @@ static inline int ipr_sdt_is_fmt2(u32 sdt_word)
case IPR_SDT_FMT2_BAR5_SEL:
case IPR_SDT_FMT2_EXP_ROM_SEL:
return 1;
-   };
+   }
 
return 0;
 }
-- 
1.8.3.1

Re: [PATCH v1 1/2] dt-bindings: mmc: sdhci-of-aspeed: Add power-gpio and power-switch-gpio

2021-04-12 Thread Steven Lee

The 04/13/2021 10:43, Milton Miller II wrote:
> 
> 
> -"openbmc"  wrote: 
> -
> 
> >To: Rob Herring 
> >From: Steven Lee 
> >Sent by: "openbmc" 
> >Date: 04/12/2021 08:31PM
> >Cc: "open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS"
> >, Ulf Hansson ,
> >Ryan Chen , "moderated list:ASPEED SD/MMC
> >DRIVER" , Andrew Jeffery
> >, "open list:ASPEED SD/MMC DRIVER"
> >, "moderated list:ASPEED SD/MMC DRIVER"
> >, Ryan Chen ,
> >Adrian Hunter , open list
> >, Chin-Ting Kuo
> >, "moderated list:ARM/ASPEED MACHINE
> >SUPPORT" 
> >Subject: [EXTERNAL] Re: [PATCH v1 1/2] dt-bindings: mmc:
> >sdhci-of-aspeed: Add power-gpio and power-switch-gpio
> >
> >The 04/10/2021 02:41, Rob Herring wrote:
> >> On Thu, Apr 08, 2021 at 09:52:17AM +0800, Steven Lee wrote:
> >> > AST2600-A2 EVB provides the reference design for enabling SD bus power
> >> > and toggling SD bus signal voltage by GPIO pins.
> >> > Add the definition and example for power-gpio and power-switch-gpio
> >> > properties.
> >> > 
> >> > In the reference design, GPIOV0 of AST2600-A2 EVB is connected to a power
> >> > load switch that provides 3.3v to SD1 bus vdd. GPIOV1 is connected to
> >> > a 1.8v and a 3.3v power load switch that provide signal voltage to
> >> > SD1 bus.
> >> > If GPIOV0 is active high, SD1 bus is enabled. Otherwise, SD1 bus is
> >> > disabled.
> >> > If GPIOV1 is active high, the 3.3v power load switch is enabled and SD1
> >> > signal voltage is 3.3v. Otherwise, the 1.8v power load switch is enabled
> >> > and SD1 signal voltage becomes 1.8v.
> >> > 
> >> > AST2600-A2 EVB also supports toggling signal voltage for SD2 bus.
> >> > The design is the same as SD1 bus. It uses GPIOV2 as power-gpio and GPIOV3
> >> > as power-switch-gpio.
> >> > 
> >> > Signed-off-by: Steven Lee 
> >> > ---
> >> >  .../devicetree/bindings/mmc/aspeed,sdhci.yaml | 25 +++
> >> >  1 file changed, 25 insertions(+)
> >> > 
> >> > diff --git a/Documentation/devicetree/bindings/mmc/aspeed,sdhci.yaml b/Documentation/devicetree/bindings/mmc/aspeed,sdhci.yaml
> >> > index 987b287f3bff..515a74614f3c 100644
> >> > --- a/Documentation/devicetree/bindings/mmc/aspeed,sdhci.yaml
> >> > +++ b/Documentation/devicetree/bindings/mmc/aspeed,sdhci.yaml
> >> > @@ -37,6 +37,14 @@ properties:
> >> >clocks:
> >> >  maxItems: 1
> >> >  description: The SD/SDIO controller clock gate
> >> > +  power-gpio:
> >> 
> >> '-gpios' is the preferred form even if just 1.
> >> 
> >
> >Thanks for reviewing, I will change the name.
> 
> is this a clock gate or a power on gpio?
> 
> 

A power on gpio.

> >
> >> > +description:
> >> > +  The GPIO for enabling/disabling SD bus power.
> >> > +maxItems: 1
> >> 
> >> blank line
> >> 
> >
> >I will remove the blank line.
> >
> >> > +  power-switch-gpio:
> >> > +description:
> >> > +  The GPIO for toggling the signal voltage between 3.3v and 1.8v.
> 
> Which way does it toggle for which voltage?
> 
> Oh, you said in the change log but not in the binding.
>

I will add description in the binding.

> But please, use gpio controlled regulators as Ulf suggested and is
> already used by other mmc controllers upstream.
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/regulator/gpio-regulator.yaml
> 

Thanks for reviewing and the information, I will use gpio-regulator
instead of power-gpio and power-switch-gpio.

> Ulf> Please do not model these as GPIO pins like this. Instead, it's better
> Ulf> to model them as gpio regulators, since the mmc core manages them as
> Ulf> regulators.
> Ulf> 
> Ulf> We have a vmmc regulator (corresponding to vdd) and a vqmmc regulator
> Ulf> (corresponding the signal-voltage level). These are also described in
> Ulf> the common mmc DT bindings, see
> Ulf> Documentation/devicetree/bindings/mmc/mmc-controller.yaml.
> 
> milton
> 
> >> > +maxItems: 1
> >> >  
> >> >  patternProperties:
> >> >"^sdhci@[0-9a-f]+$":
> >> > @@ -61,6 +69,14 @@ patternProperties:
> >> >sdhci,auto-cmd12:
> >> >  type: boolean
> >> >  description: Specifies that controller should use auto CMD12
> >> > +  power-gpio:
> >> > +description:
> >> > +  The GPIO for enabling/disabling SD bus power.
> >> > +maxItems: 1
> >> > +  power-switch-gpio:
> >> > +description:
> >> > +  The GPIO for toggling the signal voltage between 3.3v and 1.8v.
> >> > +maxItems: 1
> >> >  required:
> >> >- compatible
> >> >- reg
> >> > @@ -80,6 +96,7 @@ required:
> >> >  examples:
> >> >- |
> >> >  #include 
> >> > +#include 
> >> >  sdc@1e74 {
> >> >  compatible = "aspeed,ast2500-sd-controller";
> >> >  reg = <0x1e74 0x100>;
> >> > @@ -94,6 +111,10 @@ examples:
> >> >  interrupts = <26>;
> >> >  sdhci,auto-cmd12;
> >> >

[RFC PATCH 3/3] vfio/hisilicom: add debugfs for driver

2021-04-12 Thread Longfang Liu

Add debugfs debugging interface to live migration driver

Signed-off-by: Longfang Liu 
---
 drivers/vfio/pci/hisilicon/acc_vf_migration.c | 193 ++
 drivers/vfio/pci/hisilicon/acc_vf_migration.h |   2 +
 2 files changed, 195 insertions(+)

diff --git a/drivers/vfio/pci/hisilicon/acc_vf_migration.c b/drivers/vfio/pci/hisilicon/acc_vf_migration.c
index 5d8650d..d4eacaf 100644
--- a/drivers/vfio/pci/hisilicon/acc_vf_migration.c
+++ b/drivers/vfio/pci/hisilicon/acc_vf_migration.c
@@ -16,6 +16,9 @@
 
 #define VDM_OFFSET(x) offsetof(struct vfio_device_migration_info, x)
 void vfio_pci_hisilicon_acc_uninit(struct acc_vf_migration *acc_vf_dev);
+static void vf_debugfs_exit(struct acc_vf_migration *acc_vf_dev);
+static struct dentry *mig_debugfs_root;
+static int mig_root_ref;
 
 /* return 0 mailbox ready, -ETIMEDOUT hardware timeout */
 static int qm_wait_mb_ready(struct hisi_qm *qm)
@@ -933,6 +936,193 @@ static const struct vfio_pci_regops vfio_pci_acc_regops = {
.add_capability = acc_vf_migration_add_capability,
 };
 
+static ssize_t acc_vf_debug_read(struct file *filp, char __user *buffer,
+  size_t count, loff_t *pos)
+{
+   char buf[VFIO_DEV_DBG_LEN];
+   int len;
+
+   len = scnprintf(buf, VFIO_DEV_DBG_LEN, "%s\n",
+   "echo 0: test vf data store\n"
+   "echo 1: test vf data writeback\n"
+   "echo 2: test vf send mailbox\n"
+   "echo 3: dump vf dev data\n"
+   "echo 4: dump migration state\n");
+
+   return simple_read_from_buffer(buffer, count, pos, buf, len);
+}
+
+static ssize_t acc_vf_debug_write(struct file *filp, const char __user *buffer,
+   size_t count, loff_t *pos)
+{
+   struct acc_vf_migration *acc_vf_dev = filp->private_data;
+   struct device *dev = &acc_vf_dev->vf_dev->dev;
+   struct hisi_qm *qm = acc_vf_dev->vf_qm;
+   char tbuf[VFIO_DEV_DBG_LEN];
+   unsigned long val;
+   u64 data;
+   int len, ret;
+
+   if (*pos)
+   return 0;
+
+   if (count >= VFIO_DEV_DBG_LEN)
+   return -ENOSPC;
+
+   len = simple_write_to_buffer(tbuf, VFIO_DEV_DBG_LEN - 1,
+   pos, buffer, count);
+   if (len < 0)
+   return len;
+   tbuf[len] = '\0';
+   if (kstrtoul(tbuf, 0, &val))
+   return -EFAULT;
+
+   switch (val) {
+   case STATE_SAVE:
+   ret = vf_qm_state_save(qm, acc_vf_dev);
+   if (ret)
+   return -EINVAL;
+   break;
+   case STATE_RESUME:
+   ret = vf_qm_state_resume(qm, acc_vf_dev);
+   if (ret)
+   return -EINVAL;
+   break;
+   case MB_TEST:
+   data = readl(qm->io_base + QM_MB_CMD_SEND_BASE);
+   dev_info(dev, "debug mailbox addr: 0x%lx, mailbox val: 0x%llx\n",
+(uintptr_t)qm->io_base, data);
+   break;
+   case MIG_DATA_DUMP:
+   dev_info(dev, "dumped vf migration data:\n");
+   print_hex_dump(KERN_INFO, "Mig Data:", DUMP_PREFIX_OFFSET,
+   VFIO_DBG_LOG_LEN, 1,
+   (unsigned char *)acc_vf_dev->vf_data,
+   sizeof(struct acc_vf_data), false);
+   break;
+   case MIG_DEV_SHOW:
+   if (!acc_vf_dev->mig_ctl)
+   dev_info(dev, "migration region have release!\n");
+   else
+   dev_info(dev,
+"device  state: %u\n"
+"pending bytes: %llu\n"
+"data   offset: %llu\n"
+"data size: %llu\n"
+"data addr: 0x%lx\n",
+acc_vf_dev->mig_ctl->device_state,
+acc_vf_dev->mig_ctl->pending_bytes,
+acc_vf_dev->mig_ctl->data_offset,
+acc_vf_dev->mig_ctl->data_size,
+(uintptr_t)acc_vf_dev->vf_data);
+   break;
+   default:
+   return -EINVAL;
+   }
+
+   return count;
+}
+
+static const struct file_operations acc_vf_debug_fops = {
+   .owner = THIS_MODULE,
+   .open = simple_open,
+   .read = acc_vf_debug_read,
+   .write = acc_vf_debug_write,
+};
+
+static ssize_t acc_vf_state_read(struct file *filp, char __user *buffer,
+  size_t count, loff_t *pos)
+{
+   struct acc_vf_migration *acc_vf_dev = filp->private_data;
+   char buf[VFIO_DEV_DBG_LEN];
+   u32 state;
+   int len;
+
+   if (!acc_vf_dev->mig_ctl) {
+   len = scnprintf(buf, VFIO_DEV_DBG_LEN, "%s\n", "Invalid\n");
+   } else {
+

[RFC PATCH 1/3] vfio/hisilicon: add acc live migration driver

2021-04-12 Thread Longfang Liu

This driver adds the code the HiSilicon accelerator device needs to
support live migration. It mainly covers the following:
(1) Matching the accelerator device with the vfio-pci driver framework.
(2) Handling the live migration state and the migration data.
(3) Operating the accelerator hardware and processing its data.

Signed-off-by: Longfang Liu 
---
 drivers/vfio/pci/Kconfig  |8 +
 drivers/vfio/pci/Makefile |1 +
 drivers/vfio/pci/hisilicon/acc_vf_migration.c | 1144 +
 drivers/vfio/pci/hisilicon/acc_vf_migration.h |  168 
 4 files changed, 1321 insertions(+)
 create mode 100644 drivers/vfio/pci/hisilicon/acc_vf_migration.c
 create mode 100644 drivers/vfio/pci/hisilicon/acc_vf_migration.h

diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index 4abddbe..c1b181f 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -45,3 +45,11 @@ config VFIO_PCI_NVLINK2
depends on VFIO_PCI && PPC_POWERNV && SPAPR_TCE_IOMMU
help
  VFIO PCI support for P9 Witherspoon machine with NVIDIA V100 GPUs
+
+config VFIO_PCI_HISI_MIGRATION
+   bool "Support for Hisilicon Live Migaration"
+   depends on ARM64 && ACPI
+   select PCI_IOV
+   default y
+   help
+ Support for HiSilicon vfio pci to support VF live migration.
diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index eff97a7..d17cec4 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -4,5 +4,6 @@ vfio-pci-y := vfio_pci.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o
 vfio-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o
 vfio-pci-$(CONFIG_VFIO_PCI_NVLINK2) += vfio_pci_nvlink2.o
 vfio-pci-$(CONFIG_S390) += vfio_pci_zdev.o
+vfio-pci-$(CONFIG_VFIO_PCI_HISI_MIGRATION) += hisilicon/acc_vf_migration.o
 
 obj-$(CONFIG_VFIO_PCI) += vfio-pci.o
diff --git a/drivers/vfio/pci/hisilicon/acc_vf_migration.c b/drivers/vfio/pci/hisilicon/acc_vf_migration.c
new file mode 100644
index 000..5d8650d
--- /dev/null
+++ b/drivers/vfio/pci/hisilicon/acc_vf_migration.c
@@ -0,0 +1,1144 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 HiSilicon Limited. */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "acc_vf_migration.h"
+
+#define VDM_OFFSET(x) offsetof(struct vfio_device_migration_info, x)
+void vfio_pci_hisilicon_acc_uninit(struct acc_vf_migration *acc_vf_dev);
+
+/* return 0 mailbox ready, -ETIMEDOUT hardware timeout */
+static int qm_wait_mb_ready(struct hisi_qm *qm)
+{
+   u32 val;
+
+   return readl_relaxed_poll_timeout(qm->io_base + QM_MB_CMD_SEND_BASE,
+   val, !((val >> QM_MB_BUSY_SHIFT) &
+   0x1), MB_POLL_PERIOD_US,
+   MB_POLL_TIMEOUT_US);
+}
+
+/* return 0 VM acc device ready, -ETIMEDOUT hardware timeout */
+static int qm_wait_dev_ready(struct hisi_qm *qm)
+{
+   u32 val;
+
+   return readl_relaxed_poll_timeout(qm->io_base + QM_VF_EQ_INT_MASK,
+   val, !(val & 0x1), MB_POLL_PERIOD_US,
+   MB_POLL_TIMEOUT_US);
+}
+
+
+/* 128 bit should be written to hardware at one time to trigger a mailbox */
+static void qm_mb_write(struct hisi_qm *qm, const void *src)
+{
+   void __iomem *fun_base = qm->io_base + QM_MB_CMD_SEND_BASE;
+   unsigned long tmp0 = 0;
+   unsigned long tmp1 = 0;
+
+   if (!IS_ENABLED(CONFIG_ARM64)) {
+   memcpy_toio(fun_base, src, 16);
+   wmb();
+   return;
+   }
+
+   asm volatile("ldp %0, %1, %3\n"
+"stp %0, %1, %2\n"
+"dsb sy\n"
+: "=&r" (tmp0),
+  "=&r" (tmp1),
+  "+Q" (*((char __iomem *)fun_base))
+: "Q" (*((char *)src))
+: "memory");
+}
+
+static void qm_mb_pre_init(struct qm_mailbox *mailbox, u8 cmd,
+  u16 queue, bool op)
+{
+   mailbox->w0 = cpu_to_le16(cmd |
+(op ? 0x1 << QM_MB_OP_SHIFT : 0) |
+(0x1 << QM_MB_BUSY_SHIFT));
+   mailbox->queue_num = cpu_to_le16(queue);
+   mailbox->rsvd = 0;
+}
+
+static int qm_mb_nolock(struct hisi_qm *qm, struct qm_mailbox *mailbox)
+{
+   int cnt = 0;
+
+   if (unlikely(qm_wait_mb_ready(qm))) {
+   dev_err(&qm->pdev->dev, "QM mailbox is busy to start!\n");
+   return -EBUSY;
+   }
+
+   qm_mb_write(qm, mailbox);
+   while (true) {
+   if (!qm_wait_mb_ready(qm))
+   break;
+   if (++cnt > QM_MB_MAX_WAIT_CNT) {
+   dev_err(&qm->pdev->dev, "QM mailbox operation timeout!\n");
+   return -EBUSY;
+   }
+

[RFC PATCH 2/3] vfio/hisilicon: register the driver to vfio

2021-04-12 Thread Longfang Liu

Register the live migration driver of the accelerator module to vfio

Signed-off-by: Longfang Liu 
---
 drivers/vfio/pci/vfio_pci.c | 11 +++
 drivers/vfio/pci/vfio_pci_private.h |  9 +
 2 files changed, 20 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 65e7e6b..e1b0e37 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -407,6 +407,17 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
}
}
 
+   if (pdev->vendor == PCI_VENDOR_ID_HUAWEI &&
+   IS_ENABLED(CONFIG_VFIO_PCI_HISI_MIGRATION)) {
+   ret = vfio_pci_hisilicon_acc_init(vdev);
+   if (ret && ret != -ENODEV) {
+   dev_warn(&vdev->pdev->dev,
+"Failed to setup Hisilicon ACC region\n");
+   vfio_pci_disable(vdev);
+   return ret;
+   }
+   }
+
vfio_pci_probe_mmaps(vdev);
 
return 0;
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index 9cd1882..83c51be 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -214,6 +214,15 @@ static inline int vfio_pci_ibm_npu2_init(struct vfio_pci_device *vdev)
 }
 #endif
 
+#ifdef CONFIG_VFIO_PCI_HISI_MIGRATION
+extern int vfio_pci_hisilicon_acc_init(struct vfio_pci_device *vdev);
+#else
+static inline int vfio_pci_hisilicon_acc_init(struct vfio_pci_device *vdev)
+{
+   return -ENODEV;
+}
+#endif
+
 #ifdef CONFIG_S390
 extern int vfio_pci_info_zdev_add_caps(struct vfio_pci_device *vdev,
   struct vfio_info_cap *caps);
-- 
2.8.1

[RFC PATCH 0/3] vfio/hisilicon: add acc live migration driver

2021-04-12 Thread Longfang Liu

The live migration solution relies on the vfio_device_migration_info protocol.
The structure vfio_device_migration_info is placed at the 0th offset of
the VFIO_REGION_SUBTYPE_MIGRATION region to get and set VFIO device related
migration information. Field accesses from this structure are only supported
at their native width and alignment. Otherwise, the result is undefined and
vendor drivers should return an error.

(1) The driver is built on vfio_pci_register_dev_region() from vfio-pci. It
registers a new live migration region, and live migration is carried out
through the ops of this region.

(2) To ensure the devices are compatible before and after migration, a
device compatibility check is performed in the Pre-copy stage. If the check
fails, an error is returned and the source VM aborts the migration.

(3) Once the compatibility check passes, migration enters the Stop-and-copy
stage. All live migration data is copied and saved to the destination VF
device, then the destination VF device is started and the source VM exits.
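
The native-width rule described above can be modelled in user space. What
follows is only a sketch under stated assumptions, not code from the series:
the struct is a local mirror whose layout is assumed to match
struct vfio_device_migration_info in the uapi header of this kernel era, the
names mig_info, mig_fields and mig_check_access() are hypothetical, and -1
stands in for whatever error a vendor driver would return.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Local mirror of the migration info header (assumed layout, see lead-in).
 * It is NOT the kernel definition; it only models field offsets and widths.
 */
struct mig_info {
	uint32_t device_state;
	uint32_t reserved;
	uint64_t pending_bytes;
	uint64_t data_offset;
	uint64_t data_size;
};

/* The model relies on there being no padding between the fields. */
static_assert(sizeof(struct mig_info) == 32, "unexpected padding in mig_info");

struct mig_field {
	size_t offset;
	size_t width;
};

#define MIG_FIELD(name) \
	{ offsetof(struct mig_info, name), sizeof(((struct mig_info *)0)->name) }

static const struct mig_field mig_fields[] = {
	MIG_FIELD(device_state),
	MIG_FIELD(reserved),
	MIG_FIELD(pending_bytes),
	MIG_FIELD(data_offset),
	MIG_FIELD(data_size),
};

/*
 * Return 0 when [pos, pos + count) covers exactly one header field at its
 * native width and alignment, or -1 for a partial or straddling access.
 */
static int mig_check_access(size_t pos, size_t count)
{
	size_t i;

	for (i = 0; i < sizeof(mig_fields) / sizeof(mig_fields[0]); i++) {
		if (pos == mig_fields[i].offset && count == mig_fields[i].width)
			return 0;
	}
	return -1;
}
```

With this model, a 4-byte access at offset 0 (device_state) is accepted, while
an 8-byte access at offset 0 or a 2-byte access at offset 10 is rejected.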

Longfang Liu (3):
  vfio/hisilicon: add acc live migration driver
  vfio/hisilicon: register the driver to vfio
  vfio/hisilicom: add debugfs for driver

 drivers/vfio/pci/Kconfig  |8 +
 drivers/vfio/pci/Makefile |1 +
 drivers/vfio/pci/hisilicon/acc_vf_migration.c | 1337 +
 drivers/vfio/pci/hisilicon/acc_vf_migration.h |  170 
 drivers/vfio/pci/vfio_pci.c   |   11 +
 drivers/vfio/pci/vfio_pci_private.h   |9 +
 6 files changed, 1536 insertions(+)
 create mode 100644 drivers/vfio/pci/hisilicon/acc_vf_migration.c
 create mode 100644 drivers/vfio/pci/hisilicon/acc_vf_migration.h

-- 
2.8.1

Re: [PATCH 4/6] usb: xhci-mtk: add support runtime PM

2021-04-12 Thread Chunfeng Yun

On Mon, 2021-04-12 at 13:14 +0800, Ikjoon Jang wrote:
> On Fri, Apr 9, 2021 at 4:54 PM Chunfeng Yun  wrote:
> >
> > On Fri, 2021-04-09 at 13:45 +0800, Ikjoon Jang wrote:
> > > On Thu, Apr 8, 2021 at 5:35 PM Chunfeng Yun  
> > > wrote:
> > > >
> > > > A dedicated wakeup irq will be used to handle runtime suspend/resume,
> > > > we use dev_pm_set_dedicated_wake_irq API to take care of requesting
> > > > and attaching wakeup irq, then the suspend/resume framework will help
> > > > to enable/disable wakeup irq.
> > > >
> > > > The runtime PM is default off since some platforms may not support it.
> > > > users can enable it via power/control (set "auto") in sysfs.
> > > >
> > > > Signed-off-by: Chunfeng Yun 
> > > > ---
> > > >  drivers/usb/host/xhci-mtk.c | 140 +++-
> > > >  1 file changed, 124 insertions(+), 16 deletions(-)
> > > >
> > > > diff --git a/drivers/usb/host/xhci-mtk.c b/drivers/usb/host/xhci-mtk.c
> > > > index a74764ab914a..30927f4064d4 100644
> > > > --- a/drivers/usb/host/xhci-mtk.c
> > > > +++ b/drivers/usb/host/xhci-mtk.c
[...]
> > > >
> > > > +static int check_rhub_status(struct xhci_hcd *xhci, struct xhci_hub *rhub)
> > > > +{
> > > > +   u32 suspended_ports;
> > > > +   u32 status;
> > > > +   int num_ports;
> > > > +   int i;
> > > > +
> > > > +   num_ports = rhub->num_ports;
> > > > +   suspended_ports = rhub->bus_state.suspended_ports;
> > > > +   for (i = 0; i < num_ports; i++) {
> > > > +   if (!(suspended_ports & BIT(i))) {
> > > > +   status = readl(rhub->ports[i]->addr);
> > > > +   if (status & PORT_CONNECT)
> > >
> > > So this pm_runtime support is activated only when there's no devices
> > > connected at all?
> > No, if the connected devices also support runtime suspend, it will enter
> > suspend mode when no data transfer, then the controller can enter
> > suspend too
> > > I think this will always return -EBUSY with my board having an on-board 
> > > hub
> > > connected to both rhubs.
> > the on-board hub supports runtime suspend by default, so if no devices
> > connected, it will enter suspend
> 
> Sorry, you're correct. I was confused that the condition was
> (suspended && connect)
> My on-board hub connected to rhub is always in a suspended state
> whenever it's called.
> 
> However, I don't think this could return -EBUSY
> rpm_suspend() only be called when all the descendants are in sleep already.
You mean we can drop the bus check? 
If PM already takes care of the children count, I think there is no need to
check it anymore.
> Did you see any cases of this function returning -EBUSY or any concerns on 
> here?
No, I didn't see it before.

Thanks

> 
> 
> >
> > >
> > > > +   return -EBUSY;
> > > > +   }
> > > > +   }
> > > > +
> > > > +   return 0;
> > > > +}
> > > > +
> > > > +/*
> > > > + * check the bus whether it could suspend or not
> > > > + * the bus will suspend if the downstream ports are already suspended,
> > > > + * or no devices connected.
> > > > + */
> > > > +static int check_bus_status(struct xhci_hcd *xhci)
> > > > +{
> > > > +   int ret;
> > > > +
> > > > +   ret = check_rhub_status(xhci, &xhci->usb3_rhub);
> > > > +   if (ret)
> > > > +   return ret;
> > > > +
> > > > +   return check_rhub_status(xhci, &xhci->usb2_rhub);
> > > > +}
> > > > +
> > > > +static int __maybe_unused xhci_mtk_runtime_suspend(struct device *dev)
> > > > +{
> > > > +   struct xhci_hcd_mtk  *mtk = dev_get_drvdata(dev);
> > > > +   struct xhci_hcd *xhci = hcd_to_xhci(mtk->hcd);
> > > > +   int ret = 0;
> > > > +
> > > > +   if (xhci->xhc_state)
> > > > +   return -ESHUTDOWN;
> > > > +
> > > > +   if (device_may_wakeup(dev)) {
> > > > +   ret = check_bus_status(xhci);
> > > > +   if (!ret)
> > > > +   ret = xhci_mtk_suspend(dev);
> > > > +   }
> > > > +
> > > > +   /* -EBUSY: let PM automatically reschedule another autosuspend */
> > > > +   return ret ? -EBUSY : 0;
> > > > +}
> > > > +
> > > > +static int __maybe_unused xhci_mtk_runtime_resume(struct device *dev)
> > > > +{
> > > > +   struct xhci_hcd_mtk  *mtk = dev_get_drvdata(dev);
> > > > +   struct xhci_hcd *xhci = hcd_to_xhci(mtk->hcd);
> > > > +   int ret = 0;
> > > > +
> > > > +   if (xhci->xhc_state)
> > > > +   return -ESHUTDOWN;
> > > > +
> > > > +   if (device_may_wakeup(dev))
> > > > +   ret = xhci_mtk_resume(dev);
> > > > +
> > > > +   return ret;
> > > > +}
> > > > +
> > > >  static const struct dev_pm_ops xhci_mtk_pm_ops = {
> > > > SET_SYSTEM_SLEEP_PM_OPS(xhci_mtk_suspend, xhci_mtk_resume)
> > > > +   SET_RUNTIME_PM_OPS(xhci_mtk_runtime_suspend,
> > > > +  xhci_mtk_runtime_resume, NULL)
> > > >  };
> > > > -#define DEV_PM_OPS IS_ENABLED(CONFIG_PM) ? &xhci_mtk_pm_ops : NULL
> >

Re: [PATCH 5.10 000/188] 5.10.30-rc1 review

2021-04-12 Thread Guenter Roeck

On Mon, Apr 12, 2021 at 10:38:34AM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.10.30 release.
> There are 188 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Wed, 14 Apr 2021 08:39:44 +.
> Anything received after that time might be too late.
> 

Build results:
total: 156 pass: 156 fail: 0
Qemu test results:
total: 454 pass: 454 fail: 0

Tested-by: Guenter Roeck 

Guenter

Re: [PATCH net v3] net: sched: fix packet stuck problem for lockless qdisc

2021-04-12 Thread Yunsheng Lin

On 2021/4/13 11:26, Hillf Danton wrote:
> On Tue, 13 Apr 2021 10:56:42 Yunsheng Lin wrote:
>> On 2021/4/13 10:21, Hillf Danton wrote:
>>> On Mon, 12 Apr 2021 20:00:43  Yunsheng Lin wrote:

>>>> Yes, the below patch seems to fix the data race described in
>>>> the commit log.
>>>> Then what is the difference between my patch and your patch below:)
>>>
>>> Hehe, this is one of the tough questions over a bunch of weeks.
>>>
>>> If a seqcount can detect the race between skb enqueue and dequeue then we
>>> can't see any excuse for not rolling back to the point without NOLOCK.
>>
>> I am not sure I understood what you meant above.
>>
>> As my understanding, the below patch is essentially the same as
>> your previous patch, the only difference I see is it uses qdisc->pad
>> instead of __QDISC_STATE_NEED_RESCHEDULE.
>>
>> So instead of proposing another patch, it would be better if you
>> comment on my patch, and make improvement upon that.
>>
> Happy to do that after you show how it helps revert NOLOCK.

Actually I am not going to revert NOLOCK, but add optimization
to it if the patch fixes the packet stuck problem.

Is there any reason why you want to revert it?

> 
> .
>

Re: [PATCH v1 1/2] dt-bindings: mmc: sdhci-of-aspeed: Add power-gpio and power-switch-gpio

2021-04-12 Thread Steven Lee

The 04/12/2021 15:38, Ulf Hansson wrote:
> On Thu, 8 Apr 2021 at 03:52, Steven Lee  wrote:
> >
> > AST2600-A2 EVB provides the reference design for enabling SD bus power
> > and toggling SD bus signal voltage by GPIO pins.
> > Add the definition and example for power-gpio and power-switch-gpio
> > properties.
> >
> > In the reference design, GPIOV0 of AST2600-A2 EVB is connected to power
> > load switch that providing 3.3v to SD1 bus vdd. GPIOV1 is connected to
> > a 1.8v and a 3.3v power load switch that providing signal voltage to
> > SD1 bus.
> > If GPIOV0 is active high, SD1 bus is enabled. Otherwise, SD1 bus is
> > disabled.
> > If GPIOV1 is active high, 3.3v power load switch is enabled, SD1 signal
> > voltage is 3.3v. Otherwise, 1.8v power load switch will be enabled, SD1
> > signal voltage becomes 1.8v.
> >
> > AST2600-A2 EVB also support toggling signal voltage for SD2 bus.
> > The design is the same as SD1 bus. It uses GPIOV2 as power-gpio and GPIOV3
> > as power-switch-gpio.
> 
> Thanks for sharing the details, it certainly helps while reviewing.
> 
> >
> > Signed-off-by: Steven Lee 
> > ---
> >  .../devicetree/bindings/mmc/aspeed,sdhci.yaml | 25 +++
> >  1 file changed, 25 insertions(+)
> >
> > diff --git a/Documentation/devicetree/bindings/mmc/aspeed,sdhci.yaml b/Documentation/devicetree/bindings/mmc/aspeed,sdhci.yaml
> > index 987b287f3bff..515a74614f3c 100644
> > --- a/Documentation/devicetree/bindings/mmc/aspeed,sdhci.yaml
> > +++ b/Documentation/devicetree/bindings/mmc/aspeed,sdhci.yaml
> > @@ -37,6 +37,14 @@ properties:
> >clocks:
> >  maxItems: 1
> >  description: The SD/SDIO controller clock gate
> > +  power-gpio:
> > +description:
> > +  The GPIO for enabling/disabling SD bus power.
> > +maxItems: 1
> > +  power-switch-gpio:
> > +description:
> > +  The GPIO for toggling the signal voltage between 3.3v and 1.8v.
> > +maxItems: 1
> 
> 
> >
> >  patternProperties:
> >"^sdhci@[0-9a-f]+$":
> > @@ -61,6 +69,14 @@ patternProperties:
> >sdhci,auto-cmd12:
> >  type: boolean
> >  description: Specifies that controller should use auto CMD12
> > +  power-gpio:
> > +description:
> > +  The GPIO for enabling/disabling SD bus power.
> > +maxItems: 1
> > +  power-switch-gpio:
> > +description:
> > +  The GPIO for toggling the signal voltage between 3.3v and 1.8v.
> > +maxItems: 1
> >  required:
> 
> Please do not model these as GPIO pins like this. Instead, it's better
> to model them as gpio regulators, since the mmc core manages them as
> regulators.
> 
> We have a vmmc regulator (corresponding to vdd) and a vqmmc regulator
> (corresponding the signal-voltage level). These are also described in
> the common mmc DT bindings, see
> Documentation/devicetree/bindings/mmc/mmc-controller.yaml.
> 

Thanks for the information. I will modify it.

> >- compatible
> >- reg
> > @@ -80,6 +96,7 @@ required:
> >  examples:
> >- |
> >  #include 
> > +#include 
> >  sdc@1e74 {
> >  compatible = "aspeed,ast2500-sd-controller";
> >  reg = <0x1e74 0x100>;
> > @@ -94,6 +111,10 @@ examples:
> >  interrupts = <26>;
> >  sdhci,auto-cmd12;
> >  clocks = < ASPEED_CLK_SDIO>;
> > +power-gpio = < ASPEED_GPIO(V, 0)
> > + GPIO_ACTIVE_HIGH>;
> > +power-switch-gpio = < ASPEED_GPIO(V, 1)
> > + GPIO_ACTIVE_HIGH>;
> >  };
> >
> >  sdhci1: sdhci@200 {
> > @@ -102,5 +123,9 @@ examples:
> >  interrupts = <26>;
> >  sdhci,auto-cmd12;
> >  clocks = < ASPEED_CLK_SDIO>;
> > +power-gpio = < ASPEED_GPIO(V, 2)
> > + GPIO_ACTIVE_HIGH>;
> > +power-switch-gpio = < ASPEED_GPIO(V, 3)
> > + GPIO_ACTIVE_HIGH>;
> >  };
> >  };
> 
> Kind regards
> Uffe

Re: [PATCH 1/2] drm/msm/dp: check sink_count before update is_connected status

2021-04-12 Thread Stephen Boyd

Quoting Kuogee Hsieh (2021-04-12 10:02:51)
> At pm_resume check link sisnk_count before update is_connected status
> base on HPD real time link status. Also print out error message only
> when either EV_CONNECT_PENDING_TIMEOUT or EV_DISCONNECT_PENDING_TIMEOUT
> happen.
> 
> Signed-off-by: Kuogee Hsieh 
> ---

Also please include

Reported-by: Stephen Boyd 

in the next post.

Re: [PATCH 1/2] drm/msm/dp: check sink_count before update is_connected status

2021-04-12 Thread Stephen Boyd

Quoting Kuogee Hsieh (2021-04-12 10:02:51)
> At pm_resume check link sisnk_count before update is_connected status

s/sisnk_count/sink_count/

> base on HPD real time link status. Also print out error message only
> when either EV_CONNECT_PENDING_TIMEOUT or EV_DISCONNECT_PENDING_TIMEOUT
> happen.

Can you add "why"? I think the why is something like "link status is
different from display connected status in the case of something like an
Apple dongle where the type-c plug can be connected, and therefore the
link is connected, but no sink is connected until an HDMI cable is
plugged into the dongle". This still doesn't explain why it's important
to check at resume time though.

> 
> Signed-off-by: Kuogee Hsieh 
> ---

Any Fixes tag?

>  drivers/gpu/drm/msm/dp/dp_display.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/dp/dp_display.c b/drivers/gpu/drm/msm/dp/dp_display.c
> index 5a39da6..4992a049 100644
> --- a/drivers/gpu/drm/msm/dp/dp_display.c
> +++ b/drivers/gpu/drm/msm/dp/dp_display.c
> @@ -587,7 +587,7 @@ static int dp_connect_pending_timeout(struct dp_display_private *dp, u32 data)
>  
> state = dp->hpd_state;
> if (state == ST_CONNECT_PENDING) {
> -   dp_display_enable(dp, 0);
> +   DRM_ERROR("EV_CONNECT_PENDING_TIMEOUT error\n");

Can we get rid of these messages?

> dp->hpd_state = ST_CONNECTED;
> }
>  
> @@ -670,7 +670,7 @@ static int dp_disconnect_pending_timeout(struct dp_display_private *dp, u32 data
>  
> state =  dp->hpd_state;
> if (state == ST_DISCONNECT_PENDING) {
> -   dp_display_disable(dp, 0);
> +   DRM_ERROR("EV_DISCONNECT_PENDING_TIMEOUT error\n");

And this one? If it happens it will just sit in the logs when probably
the user can't do anything about it. Timeouts are just a fact of life.

> dp->hpd_state = ST_DISCONNECTED;
> }
>  
> @@ -1272,7 +1272,7 @@ static int dp_pm_resume(struct device *dev)
>  
> status = dp_catalog_link_is_connected(dp->catalog);
>  
> -   if (status)
> +   if (status && dp->link->sink_count)

Can we add a comment above this if? Otherwise it doesn't make much
sense why sink_count is important.

/*
 * Only consider the display as connected, and send a connected
 * notification to userspace in
 * dp_display_send_hpd_notification(), if there's actually a
 * sink connected. Otherwise, the link could be up/connected or 
 * in the process of being established, but there isn't actually
 * anything to display to on the other side yet.
 */

> dp->dp_display.is_connected = true;
> else
> dp->dp_display.is_connected = false;
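
The condition under review can be reduced to a small truth table. The helper
below is a hypothetical sketch and not a function from the msm/dp driver; it
only models the rule that the display counts as connected when the link is up
and at least one sink is present, which covers the dongle-without-monitor case
described above.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper modelling "status && dp->link->sink_count":
 *   link up,   sink_count > 0  -> connected
 *   link up,   sink_count == 0 -> not connected (dongle with no monitor)
 *   link down, any sink_count  -> not connected
 */
static bool model_dp_is_connected(bool link_connected, unsigned int sink_count)
{
	return link_connected && sink_count > 0;
}
```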

Re: [PATCH v3] f2fs: fix to keep isolation of atomic write

2021-04-12 Thread Jaegeuk Kim

On 04/12, Chao Yu wrote:
> As Yi Chen reported, there is a potential race case described as below:
> 
> Thread A  Thread B
> - f2fs_ioc_start_atomic_write
>   - mkwrite
>- set_page_dirty
> - f2fs_set_page_private(page, 0)
>  - set_inode_flag(FI_ATOMIC_FILE)
>   - mkwrite same page
>- set_page_dirty
> - f2fs_register_inmem_page
>  - f2fs_set_page_private(ATOMIC_WRITTEN_PAGE)
>failed due to PagePrivate flag has been set
>  - list_add_tail
>   - truncate_inode_pages
>- f2fs_invalidate_page
> - clear page private but w/o remove it from
>   inmem_list
>- set page->mapping to NULL
> - f2fs_ioc_commit_atomic_write
>  - __f2fs_commit_inmem_pages
>- __revoke_inmem_pages
> - f2fs_put_page panic as page->mapping is NULL
> 
> The root cause is we missed to keep isolation of atomic write in the case
> of start_atomic_write vs mkwrite, let start_atomic_write helds i_mmap_sem
> lock to avoid this issue.

My only concern is performance regression. Could you please verify the numbers?

> 
> Reported-by: Yi Chen 
> Signed-off-by: Chao Yu 
> ---
> v3:
> - rebase to last dev branch
> - update commit message because this patch fixes a different racing issue
> of atomic write
>  fs/f2fs/file.c| 3 +++
>  fs/f2fs/segment.c | 6 ++
>  2 files changed, 9 insertions(+)
> 
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index d697c8900fa7..6284b2f4a60b 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -2054,6 +2054,7 @@ static int f2fs_ioc_start_atomic_write(struct file *filp)
>   goto out;
>  
>   down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
> + down_write(&F2FS_I(inode)->i_mmap_sem);
>  
>   /*
>* Should wait end_io to count F2FS_WB_CP_DATA correctly by
> @@ -2064,6 +2065,7 @@ static int f2fs_ioc_start_atomic_write(struct file *filp)
> inode->i_ino, get_dirty_pages(inode));
>   ret = filemap_write_and_wait_range(inode->i_mapping, 0, LLONG_MAX);
>   if (ret) {
> + up_write(&F2FS_I(inode)->i_mmap_sem);
>   up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
>   goto out;
>   }
> @@ -2077,6 +2079,7 @@ static int f2fs_ioc_start_atomic_write(struct file *filp)
>   /* add inode in inmem_list first and set atomic_file */
>   set_inode_flag(inode, FI_ATOMIC_FILE);
>   clear_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST);
> + up_write(&F2FS_I(inode)->i_mmap_sem);
>   up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
>  
>   f2fs_update_time(F2FS_I_SB(inode), REQ_TIME);
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 0cb1ca88d4aa..78c8342f52fd 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -325,6 +325,7 @@ void f2fs_drop_inmem_pages(struct inode *inode)
>   struct f2fs_inode_info *fi = F2FS_I(inode);
>  
>   do {
> + down_write(&F2FS_I(inode)->i_mmap_sem);
>   mutex_lock(&fi->inmem_lock);
>   if (list_empty(&fi->inmem_pages)) {
>   fi->i_gc_failures[GC_FAILURE_ATOMIC] = 0;
> @@ -339,11 +340,13 @@ void f2fs_drop_inmem_pages(struct inode *inode)
>   spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
>  
>   mutex_unlock(&fi->inmem_lock);
> + up_write(&F2FS_I(inode)->i_mmap_sem);
>   break;
>   }
>   __revoke_inmem_pages(inode, &fi->inmem_pages,
>   true, false, true);
>   mutex_unlock(&fi->inmem_lock);
> + up_write(&F2FS_I(inode)->i_mmap_sem);
>   } while (1);
>  }
>  
> @@ -468,6 +471,7 @@ int f2fs_commit_inmem_pages(struct inode *inode)
>   f2fs_balance_fs(sbi, true);
>  
>   down_write(&fi->i_gc_rwsem[WRITE]);
> + down_write(&F2FS_I(inode)->i_mmap_sem);
>  
>   f2fs_lock_op(sbi);
>   set_inode_flag(inode, FI_ATOMIC_COMMIT);
> @@ -479,6 +483,8 @@ int f2fs_commit_inmem_pages(struct inode *inode)
>   clear_inode_flag(inode, FI_ATOMIC_COMMIT);
>  
>   f2fs_unlock_op(sbi);
> +
> + up_write(&F2FS_I(inode)->i_mmap_sem);
>   up_write(&fi->i_gc_rwsem[WRITE]);
>  
>   return err;
> -- 
> 2.29.2

[PATCH v5 3/3] staging: rtl8192e: remove casts and parentheses

2021-04-12 Thread Mitali Borkar

Removed unnecessary (void *) cast and parentheses to meet linux kernel
coding style.

Signed-off-by: Mitali Borkar 
---
 
Changes from v4:- Removed unnecessary casts and parentheses.
Changes from v3:- No changes.
Changes from v2:- Rectified spelling mistake in subject description.
Changes have been made in v3.
Changes from v1:- No changes.

 drivers/staging/rtl8192e/rtl819x_HTProc.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/staging/rtl8192e/rtl819x_HTProc.c b/drivers/staging/rtl8192e/rtl819x_HTProc.c
index 431202927036..ec6b46166e84 100644
--- a/drivers/staging/rtl8192e/rtl819x_HTProc.c
+++ b/drivers/staging/rtl8192e/rtl819x_HTProc.c
@@ -646,14 +646,10 @@ void HTInitializeHTInfo(struct rtllib_device *ieee)
pHTInfo->CurrentMPDUDensity = pHTInfo->MPDU_Density;
pHTInfo->CurrentAMPDUFactor = pHTInfo->AMPDU_Factor;
 
-   memset((void *)(&pHTInfo->SelfHTCap), 0,
-  sizeof(pHTInfo->SelfHTCap));
-   memset((void *)(&pHTInfo->SelfHTInfo), 0,
-  sizeof(pHTInfo->SelfHTInfo));
-   memset((void *)(&pHTInfo->PeerHTCapBuf), 0,
-  sizeof(pHTInfo->PeerHTCapBuf));
-   memset((void *)(&pHTInfo->PeerHTInfoBuf), 0,
-  sizeof(pHTInfo->PeerHTInfoBuf));
+   memset(&pHTInfo->SelfHTCap, 0, sizeof(pHTInfo->SelfHTCap));
+   memset(&pHTInfo->SelfHTInfo, 0, sizeof(pHTInfo->SelfHTInfo));
+   memset(&pHTInfo->PeerHTCapBuf, 0, sizeof(pHTInfo->PeerHTCapBuf));
+   memset(&pHTInfo->PeerHTInfoBuf, 0, sizeof(pHTInfo->PeerHTInfoBuf));
 
pHTInfo->bSwBwInProgress = false;
 
-- 
2.30.2
