[PATCH V8 2/4] phy: qcom-qusb2: New driver for QUSB2 PHY on Qcom chips

2017-04-05 Thread Vivek Gautam
Add a PHY transceiver driver for the QUSB2 phy controller, which
provides HighSpeed functionality for the DWC3 controller present on
Qualcomm chipsets.

Signed-off-by: Vivek Gautam 
Reviewed-by: Stephen Boyd 
---

Changes since v7:
 - Fixed 'checkpatch --strict' alignment warnings/checks.

Changes since v6:
 - Dropped 'vdd-phy' from list of regulators.
 - Rebased on phy/next and *not* including phy grouping series.

Changes since v5:
 - Rebased on top of phy grouping series. So the driver now sits in
   drivers/phy/qualcomm/
 - Removed instances of readl_relaxed() and writel_relaxed(), and using
   readl() and writel() instead.
 - Replaced regulator handling with regulator_bulk() apis, so that
   qusb2_phy_toggle_power() method is completely dropped.
 - Removed memory barriers from the driver. Instead, using readl() over
   the register to ensure that the write is through to the device.
 - Fixed nits about return statement from probe() and qusb2_phy_poweron().

Changes since v4:
 - Updated the copyright year to 2017.
 - Removed unnecessary of_match_ptr() cast for the match table,
   since the driver is compiled for CONFIG_OF.

Changes since v3:
 - Added 'Reviewed-by' from Stephen.
 - Fixed debug message for qusb2_phy_set_tune2_param().
 - Replaced devm_reset_control_get() with devm_reset_control_get_by_index()
   since we are requesting only one reset.
 - Updated devm_nvmem_cell_get() with a NULL cell id.
 - Made error labels more idiomatic.
 - Refactored qusb2_setbits() and qusb2_clrbits() a little bit to accept
   base address and register offset as two separate arguments.
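
For readers following the v3 and v5 notes above, here is a minimal
user-space sketch of the set/clear-bits pattern they describe: the
helpers take a base address and a register offset as two separate
arguments, and read the register back after writing it. This is not the
driver code. The sketch_* names are hypothetical, plain memory stands in
for MMIO, and the real helpers use the kernel's readl() and writel() on
an __iomem pointer.

```c
#include <stdint.h>

/* Stand-ins for the kernel's readl()/writel(); "MMIO" here is plain memory. */
static inline uint32_t sketch_readl(const volatile void *addr)
{
	return *(const volatile uint32_t *)addr;
}

static inline void sketch_writel(uint32_t val, volatile void *addr)
{
	*(volatile uint32_t *)addr = val;
}

/* Set the bits in 'val' in the register at base + offset (offset in bytes). */
static inline void sketch_setbits(volatile void *base, uint32_t offset,
				  uint32_t val)
{
	volatile uint8_t *reg_addr = (volatile uint8_t *)base + offset;
	uint32_t reg = sketch_readl(reg_addr);

	reg |= val;
	sketch_writel(reg, reg_addr);

	/* Read back; in the driver this is the step that replaces barriers. */
	(void)sketch_readl(reg_addr);
}

/* Clear the bits in 'val' in the register at base + offset (offset in bytes). */
static inline void sketch_clrbits(volatile void *base, uint32_t offset,
				  uint32_t val)
{
	volatile uint8_t *reg_addr = (volatile uint8_t *)base + offset;
	uint32_t reg = sketch_readl(reg_addr);

	reg &= ~val;
	sketch_writel(reg, reg_addr);

	/* Read back, as in sketch_setbits(). */
	(void)sketch_readl(reg_addr);
}
```

In the driver, the read-back is what the v5 note relies on to ensure
that the write is through to the device; in this memory-backed model it
has no visible effect and only marks where that step sits.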

Changes since v2:
 - Removed selecting 'RESET_CONTROLLER' config.
 - Added error handling for clk_prepare_enable paths.
 - Removed explicitly setting ref_clk rate to 19.2 MHz. Don't need to
   do that since 'xo' is modeled as parent to this clock.
 - Removed 'ref_clk_src' handling. Driver doesn't need to request and
   handle this clock.
 - Moved nvmem_cell_get() to probe function.
 - Simplified phy pll status handling.
 - Using of_device_get_match_data() to get match data.
 - Uniformly using lowercase for hex numbers.
 - Fixed sparse warnings.
 - Using shorter variable names in structure and in functions.
 - Handling various comment style shortcomings.

Changes since v1:
 - removed reference to clk_enabled/pwr_enabled.
 - moved clock and regulator enable code to phy_power_on/off() callbacks.
 - fixed return on EPROBE_DEFER in qusb2_phy_probe().
 - fixed phy create and phy register ordering.
 - removed references to non-lkml links from commit message.
 - took care of other minor nits.
 - Fixed coccinelle warnings -
   'PTR_ERR applied after initialization to constant'
 - Addressed review comment, regarding qfprom access for tune2 param value.
   This driver is now based on qfprom patch[1] that allows byte access now.

 drivers/phy/Kconfig  |  10 +
 drivers/phy/Makefile |   1 +
 drivers/phy/phy-qcom-qusb2.c | 493 +++
 3 files changed, 504 insertions(+)
 create mode 100644 drivers/phy/phy-qcom-qusb2.c

diff --git a/drivers/phy/Kconfig b/drivers/phy/Kconfig
index dc5277ad1b5a..ccc9178e32cd 100644
--- a/drivers/phy/Kconfig
+++ b/drivers/phy/Kconfig
@@ -439,6 +439,16 @@ config PHY_STIH407_USB
 	  Enable this support to enable the picoPHY device used by USB2
 	  and USB3 controllers on STMicroelectronics STiH407 SoC families.
 
+config PHY_QCOM_QUSB2
+	tristate "Qualcomm QUSB2 PHY Driver"
+	depends on OF && (ARCH_QCOM || COMPILE_TEST)
+	select GENERIC_PHY
+	help
+	  Enable this to support the HighSpeed QUSB2 PHY transceiver for USB
+	  controllers on Qualcomm chips. This driver supports the high-speed
+	  PHY which is usually paired with either the ChipIdea or Synopsys DWC3
+	  USB IPs on MSM SOCs.
+
 config PHY_QCOM_UFS
 	tristate "Qualcomm UFS PHY driver"
 	depends on OF && ARCH_QCOM
diff --git a/drivers/phy/Makefile b/drivers/phy/Makefile
index e7b0feb1e125..0375c6a32697 100644
--- a/drivers/phy/Makefile
+++ b/drivers/phy/Makefile
@@ -50,6 +50,7 @@ obj-$(CONFIG_PHY_ST_SPEAR1310_MIPHY)  += phy-spear1310-miphy.o
 obj-$(CONFIG_PHY_ST_SPEAR1340_MIPHY)   += phy-spear1340-miphy.o
 obj-$(CONFIG_PHY_XGENE)		+= phy-xgene.o
 obj-$(CONFIG_PHY_STIH407_USB)  += phy-stih407-usb.o
+obj-$(CONFIG_PHY_QCOM_QUSB2)   += phy-qcom-qusb2.o
 obj-$(CONFIG_PHY_QCOM_UFS) += phy-qcom-ufs.o
 obj-$(CONFIG_PHY_QCOM_UFS) += phy-qcom-ufs-qmp-20nm.o
 obj-$(CONFIG_PHY_QCOM_UFS) += phy-qcom-ufs-qmp-14nm.o
diff --git a/drivers/phy/phy-qcom-qusb2.c b/drivers/phy/phy-qcom-qusb2.c
new file mode 100644
index ..6c575244c0fb
--- /dev/null
+++ b/drivers/phy/phy-qcom-qusb2.c
@@ -0,0 +1,493 @@
+/*
+ * Copyright (c) 2017, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ 

[PATCH V8 4/4] phy: qcom-qmp: new qmp phy driver for qcom-chipsets

2017-04-05 Thread Vivek Gautam
Qualcomm SOCs have a QMP phy controller that provides support
to a number of controllers, viz. PCIe, UFS, and USB.
Add a new driver, based on the generic phy framework, for this
phy controller.

Signed-off-by: Vivek Gautam 
Tested-by: Srinivas Kandagatla 
Reviewed-by: Stephen Boyd 
---

Changes since v7:
 - Fixed 'checkpatch --strict' alignment warnings/checks.
 - Added Stephen's Reviewed-by tag.

Changes since v6:
 - Rebased on phy/next and *not* including phy grouping series.

Changes since v5:
 - Rebased on top of phy grouping series. So the driver now sits in
   drivers/phy/qualcomm/
 - Removed instances of readl_relaxed() and writel_relaxed(), and using
   readl() and writel() instead.
 - Replaced regulator handling with regulator_bulk() apis, so that
   qusb2_phy_toggle_power() method is completely dropped.
 - Removed memory barriers from the driver. Instead, added an extra readl()
   over the register in qphy_setbits() and qphy_clrbits() to ensure that
   the write is through to the device.
 - Fixed nits about return statement from probe(), phy_pipe_clk_register()
   and qcom_qmp_phy_create().

Changes since v4:
 - Added provision for child nodes representing each phy lane.
   Each of these nodes has its own register space for tx, rx and pcs
   blocks. Added provision in qcom_qmp_phy_create() to iomap these
   address spaces.
 - Added list of clocks and resets that are mandatory for each phy.
   qcom_qmp_phy_clk_init() and qcom_qmp_phy_reset_init() methods
   request this list and maintain it with qmp.
   The clocks and resets are then enabled/de-asserted based on this list.
   This list is also updated in the binding documentation.
 - Removed qcom_qmp_phy_xlate() method as we don't need it with
   #phy-cells 0.
 - Removed unnecessary of_match_ptr() cast for the match table,
   since the driver is compiled for CONFIG_OF.
 - Updated copyright year to 2017.

Changes since v3:
 - Renamed 'struct qcom_qmp_phy' to 'struct qcom_qmp' and
   'struct qmp_phy_desc' to 'struct qmp_phy' to avoid any confusion
   in distinguishing between QMP phy block and per-lane phy which is
   the actual phy in Linux eyes (suggested by Bjorn Andersson).
 - Made error labels more idiomatic.
 - Modified status checking for phy pcs.
 - Fixed power_down_delay check.
 - Refactored phy_pipe_clk_register() to register the pipe clock source
   using devm_clk_hw_register() (suggested by Stephen).
 - qcom_qmp_phy_xlate() function:
   - Removed unnecessary 'for loop'.
   - Added additional check for '0' or -ve args_count.
 - Fixed the mixed tabs and spaces in pipe_clk_src diagram.
 - Removed instances of memset() since we use snprintf() for the
   buffers.
 - Refactored qphy_setbits() and qphy_clrbits() a little bit to accept
   base address and register offset as two separate arguments.

Changes since v2:
 - Removed selecting 'RESET_CONTROLLER' config.
 - Added error handling for clk_prepare_enable paths.
 - Removed 'ref_clk_src' handling. Driver doesn't need to request and
   handle this clock.
 - Using readl_poll_timeout() to simplify pcs ready status polling.
   Also fixed the polling condition for pcs block ready status:
   'Common block ready status bit is set on phy init completion, while
   PCS block ready status bit (PHYSTATUS) is reset on phy init
   completion.'
 - Moved out the per-lane phy creation from probe() to separate
   function.
 - Registering pipe clock source as a fixed rate clock that comes
   out of the PLL block of QMP phy. These source clocks serve as
   parent to 'pipe_clks' that are requested by pcie or usb3 phys.
 - Using of_device_get_match_data() to get match data.
 - Fixed sparse warnings for 'static' and 'const'.
 - Using shorter variable names in structure and in functions.
 - Handling various comment style shortcomings.
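
The corrected polling condition from the v2 notes above can be
illustrated with a small user-space sketch: the common block is ready
when its status bit becomes set, while the PCS block is ready when
PHYSTATUS becomes clear. This is only an illustration under stated
assumptions. sketch_poll_bit() and its parameters are hypothetical, and
a bounded loop stands in for readl_poll_timeout(), which sleeps between
reads and gives up after a timeout.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Poll 'reg' until the bits in 'mask' reach the wanted state.
 * want_set = true:  done when any masked bit is set (common block style).
 * want_set = false: done when all masked bits are clear (PHYSTATUS style).
 * Returns 0 on success, -ETIMEDOUT after 'max_tries' reads.
 */
static int sketch_poll_bit(const volatile uint32_t *reg, uint32_t mask,
			   bool want_set, unsigned int max_tries)
{
	unsigned int i;

	for (i = 0; i < max_tries; i++) {
		bool is_set = (*reg & mask) != 0;

		if (is_set == want_set)
			return 0;
	}

	return -ETIMEDOUT;
}
```

With a helper of this shape, waiting for the common block would pass
want_set = true, and waiting for the PCS block would pass
want_set = false.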

Changes since v1:
 - Fixed missing mutex_unlock() calls in error cases, reported by
   Julia Lawall.
 - Selecting CONFIG_RESET_CONTROLLER when this driver is enabled.
 - Added a boolean property to check if the phy has individual lane
   reset available.
 - Took care of EPROBE_DEFER, dev_vdbg() and other minor nits.
 - Removed references to non-lkml links from commit message.
 - Moved to use separate iomem resources for each lane.
   Tx, Rx and PCS offsets per lane can now come from dt bindings.

 drivers/phy/Kconfig|8 +
 drivers/phy/Makefile   |1 +
 drivers/phy/phy-qcom-qmp.c | 1153 
 3 files changed, 1162 insertions(+)
 create mode 100644 drivers/phy/phy-qcom-qmp.c

diff --git a/drivers/phy/Kconfig b/drivers/phy/Kconfig
index ccc9178e32cd..bb8140355608 100644
--- a/drivers/phy/Kconfig
+++ b/drivers/phy/Kconfig
@@ -439,6 +439,14 @@ config PHY_STIH407_USB
 	  Enable this support to enable the picoPHY device used by USB2
 	  and USB3 controllers on STMicroelectronics STiH407 SoC families.
 
+config PHY_QCOM_QMP
+	tristate "Qualcomm QMP PHY Driver"
+	depends on OF && (ARCH_QCOM || COMPILE_TEST)
+ 

[PATCH V8 1/4] dt-bindings: phy: Add support for QUSB2 phy

2017-04-05 Thread Vivek Gautam
Qualcomm chipsets have a QUSB2 phy controller that provides
HighSpeed functionality for the DWC3 controller.
Add dt binding information for it.

Signed-off-by: Vivek Gautam 
Reviewed-by: Stephen Boyd 
Acked-by: Rob Herring 
---

Changes since v7:
 - None, just added Stephen's Reviewed-by tag.

Changes since v6:
 - Dropped 'vdd-phy-supply', which used the pm8994_s2 regulator, from the
   bindings. As Stephen said, pm8994_s2 is a 'corner' regulator and it
   isn't right to model it as a regulator supply.
   Work is in progress to handle this sort of power supply.

Changes since v5:
 - Removed leading 0 from the address in 'reg' property.

Changes since v4:
 - None.

Changes since v3:
 - Added 'Acked-by' from Rob.
 - Removed 'reset-names' and 'nvmem-cell-names' from the bindings
   since we use only one cell.

Changes since v2:
 - Removed binding for "ref_clk_src" since we don't request this
   clock in the driver.
 - Addressed s/vdda-phy-dpdm/vdda-phy-dpdm-supply.
 - Addressed s/ref_clk/ref. Don't need to add '_clk' suffix to clock names.
 - Addressed s/tune2_hstx_trim_efuse/tune2_hstx_trim. Don't need to add
   'efuse' suffix to nvmem cell.
 - Addressed s/qusb2phy/phy for the node name.

Changes since v1:
 - New patch, forked out of the original driver patch:
   "phy: qcom-qusb2: New driver for QUSB2 PHY on Qcom chips"
 - Updated dt bindings to remove 'hstx-trim-bit-offset' and
   'hstx-trim-bit-len' bindings.

 .../devicetree/bindings/phy/qcom-qusb2-phy.txt | 43 ++
 1 file changed, 43 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/phy/qcom-qusb2-phy.txt

diff --git a/Documentation/devicetree/bindings/phy/qcom-qusb2-phy.txt b/Documentation/devicetree/bindings/phy/qcom-qusb2-phy.txt
new file mode 100644
index ..aa0fcb05acb3
--- /dev/null
+++ b/Documentation/devicetree/bindings/phy/qcom-qusb2-phy.txt
@@ -0,0 +1,43 @@
+Qualcomm QUSB2 phy controller
+=============================
+
+QUSB2 controller supports LS/FS/HS usb connectivity on Qualcomm chipsets.
+
+Required properties:
+ - compatible: compatible list, contains "qcom,msm8996-qusb2-phy".
+ - reg: offset and length of the PHY register set.
+ - #phy-cells: must be 0.
+
+ - clocks: a list of phandles and clock-specifier pairs,
+  one for each entry in clock-names.
+ - clock-names: must be "cfg_ahb" for phy config clock,
+   "ref" for 19.2 MHz ref clk,
+   "iface" for phy interface clock (Optional).
+
+ - vdda-pll-supply: Phandle to 1.8V regulator supply to PHY refclk pll block.
+ - vdda-phy-dpdm-supply: Phandle to 3.1V regulator supply to Dp/Dm port signals.
+
+ - resets: Phandle to reset to phy block.
+
+Optional properties:
+ - nvmem-cells: Phandle to nvmem cell that contains 'HS Tx trim'
+   tuning parameter value for qusb2 phy.
+
+ - qcom,tcsr-syscon: Phandle to TCSR syscon register region.
+
+Example:
+	hsusb_phy: phy@7411000 {
+		compatible = "qcom,msm8996-qusb2-phy";
+		reg = <0x7411000 0x180>;
+		#phy-cells = <0>;
+
+		clocks = < GCC_USB_PHY_CFG_AHB2PHY_CLK>,
+			 < GCC_RX1_USB2_CLKREF_CLK>;
+		clock-names = "cfg_ahb", "ref";
+
+		vdda-pll-supply = <_l12>;
+		vdda-phy-dpdm-supply = <_l24>;
+
+		resets = < GCC_QUSB2PHY_PRIM_BCR>;
+		nvmem-cells = <_hstx_trim>;
+	};
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH V8 3/4] dt-bindings: phy: Add support for QMP phy

2017-04-05 Thread Vivek Gautam
Qualcomm chipsets have a QMP phy controller that provides
support to a number of controllers, viz. PCIe, UFS, and USB.
Add dt binding information for it.

Signed-off-by: Vivek Gautam 
Reviewed-by: Stephen Boyd 
Acked-by: Rob Herring 
---

Changes since v7:
 - None, just added Stephen's Reviewed-by tag.

Changes since v6:
 - none.

Changes since v5:
 - Added Rob's 'Ack' for the new child nodes based bindings.
 - Dropped leading 0 from the address in 'reg' property.
 - Fixed '@xyz' part of the node name with correct address.

Changes since v4:
 - Added bindings for child nodes. Each phy lane is represented by child
   node with its own register space (for tx, rx and pcs blocks), and clocks
   and resets for power control facility.
 - Removed register space and lane offsets for tx, rx and pcs blocks from
   qmp phy node.
 - #phy-cells is now part of each child node and thus must be 0.
 - Added information on list of mandatory clocks and resets for each phy.

Changes since v3:
 - Added #clock-cells = <1>, indicating that phy is a clock provider.

Changes since v2:
 - Removed binding for "ref_clk_src" since we don't request this
   clock in the driver.
 - Addressed s/ref_clk/ref. Don't need to add '_clk' suffix to clock names.
 - Using 'phy' for the node name.

Changes since v1:
 - New patch, forked out of the original driver patch:
   "phy: qcom-qmp: new qmp phy driver for qcom-chipsets"
 - Added 'Acked-by' from Rob.
 - Updated bindings to include mem resource as a list of
   offset - length pair for serdes block and for each lane.
 - Added a new binding for 'lane-offsets' that contains offsets
   to tx, rx and pcs blocks from each lane base address.

 .../devicetree/bindings/phy/qcom-qmp-phy.txt   | 106 +
 1 file changed, 106 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt

diff --git a/Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt b/Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt
new file mode 100644
index ..e11c563a65ec
--- /dev/null
+++ b/Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt
@@ -0,0 +1,106 @@
+Qualcomm QMP PHY controller
+===========================
+
+QMP phy controller supports physical layer functionality for a number of
+controllers on Qualcomm chipsets, such as, PCIe, UFS, and USB.
+
+Required properties:
+ - compatible: compatible list, contains:
+  "qcom,msm8996-qmp-pcie-phy" for 14nm PCIe phy on msm8996,
+  "qcom,msm8996-qmp-usb3-phy" for 14nm USB3 phy on msm8996.
+
+ - reg: offset and length of register set for PHY's common serdes block.
+
+ - #clock-cells: must be 1
+- Phy pll outputs a bunch of clocks for Tx, Rx and Pipe
+  interface (for pipe based PHYs). These clocks are then gate-controlled
+  by gcc.
+ - #address-cells: must be 1
+ - #size-cells: must be 1
+ - ranges: must be present
+
+ - clocks: a list of phandles and clock-specifier pairs,
+  one for each entry in clock-names.
+ - clock-names: "cfg_ahb" for phy config clock,
+   "aux" for phy aux clock,
+   "ref" for 19.2 MHz ref clk,
+   For "qcom,msm8996-qmp-pcie-phy" must contain:
+   "aux", "cfg_ahb", "ref".
+   For "qcom,msm8996-qmp-usb3-phy" must contain:
+   "aux", "cfg_ahb", "ref".
+
+ - resets: a list of phandles and reset controller specifier pairs,
+  one for each entry in reset-names.
+ - reset-names: "phy" for reset of phy block,
+   "common" for phy common block reset,
+   "cfg" for phy's ahb cfg block reset (Optional).
+   For "qcom,msm8996-qmp-pcie-phy" must contain:
+"phy", "common", "cfg".
+   For "qcom,msm8996-qmp-usb3-phy" must contain
+"phy", "common".
+
+ - vdda-phy-supply: Phandle to a regulator supply to PHY core block.
+ - vdda-pll-supply: Phandle to 1.8V regulator supply to PHY refclk pll block.
+
+Optional properties:
+ - vddp-ref-clk-supply: Phandle to a regulator supply to any specific refclk
+   pll block.
+
+Required nodes:
+ - Each device node of QMP phy is required to have as many child nodes as
+   the number of lanes the PHY has.
+
+Required properties for child node:
+ - reg: list of offset and length pairs of register sets for PHY blocks -
+   tx, rx and pcs.
+
+ - #phy-cells: must be 0
+
+ - clocks: a list of phandles and clock-specifier pairs,
+  one for each entry in clock-names.
+ - clock-names: Must contain following for pcie and usb qmp phys:
+"pipe" for pipe clock specific to each lane.
+
+ - resets: a list of phandles and reset controller specifier pairs,
+  one for each entry in reset-names.
+ - reset-names: Must contain following for pcie qmp phys:
+"lane" for reset specific to each lane.
+
+Example:
+   phy@34000 {
+   

[PATCH V8 3/4] dt-bindings: phy: Add support for QMP phy

2017-04-05 Thread Vivek Gautam
Qualcomm chipsets have QMP phy controller that provides
support to a number of controller, viz. PCIe, UFS, and USB.
Adding dt binding information for the same.

Signed-off-by: Vivek Gautam 
Reviewed-by: Stephen Boyd 
Acked-by: Rob Herring 
---

Changes since v7:
 - None, just added Stephen's Reviewed-by tag.

Changes since v6:
 - none.

Changes since v5:
 - Added Rob's 'Ack' for the new child nodes based bindings.
 - Dropped leading 0 from the address in 'reg' property.
 - Fixed '@xyz' part of the node name with correct address.

Changes since v4:
 - Added bindings for child nodes. Each phy lane is represented by child
   node with its own register space (for tx, rx and pcs blocks), and clocks
   and resets for power control facility.
 - Removed register space and lane offsets for tx, rx and pcs blocks from
   qmp phy node.
 - #phy-cells is now part of each child node and thus must be 0.
 - Added information on list of mandatory clocks and resets for each phy.

Changes since v3:
 - Added #clock-cells = <1>, indicating that phy is a clock provider.

Changes since v2:
 - Removed binding for "ref_clk_src" since we don't request this
   clock in the driver.
 - Addressed s/ref_clk/ref. Don't need to add '_clk' suffix to clock names.
 - Using 'phy' for the node name.

Changes since v1:
 - New patch, forked out of the original driver patch:
   "phy: qcom-qmp: new qmp phy driver for qcom-chipsets"
 - Added 'Acked-by' from Rob.
 - Updated bindings to include mem resource as a list of
   offset - length pair for serdes block and for each lane.
 - Added a new binding for 'lane-offsets' that contains offsets
   to tx, rx and pcs blocks from each lane base address.

 .../devicetree/bindings/phy/qcom-qmp-phy.txt   | 106 +
 1 file changed, 106 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt

diff --git a/Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt 
b/Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt
new file mode 100644
index ..e11c563a65ec
--- /dev/null
+++ b/Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt
@@ -0,0 +1,106 @@
+Qualcomm QMP PHY controller
+===
+
+QMP phy controller supports physical layer functionality for a number of
+controllers on Qualcomm chipsets, such as, PCIe, UFS, and USB.
+
+Required properties:
+ - compatible: compatible list, contains:
+  "qcom,msm8996-qmp-pcie-phy" for 14nm PCIe phy on msm8996,
+  "qcom,msm8996-qmp-usb3-phy" for 14nm USB3 phy on msm8996.
+
+ - reg: offset and length of register set for PHY's common serdes block.
+
+ - #clock-cells: must be 1
+- Phy pll outputs a bunch of clocks for Tx, Rx and Pipe
+  interface (for pipe based PHYs). These clocks are then gate-controlled
+  by gcc.
+ - #address-cells: must be 1
+ - #size-cells: must be 1
+ - ranges: must be present
+
+ - clocks: a list of phandles and clock-specifier pairs,
+  one for each entry in clock-names.
+ - clock-names: "cfg_ahb" for phy config clock,
+   "aux" for phy aux clock,
+   "ref" for 19.2 MHz ref clk,
+   For "qcom,msm8996-qmp-pcie-phy" must contain:
+   "aux", "cfg_ahb", "ref".
+   For "qcom,msm8996-qmp-usb3-phy" must contain:
+   "aux", "cfg_ahb", "ref".
+
+ - resets: a list of phandles and reset controller specifier pairs,
+  one for each entry in reset-names.
+ - reset-names: "phy" for reset of phy block,
+   "common" for phy common block reset,
+   "cfg" for phy's ahb cfg block reset (Optional).
+   For "qcom,msm8996-qmp-pcie-phy" must contain:
+"phy", "common", "cfg".
+   For "qcom,msm8996-qmp-usb3-phy" must contain
+"phy", "common".
+
+ - vdda-phy-supply: Phandle to a regulator supply to PHY core block.
+ - vdda-pll-supply: Phandle to 1.8V regulator supply to PHY refclk pll block.
+
+Optional properties:
+ - vddp-ref-clk-supply: Phandle to a regulator supply to any specific refclk
+   pll block.
+
+Required nodes:
+ - Each device node of QMP phy is required to have as many child nodes as
+   the number of lanes the PHY has.
+
+Required properties for child node:
+ - reg: list of offset and length pairs of register sets for PHY blocks -
+   tx, rx and pcs.
+
+ - #phy-cells: must be 0
+
+ - clocks: a list of phandles and clock-specifier pairs,
+  one for each entry in clock-names.
+ - clock-names: Must contain following for pcie and usb qmp phys:
+"pipe" for pipe clock specific to each lane.
+
+ - resets: a list of phandles and reset controller specifier pairs,
+  one for each entry in reset-names.
+ - reset-names: Must contain following for pcie qmp phys:
+"lane" for reset specific to each lane.
+
+Example:
+   phy@34000 {
+   compatible = "qcom,msm8996-qmp-pcie-phy";
+ 

[PATCH V8 0/4] phy: USB and PCIe phy drivers for Qcom chipsets

2017-04-05 Thread Vivek Gautam
Hi Kishon,
Here's the series with fixed checkpatch warnings/checks.
Please pick it for phy/next.

This patch series adds a couple of PHY drivers for Qualcomm chipsets.
a) qcom-qusb2 phy driver: that provides High Speed USB functionality.
b) qcom-qmp phy driver: that is a combo phy providing support for
   USB3, PCIe, UFS and a few other controllers.

The patches are based on next branch of linux-phy tree.

These patches have been tested on Dragon board db820c hardware with
required set of dt patches.
The tested branch[3] is based on Torvalds' master with Greg's usb/usb-next
merged. Additionally, the patches to get rpm up on msm8996 are also pulled
in.

Changes since v7:
 - Fixed 'checkpatch --strict' alignment warnings/checks, and
   added Stephen's Reviewed-by tag.

Changes since v6:
 - Rebased on phy/next and *not* including phy grouping series[4].
 - qusb2-phy: addressed Stephen's comment.
   - Dropped pm8994_s2 corner regulator from QUSB2 phy bindings.
 - qmp-phy: none on functionality side.
 
Changes since v5:
 - Addressed review comments from Bjorn:
   - Removed instances of readl_relaxed/writel_relaxed calls from the drivers.
     Instead, using simple readl/writel. Inserting a readl after a writel
     to ensure the write is through to the device.
   - Replaced regulator handling with regulator_bulk_** apis. This helps
     in cutting down a lot of regulator handling code.
   - Fixed minor return statements.
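
For illustration only, the "readl after writel" pattern from the v5 changes
can be sketched in plain userspace C. The sketch_* names are hypothetical
stand-ins and ordinary memory takes the place of device registers, so this
shows only the shape of "write, then read the same register back", not the
drivers' actual helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's writel()/readl() accessors. */
static inline void sketch_writel(uint32_t val, volatile uint32_t *addr)
{
	*addr = val;
}

static inline uint32_t sketch_readl(const volatile uint32_t *addr)
{
	return *addr;
}

/*
 * Set bits in the register at base + offset (offset in bytes), then read
 * the register back. In a driver, the read-back is what makes sure the
 * preceding write has gone through to the device.
 */
static uint32_t sketch_setbits_flush(volatile uint32_t *base, uint32_t offset,
				     uint32_t bits)
{
	volatile uint32_t *reg = base + offset / sizeof(uint32_t);

	assert(offset % sizeof(uint32_t) == 0);	/* word-aligned offsets only */
	sketch_writel(sketch_readl(reg) | bits, reg);
	return sketch_readl(reg);		/* read back after the write */
}
```

Base address and register offset are passed as two separate arguments here,
mirroring the qusb2_setbits()/qusb2_clrbits() refactoring noted in the
qusb2 changelog.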

Changes since v4:
 - Addressed comment to add child nodes for qmp phy driver. Each phy lane
   now has a separate child node under the main qmp node.
 - Modified the clock and reset initialization and enable methods.
   Different phys - pcie, usb and later ufs, have varying number of clocks
   and resets that are mandatory. So adding provision for clocks and reset
   lists helps in requesting all mandatory resources for individual phys
   and handle their failure cases accordingly.
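
One way to picture the per-phy clock and reset lists is a small lookup
table keyed by compatible string. The following is a sketch under
assumptions: the struct and function names are hypothetical and are not
taken from the driver; only the compatible strings and the clock/reset
names come from the QMP binding in patch 3/4.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-phy description of mandatory clocks and resets. */
struct sketch_phy_cfg {
	const char *compatible;
	const char *const *clk_list;
	size_t num_clks;
	const char *const *reset_list;
	size_t num_resets;
};

#define SKETCH_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Names as listed in the binding document. */
static const char *const sketch_clks[] = { "aux", "cfg_ahb", "ref" };
static const char *const sketch_pcie_resets[] = { "phy", "common", "cfg" };
static const char *const sketch_usb3_resets[] = { "phy", "common" };

static const struct sketch_phy_cfg sketch_cfgs[] = {
	{
		"qcom,msm8996-qmp-pcie-phy",
		sketch_clks, SKETCH_ARRAY_SIZE(sketch_clks),
		sketch_pcie_resets, SKETCH_ARRAY_SIZE(sketch_pcie_resets),
	},
	{
		"qcom,msm8996-qmp-usb3-phy",
		sketch_clks, SKETCH_ARRAY_SIZE(sketch_clks),
		sketch_usb3_resets, SKETCH_ARRAY_SIZE(sketch_usb3_resets),
	},
};

/* Return the entry for a compatible string, or NULL if it is unknown. */
static const struct sketch_phy_cfg *sketch_find_cfg(const char *compatible)
{
	for (size_t i = 0; i < SKETCH_ARRAY_SIZE(sketch_cfgs); i++)
		if (strcmp(sketch_cfgs[i].compatible, compatible) == 0)
			return &sketch_cfgs[i];
	return NULL;
}
```

With such a table, code that requests resources can loop over clk_list and
reset_list and fail early when any mandatory entry is missing, instead of
hard-coding a fixed set per phy type.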

Changes since v3:
 - Addressed review comments given by Rob and Stephen for qusb2 phy
   and qmp phy bindings respectively.
 - Addressed review comments given by Stephen and Bjorn for qmp phy driver.

Changes since v2:
 - Addressed review comments given by Rob and Stephen for bindings.
 - Addressed the review comments given by Stephen for the qusb2 and qmp
   phy drivers.

Changes since v1:
 - Moved device tree binding documentation to separate patches, as suggested
   by Rob.
 - Addressed review comment regarding qfprom accesses by qusb2 phy driver,
   given by Rob.
 - Addressed review comments from Kishon.
 - Addressed review comments from Srinivas for QMP phy driver.
 - Addressed kbuild warning.

Please see individual patches for detailed changelogs.

[1] https://patchwork.kernel.org/patch/9567767/
[2] https://patchwork.kernel.org/patch/9567779/
[3] https://github.com/vivekgautam1/linux/tree/linux-v4.11-rc5-qmp-phy-db820c
[4] https://lkml.org/lkml/2017/3/20/407

Vivek Gautam (4):
  dt-bindings: phy: Add support for QUSB2 phy
  phy: qcom-qusb2: New driver for QUSB2 PHY on Qcom chips
  dt-bindings: phy: Add support for QMP phy
  phy: qcom-qmp: new qmp phy driver for qcom-chipsets

 .../devicetree/bindings/phy/qcom-qmp-phy.txt   |  106 ++
 .../devicetree/bindings/phy/qcom-qusb2-phy.txt |   43 +
 drivers/phy/Kconfig                            |   18 +
 drivers/phy/Makefile                           |    2 +
 drivers/phy/phy-qcom-qmp.c                     | 1153 
 drivers/phy/phy-qcom-qusb2.c                   |  493 +
 6 files changed, 1815 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/phy/qcom-qmp-phy.txt
 create mode 100644 Documentation/devicetree/bindings/phy/qcom-qusb2-phy.txt
 create mode 100644 drivers/phy/phy-qcom-qmp.c
 create mode 100644 drivers/phy/phy-qcom-qusb2.c

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



Re: [RFC 3/8] nvmet: Use p2pmem in nvme target

2017-04-05 Thread Sagi Grimberg



I hadn't done this yet but I think a simple closest device in the tree
would solve the issue sufficiently. However, I originally had it so the
user has to pick the device and I prefer that approach. But if the user
picks the device, then why bother restricting what he picks?


Because the user can get it wrong, and it's our job to do what we can in
order to prevent users from screwing themselves.


Per the
thread with Sinan, I'd prefer to use what the user picks. You were one
of the biggest opponents to that so I'd like to hear your opinion on
removing the restrictions.


I wasn't against it that much, I'm all for making things "just work"
with minimal configuration steps, but I'm not sure we can get it
right without it.


Ideally, we'd want to use an NVME CMB buffer as p2p memory. This would
save an extra PCI transfer as the NVME card could just take the data
out of its own memory. However, at this time, cards with CMB buffers
don't seem to be available.


Even if it was available, it would be hard to make real use of this
given that we wouldn't know how to pre-post recv buffers (for in-capsule
data). But let's leave this out of the scope entirely...


I don't understand what you're referring to. We'd simply use the CMB
buffer as a p2pmem device, why does that change anything?


I'm referring to the in-capsule data buffers that we pre-post.
Because we prepare a buffer that will contain in-capsule data before we
know which device the incoming I/O is directed to, we can (and will) have
I/O where the data lies in the CMB of device A but is really targeted at
device B, which sort of defeats the purpose of what we're trying to
optimize here...


Why do you need this? You have a reference to the
queue itself.


This keeps track of whether the response was actually allocated with
p2pmem or not. It's needed for when we free the SGL because the queue
may have a p2pmem device assigned to it but, if the alloc failed and it
fell back on system memory then we need to know how to free it. I'm
currently looking at giving SGLs an iomem flag. In which case,
this would no longer be needed as the flag in the SGL could be used.
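
To make the bookkeeping concrete, here is a minimal userspace sketch of
"allocate from a special pool, fall back to system memory, and remember
which one was used so the free path does the right thing". Everything in it
is hypothetical: the pool stands in for a p2pmem device, malloc() stands in
for system memory, and none of the names come from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fixed-size pool standing in for a p2pmem device. */
struct sketch_pool {
	unsigned char storage[256];
	size_t used;
};

struct sketch_buf {
	void *data;
	size_t len;
	bool from_pool;	/* records which allocator must free this buffer */
};

/* Try the pool first; fall back to malloc() if it is absent or full. */
static bool sketch_alloc(struct sketch_pool *pool, size_t len,
			 struct sketch_buf *buf)
{
	buf->len = len;
	if (pool && len <= sizeof(pool->storage) - pool->used) {
		buf->data = pool->storage + pool->used;
		pool->used += len;
		buf->from_pool = true;
		return true;
	}
	buf->data = malloc(len);
	buf->from_pool = false;
	return buf->data != NULL;
}

/*
 * Free according to the flag, not according to whether a pool exists.
 * The pool release is stack-style, so pool buffers must be freed in
 * reverse order of allocation in this toy version.
 */
static void sketch_free(struct sketch_pool *pool, struct sketch_buf *buf)
{
	if (buf->from_pool)
		pool->used -= buf->len;
	else
		free(buf->data);
	buf->data = NULL;
}
```

The point of the flag is visible in sketch_free(): a queue can have a pool
assigned and still hold buffers that came from the fallback path.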


That would be better, maybe...

[...]


This is a problem. Namespaces can be added at any point in time. No one
guarantees that dma_devs are all the namespaces we'll ever see.


Yeah, well restricting p2pmem based on all the devices in use is hard.
So we'd need a call into the transport every time an ns is added and
we'd have to drop the p2pmem if they add one that isn't supported. This
complexity is just one of the reasons I prefer just letting the user choose.


Still, the user can get it wrong. Not sure we can get away without
keeping track of this as new devices join the subsystem.


+
+if (queue->p2pmem)
+pr_debug("using %s for rdma nvme target queue",
+ dev_name(&queue->p2pmem->dev));
+
+kfree(dma_devs);
+}
+
 static int nvmet_rdma_queue_connect(struct rdma_cm_id *cm_id,
 struct rdma_cm_event *event)
 {
@@ -1199,6 +1271,8 @@ static int nvmet_rdma_queue_connect(struct
rdma_cm_id *cm_id,
 }
 queue->port = cm_id->context;

+nvmet_rdma_queue_setup_p2pmem(queue);
+


Why is all this done for each queue? It looks completely redundant to me.


A little bit. Where would you put it?


I think we'll need a representation of a controller in nvmet-rdma for
that. We sort of got away without it so far, but I don't think we can
anymore with this.


 ret = nvmet_rdma_cm_accept(cm_id, queue, &event->param.conn);
 if (ret)
 goto release_queue;


You seemed to skip the in-capsule buffers for p2pmem (inline_page), I'm
curious why?


Yes, the thinking was that these transfers were small anyway so there
would not be significant benefit to pushing them through p2pmem. There's
really no reason why we couldn't do that if it made sense to though.


I don't see an urgent reason for it either. I was just curious...


Re: [PATCH v2 4/9] arm64: hugetlb: Override huge_pte_clear() to support contiguous hugepages

2017-04-05 Thread kbuild test robot
Hi Punit,

[auto build test ERROR on arm64/for-next/core]
[also build test ERROR on v4.11-rc5 next-20170405]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:    https://github.com/0day-ci/linux/commits/Punit-Agrawal/Support-swap-entries-for-contiguous-pte-hugepages/20170406-090327
base:   https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core
config: arm64-allmodconfig (attached as .config)
compiler: aarch64-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=arm64

All errors (new ones prefixed by >>):

   arch/arm64/mm/hugetlbpage.c: In function 'huge_pte_clear':
>> arch/arm64/mm/hugetlbpage.c:200:44: error: incompatible type for argument 4 of 'find_num_contig'
 ncontig = find_num_contig(mm, addr, ptep, &pgsize);
   ^
   arch/arm64/mm/hugetlbpage.c:44:12: note: expected 'pte_t {aka struct <anonymous>}' but argument is of type 'size_t * {aka long unsigned int *}'
static int find_num_contig(struct mm_struct *mm, unsigned long addr,
   ^~~
>> arch/arm64/mm/hugetlbpage.c:200:12: error: too few arguments to function 'find_num_contig'
 ncontig = find_num_contig(mm, addr, ptep, &pgsize);
   ^~~
   arch/arm64/mm/hugetlbpage.c:44:12: note: declared here
static int find_num_contig(struct mm_struct *mm, unsigned long addr,
   ^~~
   arch/arm64/mm/hugetlbpage.c: In function 'huge_ptep_get_and_clear':
   arch/arm64/mm/hugetlbpage.c:216:10: error: too few arguments to function 'huge_pte_offset'
  cpte = huge_pte_offset(mm, addr);
 ^~~
   arch/arm64/mm/hugetlbpage.c:135:8: note: declared here
pte_t *huge_pte_offset(struct mm_struct *mm,
   ^~~
   arch/arm64/mm/hugetlbpage.c: In function 'huge_ptep_set_access_flags':
   arch/arm64/mm/hugetlbpage.c:254:10: error: too few arguments to function 'huge_pte_offset'
  cpte = huge_pte_offset(vma->vm_mm, addr);
 ^~~
   arch/arm64/mm/hugetlbpage.c:135:8: note: declared here
pte_t *huge_pte_offset(struct mm_struct *mm,
   ^~~
   arch/arm64/mm/hugetlbpage.c: In function 'huge_ptep_set_wrprotect':
   arch/arm64/mm/hugetlbpage.c:279:10: error: too few arguments to function 'huge_pte_offset'
  cpte = huge_pte_offset(mm, addr);
 ^~~
   arch/arm64/mm/hugetlbpage.c:135:8: note: declared here
pte_t *huge_pte_offset(struct mm_struct *mm,
   ^~~
   arch/arm64/mm/hugetlbpage.c: In function 'huge_ptep_clear_flush':
   arch/arm64/mm/hugetlbpage.c:296:10: error: too few arguments to function 'huge_pte_offset'
  cpte = huge_pte_offset(vma->vm_mm, addr);
 ^~~
   arch/arm64/mm/hugetlbpage.c:135:8: note: declared here
pte_t *huge_pte_offset(struct mm_struct *mm,
   ^~~

vim +/find_num_contig +200 arch/arm64/mm/hugetlbpage.c

   194  
   195  if (sz == PUD_SIZE || sz == PMD_SIZE) {
   196  pte_clear(mm, addr, ptep);
   197  return;
   198  }
   199  
 > 200  ncontig = find_num_contig(mm, addr, ptep, &pgsize);
   201  for (i = 0; i < ncontig; i++, addr += pgsize, ptep++)
   202  pte_clear(mm, addr, ptep);
   203  }
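
The first two errors are the usual symptom of a caller written against a
different prototype than the one in the tree being built. The sketch below
reproduces that shape with stand-in types. It is an illustration, not a
proposed fix: the sketch_* names are hypothetical, the fifth parameter is an
assumption (the log only shows that argument 4 is expected to be a pte_t and
that more arguments are required), and the returned values are placeholders.

```c
#include <stddef.h>

/* Stand-in types; the kernel's pte_t and mm_struct are not used here. */
typedef struct { unsigned long val; } sketch_pte_t;
struct sketch_mm { int unused; };

/*
 * Hypothetical five-parameter prototype: argument 4 is a pte value and
 * the page size is returned through an assumed fifth parameter.
 */
static int sketch_find_num_contig(struct sketch_mm *mm, unsigned long addr,
				  sketch_pte_t *ptep, sketch_pte_t pte,
				  size_t *pgsize)
{
	(void)mm; (void)addr; (void)ptep; (void)pte;
	*pgsize = 4096;	/* placeholder */
	return 16;	/* placeholder */
}

static int sketch_count_entries(struct sketch_mm *mm, unsigned long addr,
				sketch_pte_t *ptep)
{
	size_t pgsize;

	/*
	 * Calling sketch_find_num_contig(mm, addr, ptep, &pgsize) here
	 * would fail to compile in the same two ways as the report: a
	 * pointer is passed where a pte value is expected, and one
	 * argument is missing.
	 */
	return sketch_find_num_contig(mm, addr, ptep, *ptep, &pgsize);
}
```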

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


.config.gz
Description: application/gzip


RE: [PATCH] block-mq: set both block queue and hardware queue restart bit for restart

2017-04-05 Thread Long Li


> -----Original Message-----
> From: KY Srinivasan
> Sent: Wednesday, April 5, 2017 9:21 PM
> To: Bart Van Assche ; linux-
> ker...@vger.kernel.org; linux-bl...@vger.kernel.org; Long Li
> ; ax...@kernel.dk
> Cc: Stephen Hemminger 
> Subject: RE: [PATCH] block-mq: set both block queue and hardware queue
> restart bit for restart
> 
> 
> 
> > -----Original Message-----
> > From: Bart Van Assche [mailto:bart.vanass...@sandisk.com]
> > Sent: Wednesday, April 5, 2017 8:46 PM
> > To: linux-kernel@vger.kernel.org; linux-bl...@vger.kernel.org; Long Li
> > ; ax...@kernel.dk
> > Cc: Stephen Hemminger ; KY Srinivasan
> > 
> > Subject: Re: [PATCH] block-mq: set both block queue and hardware queue
> > restart bit for restart
> >
> > On Thu, 2017-04-06 at 03:38 +, Long Li wrote:
> > > > -----Original Message-----
> > > > From: Bart Van Assche [mailto:bart.vanass...@sandisk.com]
> > > >
> > > > Please drop this patch. I'm working on a better solution.
> > >
> > > Thank you. Looking forward to your patch.
> >
> > Hello Long,
> >
> > It would help if you could share the name of the block or SCSI driver
> > with which you ran into that lockup and also if you could share the
> > name of the I/O scheduler used in your test.
> 
> The tests that indicated the issue were run on Hyper-V. The driver is
> storvsc_drv.c. The I/O scheduler was, I think, noop.

Yes, we see the I/O hang with scheduler none. We also tried mq-deadline and
saw the same hang with the same cause.

> 
> K. Y
> >
> > Thanks,
> >
> > Bart.


linux-next: Tree for Apr 6

2017-04-05 Thread Stephen Rothwell
Hi all,

Changes since 20170405:

The input tree gained a conflict against the jc_docs tree.

The mfd tree still had its build failure for which I reverted a commit.

The tip tree lost its build failure but gained a conflict against the
pstore tree.

The staging tree gained conflicts against the v4l-dvb tree.

Non-merge commits (relative to Linus' tree): 7375
 7511 files changed, 904555 insertions(+), 147017 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc and an allmodconfig (with
CONFIG_BUILD_DOCSRC=n) for x86_64, a multi_v7_defconfig for arm and a
native build of tools/perf. After the final fixups (if any), I do an
x86_64 modules_install followed by builds for x86_64 allnoconfig,
powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig
and pseries_le_defconfig and i386, sparc and sparc64 defconfig.

Below is a summary of the state of the merge.

I am currently merging 256 trees (counting Linus' and 37 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (aeb4a5768179 Merge tag 'mfd-fixes-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd)
Merging fixes/master (97da3854c526 Linux 4.11-rc3)
Merging kbuild-current/fixes (9be3213b14d4 gconfig: remove misleading parentheses around a condition)
Merging arc-current/for-curr (a71c9a1c779f Linux 4.11-rc5)
Merging arm-current/fixes (35512d971274 Merge branch 'kprobe-fixes' of https://git.linaro.org/people/tixy/kernel into fixes)
Merging m68k-current/for-linus (e3b1ebd67387 m68k: Wire up statx)
Merging metag-fixes/fixes (b884a190afce metag/usercopy: Add missing fixups)
Merging powerpc-fixes/fixes (7ed23e1bae8b powerpc: Disable HFSCR[TM] if TM is not supported)
Merging sparc/master (0ae2d26ffe70 arch/sparc: Avoid DCTI Couples)
Merging fscrypt-current/for-stable (42d97eb0ade3 fscrypt: fix renaming and linking special files)
Merging net/master (3ebfdf082184 sctp: get sock from transport in sctp_transport_update_pmtu)
Merging ipsec/master (89e357d83c06 af_key: Add lock to key dump)
Merging netfilter/master (0b9aefea8600 tcp: minimize false-positives on TCP/GRO check)
Merging ipvs/master (0b9aefea8600 tcp: minimize false-positives on TCP/GRO check)
Merging wireless-drivers/master (d77facb88448 brcmfmac: use local iftype avoiding use-after-free of virtual interface)
Merging mac80211/master (75514b665485 net: ethernet: ti: cpsw: wake tx queues on ndo_tx_timeout)
Merging sound-current/for-linus (2f726aec19a9 ALSA: hda - fix a problem for lineout on a Dell AIO machine)
Merging pci-current/for-linus (794a8604fe6e PCI: dwc: Fix dw_pcie_ops NULL pointer dereference)
Merging driver-core.current/driver-core-linus (c02ed2e75ef4 Linux 4.11-rc4)
Merging tty.current/tty-linus (a71c9a1c779f Linux 4.11-rc5)
Merging usb.current/usb-linus (a71c9a1c779f Linux 4.11-rc5)
Merging usb-gadget-fixes/fixes (25cd9721c2b1 usb: gadget: f_hid: fix: Don't access hidg->req without spinlock held)
Merging usb-serial-fixes/usb-linus (c02ed2e75ef4 Linux 4.11-rc4)
Merging usb-chipidea-fixes/ci-for-usb-stable (c7fbb09b2ea1 usb: chipidea: move the lock initialization to core file)
Merging phy/fixes (1a09b6a7c10e phy: qcom-usb-hs: Add depends on EXTCON)
Merging staging.current/staging-linus (a6d361404d81 Merge tag 'iio-fixes-for-4.11d' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus)
Merging char-misc.current/char-misc-linus (c02ed2e75ef4 Linux 4.11-rc4)
Merging input-current/for-linus (5659495a7a14 uapi: add missing install of userio.h)
Merging crypto-current/master (40c98cb57cdb crypto: caam - fix RNG deinstantiation error checking)
Merging ide/master (96297aee8bce ide: palm_bk3710: add __initdata to palm_bk3710_port_info)
Merging vfio-fixes/for-linus (65b1adebfe43 vfio: Rework group release notifier warning)
Mergin

[PATCH -mm -v8 3/3] mm, THP, swap: Enable THP swap optimization only if has compound map

2017-04-05 Thread Huang, Ying
From: Huang Ying 

If a THP (Transparent Huge Page) has no compound map, it is possible
that the map count of some of its sub-pages is 0.  In that case it is
better to split the THP before swapping it out.  This way, the
unmapped sub-pages are freed, and we avoid the unnecessary swap out
operations for these sub-pages.

Cc: Johannes Weiner 
Signed-off-by: "Huang, Ying" 
---
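
Note for readers (not part of the commit message): the decision this
patch adds can be pictured with a small user-space model.  This is
only a sketch under stated assumptions, not kernel code: struct
thp_model, NR_SUBPAGES, swap_as_whole() and unmapped_subpages() are
hypothetical names, whereas the real code calls compound_mapcount()
on a struct page.

```c
#include <stdbool.h>

#define NR_SUBPAGES 512	/* assumed: sub-pages per 2M THP with 4k pages */

/* Hypothetical toy model of a THP's mapping state. */
struct thp_model {
	int compound_mapcount;			/* PMD (whole-THP) mappings */
	int subpage_mapcount[NR_SUBPAGES];	/* per-sub-page PTE mappings */
};

/* Mirrors the patched condition: keep the THP whole only if PMD-mapped. */
static bool swap_as_whole(const struct thp_model *thp)
{
	return thp->compound_mapcount > 0;
}

/* Count sub-pages that nothing maps; splitting first lets them be freed. */
static int unmapped_subpages(const struct thp_model *thp)
{
	int i, n = 0;

	if (thp->compound_mapcount > 0)
		return 0;	/* a PMD mapping covers every sub-page */
	for (i = 0; i < NR_SUBPAGES; i++)
		if (thp->subpage_mapcount[i] == 0)
			n++;
	return n;
}
```

In this model, a THP with no compound map and a single mapped
sub-page would otherwise write out 511 sub-pages that nobody uses.
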
 mm/swap_state.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/swap_state.c b/mm/swap_state.c
index 612fb2418df6..528af29327c9 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -205,7 +205,9 @@ int add_to_swap(struct page *page, struct list_head *list)
 		/* cannot split, skip it */
 		if (!can_split_huge_page(page, NULL))
 			return 0;
-		huge = true;
+		/* fallback to split huge page firstly if no PMD map */
+		if (compound_mapcount(page))
+			huge = true;
 	}
 #endif
 
-- 
2.11.0



[PATCH -mm -v8 2/3] mm, THP, swap: Check whether THP can be split firstly

2017-04-05 Thread Huang, Ying
From: Huang Ying 

In the original THP swap out implementation, the swap cluster is
allocated and the THP (Transparent Huge Page) is added into the swap
cache before the THP is split.  But it is possible that the THP
cannot be split, in which case we must delete the THP from the swap
cache and free the swap cluster again.  To avoid that, this patch
checks whether the THP can be split first.  The check is inherently
racy, but it is good enough for most cases.

Cc: Johannes Weiner 
Signed-off-by: "Huang, Ying" 
Acked-by: Kirill A. Shutemov [for can_split_huge_page()]
---
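
Note for readers (not part of the commit message): the reference
count arithmetic behind can_split_huge_page() can be tried out in
user space with a small model.  This is a sketch, not the kernel
implementation: struct page_model and can_split_model() are
hypothetical, the fields stand in for the PageAnon(),
PageSwapCache(), total_mapcount() and page_count() helpers, and
HPAGE_PMD_NR is assumed to be 512.  The reading of the "- 1" as the
caller's own reference is likewise an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

#define HPAGE_PMD_NR 512	/* assumed: 2M THP / 4k base pages */

/* Hypothetical stand-in for the state the real check reads via helpers. */
struct page_model {
	bool anon;		/* PageAnon() */
	bool in_swap_cache;	/* PageSwapCache() */
	int total_mapcount;	/* total_mapcount() */
	int page_count;		/* page_count() */
};

static bool can_split_model(const struct page_model *p, int *pextra_pins)
{
	int extra_pins;

	/* Additional pins from the radix tree, one per sub-page. */
	if (p->anon)
		extra_pins = p->in_swap_cache ? HPAGE_PMD_NR : 0;
	else
		extra_pins = HPAGE_PMD_NR;
	if (pextra_pins)
		*pextra_pins = extra_pins;
	/* Assumed: the remaining "- 1" is the reference held by the caller. */
	return p->total_mapcount == p->page_count - extra_pins - 1;
}
```

Any reference beyond the mappings, the radix tree pins and that one
extra reference makes the check fail, which is why add_to_swap() can
skip the page early.
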
 include/linux/huge_mm.h |  7 +++++++
 mm/huge_memory.c        | 20 ++++++++++++++++----
 mm/swap_state.c         |  7 ++++++-
 3 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index a3762d49ba39..d3b3e8fcc717 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -113,6 +113,7 @@ extern unsigned long thp_get_unmapped_area(struct file *filp,
 extern void prep_transhuge_page(struct page *page);
 extern void free_transhuge_page(struct page *page);
 
+bool can_split_huge_page(struct page *page, int *pextra_pins);
 int split_huge_page_to_list(struct page *page, struct list_head *list);
 static inline int split_huge_page(struct page *page)
 {
@@ -231,6 +232,12 @@ static inline void prep_transhuge_page(struct page *page) {}
 
 #define thp_get_unmapped_area	NULL
 
+static inline bool
+can_split_huge_page(struct page *page, int *pextra_pins)
+{
+	BUILD_BUG();
+	return false;
+}
 static inline int
 split_huge_page_to_list(struct page *page, struct list_head *list)
 {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4a5c1ca21894..459c7d5cdeb3 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2372,6 +2372,21 @@ int page_trans_huge_mapcount(struct page *page, int *total_mapcount)
 	return ret;
 }
 
+/* Racy check whether the huge page can be split */
+bool can_split_huge_page(struct page *page, int *pextra_pins)
+{
+	int extra_pins;
+
+	/* Additional pins from radix tree */
+	if (PageAnon(page))
+		extra_pins = PageSwapCache(page) ? HPAGE_PMD_NR : 0;
+	else
+		extra_pins = HPAGE_PMD_NR;
+	if (pextra_pins)
+		*pextra_pins = extra_pins;
+	return total_mapcount(page) == page_count(page) - extra_pins - 1;
+}
+
 /*
  * This function splits huge page into normal pages. @page can point to any
  * subpage of huge page to split. Split doesn't change the position of @page.
@@ -2419,7 +2434,6 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 			ret = -EBUSY;
 			goto out;
 		}
-		extra_pins = PageSwapCache(page) ? HPAGE_PMD_NR : 0;
 		mapping = NULL;
 		anon_vma_lock_write(anon_vma);
 	} else {
@@ -2431,8 +2445,6 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 			goto out;
 		}
 
-		/* Addidional pins from radix tree */
-		extra_pins = HPAGE_PMD_NR;
 		anon_vma = NULL;
 		i_mmap_lock_read(mapping);
 	}
@@ -2441,7 +2453,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 	 * Racy check if we can split the page, before freeze_page() will
 	 * split PMDs
 	 */
-	if (total_mapcount(head) != page_count(head) - extra_pins - 1) {
+	if (!can_split_huge_page(head, &extra_pins)) {
 		ret = -EBUSY;
 		goto out_unlock;
 	}
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 7659557351cf..612fb2418df6 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -201,7 +201,12 @@ int add_to_swap(struct page *page, struct list_head *list)
 	VM_BUG_ON_PAGE(!PageUptodate(page), page);
 
 #ifdef CONFIG_THP_SWAP_CLUSTER
-	huge = PageTransHuge(page);
+	if (unlikely(PageTransHuge(page))) {
+		/* cannot split, skip it */
+		if (!can_split_huge_page(page, NULL))
+			return 0;
+		huge = true;
+	}
 #endif
 
 retry:
-- 
2.11.0



[PATCH -mm -v8 1/3] mm, THP, swap: Delay splitting THP during swap out

2017-04-05 Thread Huang, Ying
From: Huang Ying 

In this patch, splitting the huge page is delayed from almost the
first step of swapping out until after the swap space for the THP
(Transparent Huge Page) has been allocated and the THP has been added
into the swap cache.  This batches the corresponding operations and
thus improves THP swap out throughput.

This is the first step for the THP swap optimization.  The plan is to
delay splitting the THP step by step and, eventually, to avoid
splitting the THP at all.

The advantages of the THP swap support include:

- Batch the swap operations for the THP and reduce lock
  acquiring/releasing, including allocating/freeing the swap space,
  adding/deleting to/from the swap cache, and writing/reading the swap
  space, etc.  This will help to improve the THP swap performance.

- The THP swap space read/write will be 2M sequential IO.  This is
  particularly helpful for swap reads, which are usually 4k random
  IO.  This will help to improve the THP swap performance.

- It will help to reduce memory fragmentation, especially when THPs
  are heavily used by applications.  The 2M of contiguous pages will
  be freed up after the THP is swapped out.

- It will improve THP utilization on systems with swap turned on,
  because khugepaged is quite slow to collapse normal pages into a
  THP.  If the THP is split during swap out, it will take quite a
  long time for the normal pages to collapse back into a THP after
  being swapped in.  High THP utilization also helps the efficiency
  of page-based memory management.

There are some concerns regarding THP swap in, mainly because the
possibly enlarged read/write IO size (for swap in/out) may put more
overhead on the storage device.  To deal with that, THP swap in should
be turned on only when necessary.  For example, it can be selected via
"always/never/madvise" logic, to be turned on globally, turned off
globally, or turned on only for VMAs with MADV_HUGEPAGE, etc.

In this patch, one swap cluster is used to hold the contents of each
THP swapped out.  So the size of the swap cluster is changed to that
of the THP (Transparent Huge Page) on the x86_64 architecture (512).
For other architectures which want this THP swap optimization,
ARCH_USES_THP_SWAP_CLUSTER needs to be selected in the Kconfig file
for the architecture.  In effect, this doubles the swap cluster size
on x86_64, which may make it harder to find a free cluster when the
swap space becomes fragmented.  So, in theory, this may reduce
contiguous swap space allocation and sequential writes.  The
performance test in 0day shows no regressions caused by this.
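
As a quick check of the numbers above, the arithmetic can be written
out as a tiny C sketch.  The 4k base page and 2M THP sizes are the
x86_64 values mentioned elsewhere in this description; the previous
cluster size of 256 slots is an assumption inferred from the
statement that the size doubles, and all macro and function names
here are hypothetical.

```c
/* Sizes assumed for x86_64; names are illustrative only. */
#define BASE_PAGE_BYTES   4096UL		/* 4k base page */
#define THP_BYTES         (2UL * 1024 * 1024)	/* 2M huge page */
#define OLD_CLUSTER_SLOTS 256UL			/* assumed previous cluster size */

/* Number of swap slots one THP needs: one slot per base page. */
static unsigned long thp_swap_slots(void)
{
	return THP_BYTES / BASE_PAGE_BYTES;
}

/* Factor by which the cluster grows when it must hold a whole THP. */
static unsigned long cluster_growth_factor(void)
{
	return thp_swap_slots() / OLD_CLUSTER_SLOTS;
}
```

With these inputs one THP occupies 512 slots, which matches the
cluster size quoted above.
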

In future THP swap optimization work, some information about the
swapped out THP (such as the compound map count) will be recorded in
the swap_cluster_info data structure.

The mem cgroup swap accounting functions are enhanced to support
charging or uncharging a swap cluster backing a THP as a whole.

Swap cluster allocate/free functions are added to allocate/free a
swap cluster for a THP.  A fairly simple algorithm is used for swap
cluster allocation: only the first swap device in the priority list
is tried when allocating the swap cluster.  The function fails if
that attempt is not successful, and the caller falls back to
allocating a single swap slot instead.  This works well enough for
normal cases.  If the number of free swap clusters differs
significantly among multiple swap devices, it is possible that some
THPs are split earlier than necessary.  For example, this could be
caused by a big size difference among the swap devices.
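
The allocation policy described above can be sketched as a toy
user-space model.  It is an illustration only: struct swap_dev_model
and alloc_for_thp() are hypothetical names, the device array is
assumed to be sorted by priority, and none of this is the actual
swapfile code.

```c
#define CLUSTER_SLOTS 512	/* slots per cluster, one THP's worth */

/* Hypothetical model of one swap device. */
struct swap_dev_model {
	int free_clusters;	/* completely free clusters */
	int free_slots;		/* free slots in total, clusters included */
};

/*
 * Returns the number of slots obtained: CLUSTER_SLOTS if the THP can be
 * kept whole, 1 if the caller has to split it and swap page by page,
 * 0 if no swap space is left.  devs[] is assumed sorted by priority.
 */
static int alloc_for_thp(struct swap_dev_model *devs, int ndevs)
{
	int i;

	if (ndevs <= 0)
		return 0;
	/* Only the first device in the priority list is tried for a cluster. */
	if (devs[0].free_clusters > 0) {
		devs[0].free_clusters--;
		devs[0].free_slots -= CLUSTER_SLOTS;
		return CLUSTER_SLOTS;
	}
	/* Fallback: a single slot from the first device that has one. */
	for (i = 0; i < ndevs; i++) {
		if (devs[i].free_slots > 0) {
			devs[i].free_slots--;
			return 1;
		}
	}
	return 0;
}
```

The model also shows the caveat mentioned above: if the first device
is fragmented, the THP is split even though a lower-priority device
may still have free clusters.
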

The swap cache functions are enhanced to support adding/deleting a THP
to/from the swap cache as a set of (HPAGE_PMD_NR) sub-pages.  This may
be enhanced in the future with a multi-order radix tree.  But because
we will split the THP soon during swapping out, that optimization
doesn't make much sense for this first step.

The THP splitting functions are enhanced to support splitting a THP in
the swap cache during swapping out.  The page lock will be held while
allocating the swap cluster, adding the THP into the swap cache and
splitting the THP.  So in code paths other than swapping out, if the
THP needs to be split, PageSwapCache(THP) will always be false.

With the patchset, the swap out throughput improves by 14.9% (from
about 3.77GB/s to about 4.34GB/s) in the vm-scalability swap-w-seq
test case with 8 processes.  The test is done on a Xeon E5 v3 system.
The swap device used is a RAM simulated PMEM (persistent memory)
device.  To test sequential swapping out, the test case creates 8
processes, which sequentially allocate and write to anonymous pages
until the RAM and part of the swap device are used up.

The detailed comparison result is as follows:

           base             base+patchset
---------------- --------------------------
         %stddev     %change         %stddev
             \          |                \
   7043990 ±  0%     +21.2%    8536807 ±  0%

[PATCH -mm -v8 0/3] THP swap: Delay splitting THP during swapping out

2017-04-05 Thread Huang, Ying
From: Huang Ying 

This patchset is to optimize the performance of Transparent Huge Page
(THP) swap.

Hi, Andrew, could you help me to check whether the overall design is
reasonable?

Hi, Hugh, Shaohua, Minchan and Rik, could you help me to review the
swap part of the patchset?

Hi, Andrea, could you help me to review the THP part of the patchset?

Hi, Johannes, Michal, I am not very confident about the memory cgroup
part.  Could you help me to review it?

And to everyone: any comments are welcome!


Recently, the performance of storage devices has improved so fast that
we cannot saturate the disk bandwidth with a single logical CPU when
swapping pages out, even on a high-end server machine, because the
performance of storage devices has improved faster than that of a
single logical CPU.  It seems that this trend will not change in the
near future.  On the other hand, THP is becoming more and more popular
because of increased memory sizes.  So it has become necessary to
optimize THP swap performance.

The advantages of the THP swap support include:

- Batch the swap operations for the THP to reduce lock
  acquiring/releasing, including allocating/freeing the swap space,
  adding/deleting to/from the swap cache, and writing/reading the swap
  space, etc.  This will help improve the performance of the THP swap.

- The THP swap space read/write will be 2M sequential IO.  This is
  particularly helpful for swap reads, which are usually 4k random
  IO.  This will improve the performance of the THP swap too.

- It will help to reduce memory fragmentation, especially when THPs
  are heavily used by applications.  The 2M of contiguous pages will
  be freed up after the THP is swapped out.

- It will improve THP utilization on systems with swap turned on,
  because khugepaged is quite slow to collapse normal pages into a
  THP.  If the THP is split during swap out, it will take quite a
  long time for the normal pages to collapse back into a THP after
  being swapped in.  High THP utilization also helps the efficiency
  of page-based memory management.

There are some concerns regarding THP swap in, mainly because the
possibly enlarged read/write IO size (for swap in/out) may put more
overhead on the storage device.  To deal with that, THP swap in should
be turned on only when necessary.  For example, it can be selected via
"always/never/madvise" logic, to be turned on globally, turned off
globally, or turned on only for VMAs with MADV_HUGEPAGE, etc.
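
To make the proposed "always/never/madvise" selection concrete, here
is a minimal sketch of how such a policy check could look.  It is
purely illustrative: this patchset does not implement THP swap in,
and enum thp_swapin_policy and thp_swapin_allowed() are hypothetical
names.

```c
#include <stdbool.h>

/* Hypothetical policy knob, modelled on the "always/never/madvise" wording. */
enum thp_swapin_policy {
	THP_SWAPIN_ALWAYS,	/* turned on globally */
	THP_SWAPIN_NEVER,	/* turned off globally */
	THP_SWAPIN_MADVISE,	/* only for VMAs marked with MADV_HUGEPAGE */
};

/* Decide whether a fault in a given VMA may swap in a whole THP. */
static bool thp_swapin_allowed(enum thp_swapin_policy policy,
			       bool vma_has_madv_hugepage)
{
	switch (policy) {
	case THP_SWAPIN_ALWAYS:
		return true;
	case THP_SWAPIN_MADVISE:
		return vma_has_madv_hugepage;
	case THP_SWAPIN_NEVER:
	default:
		return false;
	}
}
```
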

This patchset is based on 04/04 head of mmotm/master.

This patchset is the first step for the THP swap support.  The plan is
to delay splitting the THP step by step, and finally to avoid
splitting the THP during swap out altogether and swap the THP out/in
as a whole.

As the first step, in this patchset, splitting the huge page is
delayed from almost the first step of swapping out until after the
swap space for the THP has been allocated and the THP has been added
into the swap cache.  This will reduce lock acquiring/releasing for
the locks used for swap cache management.

With the patchset, the swap out throughput improves by 14.9% (from
about 3.77GB/s to about 4.34GB/s) in the vm-scalability swap-w-seq
test case with 8 processes.  The test is done on a Xeon E5 v3 system.
The swap device used is a RAM simulated PMEM (persistent memory)
device.  To test sequential swapping out, the test case creates 8
processes, which sequentially allocate and write to anonymous pages
until the RAM and part of the swap device are used up.

The detailed comparison result is as follows:

           base             base+patchset
---------------- --------------------------
         %stddev     %change         %stddev
             \          |                \
   7043990 ±  0%     +21.2%    8536807 ±  0%  vm-scalability.throughput
    109.94 ±  1%     -16.2%      92.09 ±  0%  vm-scalability.time.elapsed_time
   3957091 ±  0%     +14.9%    4547173 ±  0%  vmstat.swap.so
     31.46 ±  1%     -38.3%      19.42 ±  0%  perf-stat.cache-miss-rate%
      1.04 ±  1%     +22.2%       1.27 ±  0%  perf-stat.ipc
      9.33 ±  2%     -60.7%       3.67 ±  1%  perf-profile.calltrace.cycles-pp.add_to_swap.shrink_page_list.shrink_inactive_list.shrink_node_memcg.shrink_node
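
For readers who want to re-derive the %change column, the
calculation is just the relative difference between the two means.
The helper below is a hypothetical sketch (pct_change() is not part
of any tool used here); the inputs are the values from the table.

```c
/* Relative change of 'patched' against 'base', in percent. */
static double pct_change(double base, double patched)
{
	return (patched - base) / base * 100.0;
}
```

Applied to the vmstat.swap.so row (3957091 to 4547173) this gives
about +14.9%, which is the headline figure; the rounded 3.77GB/s and
4.34GB/s values give a slightly different number because of
rounding.
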

Changelog:

v8:

- Rebased on latest -mm tree
- Reorganize the patchset per Johannes' comments
- Merge add_to_swap_trans_huge() and add_to_swap() per Johannes' comments

v7:

- Rebased on latest -mm tree
- Revise get_swap_pages() THP support per Tim's comments

v6:

- Rebased on latest -mm tree (cluster lock, etc).
- Fix a potential uninitialized variable bug in __swap_entry_free()
- Revise the swap read-ahead changes to avoid a potential race
  condition between swap off and swap out in theory.

v5:

- Per Hillf's comments, fix a locking bug in error path of
  __add_to_swap_cache().  And merge the code to calculate extra_pins
  into can_split_huge_page().

v4:

- Per Johannes' comments, simplified swap cgroup array 


Re: [RFC 6/8] nvmet: Be careful about using iomem accesses when dealing with p2pmem

2017-04-05 Thread Sagi Grimberg



Note that the nvme completion queues are still on the host memory, so
this means we have lost the ordering between data and completions as
they go to different pcie targets.


Hmm, in this simple up/down case with a switch, I think it might
actually be OK.

Transactions might not complete at the NVMe device before the CPU
processes the RDMA completion; however, due to the PCI-E ordering rules,
new TLPs directed to the NVMe will complete after the RDMA TLPs and
thus observe the new data (i.e. ordering is preserved).

It would be very hard to use P2P if fabric ordering is not preserved.


I think it still can race if the p2p device is connected with more than
a single port to the switch.

Say it's connected via 2 legs, the BAR is accessed from leg A and the
data from the disk comes via leg B. In this case, the data is heading
towards the p2p device via leg B (which might be congested), the completion
goes directly to the RC, and then the host issues a read from the
BAR via leg A. I don't understand what can guarantee ordering here.

Stephen told me that this still guarantees ordering, but I honestly
can't understand how. Perhaps someone can explain it to me in a simple
way that I can understand.


Re: [PATCH net-next] macb: Add hardware PTP support.

2017-04-05 Thread Richard Cochran
On Wed, Apr 05, 2017 at 06:43:03AM -0700, David Miller wrote:
> This patch does too many things at one time.  Each entry in that list
> of changes above should be a separate change, all posted together as
> a group as a proper patch series.

And please start a new thread with the next posting.

Thanks,
Richard


Re: [PATCH v6 01/23] PCI: endpoint: Add EP core layer to enable EP controller and EP functions

2017-04-05 Thread Kishon Vijay Abraham I
Hi Bjorn,

On Wednesday 05 April 2017 10:22 PM, Bjorn Helgaas wrote:
> On Wed, Apr 05, 2017 at 02:22:21PM +0530, Kishon Vijay Abraham I wrote:
>> Introduce a new EP core layer in order to support endpoint functions in
>> linux kernel. This comprises the EPC library (Endpoint Controller Library)
>> and EPF library (Endpoint Function Library). EPC library implements
>> functions specific to an endpoint controller and EPF library implements
>> functions specific to an endpoint function.
>> ...
> 
>> +/**
>> + * pci_epf_linkup() - Notify the function driver that EPC device has
>> + *		      established a connection with the Root Complex.
>> + * @epf: the EPF device bound to the EPC device which has established
>> + *	 the connection with the host
>> + *
>> + * Invoke to notify the function driver that EPC device has established
>> + * a connection with the Root Complex.
>> + */
>> +void pci_epf_linkup(struct pci_epf *epf)
>> +{
>> +	if (!epf->driver)
>> +		dev_WARN(&epf->dev, "epf device not bound to driver\n");
>> +
>> +	epf->driver->ops->linkup(epf);
> 
> I don't understand what's going on here.  We warn if epf->driver is
> NULL, but the next thing we do is dereference it.
> 
> For NULL pointers that are symptoms of Linux defects, I usually prefer
> not to check at all so that a dereference generates an oops and we can
> debug the problem.  For NULL pointers caused by user error, we would
> generally return an error that percolates up to the user.
> 
> I haven't completely wrapped my head around this endpoint support, but
> I assume a NULL pointer here would be caused by user error, not
> necessarily a Linux defect.  So why would we dereference a NULL
> pointer?  And what happens when we do?  Is this just going to oops an
> embedded Linux running inside the endpoint?  Is that the correct
> behavior?

With the new configfs directory structure, this should be a kernel error.
However, the EPF layer should be independent of how its APIs are used, i.e.
someone can create a new sysfs/configfs structure and the value of epf->driver
might depend on user actions.

I think I'd prefer not to dereference NULL pointers since we already have a
dev_WARN for debug. I'll resend this patch with a return if epf->driver is NULL.
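
A minimal user-space sketch of that warn-and-return shape, assuming simplified
stand-in structures rather than the real kernel definitions: fprintf() stands
in for dev_WARN(), and the int return value and the linkup_count field are
additions made only so the behaviour can be observed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-ins for the kernel structures; not the real definitions. */
struct pci_epf;

struct pci_epf_ops {
	void (*linkup)(struct pci_epf *epf);
};

struct pci_epf_driver {
	const struct pci_epf_ops *ops;
};

struct pci_epf {
	const struct pci_epf_driver *driver;
	int linkup_count;	/* hypothetical field, only used to observe calls */
};

/*
 * Warn and bail out instead of dereferencing a NULL driver pointer.
 * Returns 0 if the callback ran, -1 if the device has no driver bound.
 * (The function under review returns void; the return value is an
 * addition for this sketch.)
 */
static int epf_linkup_checked(struct pci_epf *epf)
{
	assert(epf != NULL);

	if (!epf->driver) {
		fprintf(stderr, "epf device not bound to driver\n");
		return -1;
	}

	epf->driver->ops->linkup(epf);
	return 0;
}

/* Example callback: just counts invocations. */
static void demo_linkup(struct pci_epf *epf)
{
	epf->linkup_count++;
}

static const struct pci_epf_ops demo_ops = { .linkup = demo_linkup };
static const struct pci_epf_driver demo_driver = { .ops = &demo_ops };
```

With a bound device the callback runs once; with driver == NULL the function
only warns and returns, so nothing is dereferenced.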

Thanks
Kishon


Re: [STABLE REGRESSION] iio: hid-sensor-trigger: Change get poll value function order to avoid sensor properties losing after resume from S3

2017-04-05 Thread Srinivas Pandruvada

On Thu, 2017-04-06 at 04:58 +, Song, Hongyan wrote:
> Hi Srinivas,
>   I have checked; the patch does not meet my requirement for ISH.
> With this patch, sensor properties are still lost after resume from S3.
What is your test case? I want to try.

Thanks,
Srinivas

> 
> BR
> Song Hongyan
> 
> -----Original Message-----
> From: Srinivas Pandruvada [mailto:srinivas.pandruv...@linux.intel.com
> ] 
> Sent: Wednesday, April 5, 2017 11:36 PM
> To: r...@researchut.com; Song, Hongyan ;
> linux-iio 
> Cc: sta...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [STABLE REGRESSION] iio: hid-sensor-trigger: Change get
> poll value function order to avoid sensor properties losing after
> resume from S3
> 
> Hi Hongyan,
> 
> Can you check if the patch meets your requirement/needs for ISH?
> 
> Thanks,
> Srinivas
> 
> On Wed, 2017-04-05 at 16:21 +0530, Ritesh Raj Sarraf wrote:
> > 
> > On Tue, 2017-04-04 at 17:44 -0700, Srinivas Pandruvada wrote:
> > > 
> > > 
> > > Hi Ritesh,
> > > 
> > > Does the attached patch help?
> > Thank you Srinivas. I tested your patch on top of 4.10.8 and it is
> > working perfectly.
> > 
> > Ritesh
> > 


Re: [RFC PATCH 0/4] fs: introduce new writeback error tracking infrastructure and convert ext4 to use it

2017-04-05 Thread NeilBrown
On Wed, Apr 05 2017, Matthew Wilcox wrote:

> On Thu, Apr 06, 2017 at 10:02:48AM +1000, NeilBrown wrote:
>> If you are concerned about space in 'struct address_space', just prune
>> some wastage.
>
> I'm trying to (via wlists).  still buggy though.

Cool.
(I wonder what a wlist is... a weighted list?)

>
>> The "host" field brings no value.  It is only ever assigned in
>> inode_init_always():
>> 
>>  struct address_space *const mapping = &inode->i_data;
>> ..
>>  mapping->host = inode;
>> 
>> So you could change all references to use
>>container_of(mapping, struct inode, i_data)
>
> Alas, no:
>
> drivers/dax/dax.c:  inode->i_mapping->host = dax_dev->inode;

inode->i_mapping = dax_dev->inode->i_mapping;
inode->i_mapping->host = dax_dev->inode;
so that second line is equivalent to
dax_dev->inode->i_mapping->host = dax_dev->inode;
so the mapping's ->host still points to the inode that embeds it
(dax_dev->inode).  So this doesn't break the invariant.
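
To make the invariant concrete, here is a small user-space sketch.  The
structures are heavily reduced stand-ins, the container_of() macro is a plain
offsetof-based version without the kernel's type checking, and inode_init()
and mapping_to_inode() are names invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; the real kernel structures have many more fields. */
struct inode;

struct address_space {
	struct inode *host;
};

struct inode {
	struct address_space *i_mapping;
	struct address_space i_data;
};

/* Same idea as the kernel's container_of(), minus the type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * Mirrors the quoted ->host assignment and also points i_mapping at the
 * embedded i_data, as is done for a freshly initialised inode.
 */
static void inode_init(struct inode *inode)
{
	inode->i_data.host = inode;
	inode->i_mapping = &inode->i_data;
}

/* Recover the owning inode from an embedded mapping without using ->host. */
static struct inode *mapping_to_inode(struct address_space *mapping)
{
	assert(mapping != NULL);
	return container_of(mapping, struct inode, i_data);
}
```

After replaying the two dax.c assignments on such inodes, ->host and
mapping_to_inode() still agree, which is the invariant being argued for.  The
gfs2 and nilfs2 cases are different because their mappings are not the i_data
of the inode that ->host points to.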
   

> fs/gfs2/glock.c:mapping->host = s->s_bdev->bd_inode;
> fs/gfs2/ops_fstype.c:   mapping->host = sb->s_bdev->bd_inode;

Hmm.. that's weird.  I cannot quite follow what is happening there.
It creates an address-space for metadata which doesn't have a real
inode, and borrows bits of the bdev inode ... possibly just to be able
to find the blocksize deep in buffer.c or similar.
I suspect that using an 'inode' instead of a 'mapping' would make the
code clearer.

> fs/nilfs2/page.c:   mapping->host = inode;

A nilfs inode is allocated with 2 address spaces,
one for the data and one for btree indexing metadata.
And then there are a couple of extra address spaces for the
global metadata-file (mdt).

I wonder what the ->host pointer is actually used for.
buffer.c uses it:
 - to mark the inode 'dirty' when the page is marked dirty
 - to find the blocksize of the inode, for creating buffer_heads
 - to find the size of the mapping (i_size)

I could probably argue that the 'dirty' flag (at least for the data) and
the size really belong in the address_space, not in the inode.
The blocksize, I'm less sure of.

I suspect gfs2 and nilfs2 could be changed to allocate a separate inode
(instead of address_space), or to not make use of the ->host pointer.
It would be more work than I at first thought though.

Thanks,
NeilBrown


Re: [HMM 00/16] HMM (Heterogeneous Memory Management) v19

2017-04-05 Thread Jerome Glisse
On Thu, Apr 06, 2017 at 11:22:12AM +0800, Figo.zhang wrote:

[...]

> > Heterogeneous Memory Management (HMM) (description and justification)
> >
> > Today device drivers expose a dedicated memory allocation API through their
> > device file, often relying on a combination of IOCTL and mmap calls. The
> > device can only access and use memory allocated through this API. This
> > effectively splits the program address space into objects allocated for the
> > device and usable by the device, and other regular memory (malloc, mmap
> > of a file, shared memory, …) only accessible by the CPU (or in a very limited
> > way by a device by pinning memory).
> >
> > Allowing different isolated components of a program to use a device thus
> > requires duplication of the input data structures using the device memory
> > allocator. This is reasonable for simple data structures (array, grid,
> > image, …) but this gets extremely complex with advanced data structures
> > (list, tree, graph, …) that rely on a web of memory pointers. This is
> > becoming a serious limitation on the kind of workload that can be
> > offloaded to devices like GPUs.
> >
> 
> How is this handled by the current GPU software stack? By maintaining a
> complex middle framework/HAL?

Yes, you still need a framework like OpenCL or CUDA. There is work under
way to leverage the GPU directly from languages like C++, so I expect that
the HAL will be hidden more and more for a larger group of programmers.
Note I still expect some programmers will want to program closer to the
hardware to extract every bit of performance they can.

For OpenCL you need HMM to implement what is described as the fine-grained
system SVM memory model (see the OpenCL 2.0 or later specification).

> > New industry standards like C++, OpenCL or CUDA are pushing to remove this
> > barrier. This requires a shared address space between the GPU device and CPU so
> > that the GPU can access any memory of a process (while still obeying memory
> > protection like read only).
> 
> Can the GPU access all of the process's VMAs, or only VMAs whose backing
> system memory has been migrated to the GPU page table?

Whole process VMAs; they do not need to be migrated to device memory.
Migration is an optional feature that is necessary for performance, but the
GPU can access system memory just fine.

[...]

> > When a page backing an address of a process is migrated to device memory,
> > the CPU page table entry is set to a new specific swap entry. CPU access
> > to such an address triggers a migration back to system memory, just as if
> > the page had been swapped to disk. HMM also blocks anyone from pinning a
> > ZONE_DEVICE page so that it can always be migrated back to system memory
> > if the CPU accesses it. Conversely, HMM does not migrate to device memory any
> > page that is pinned in system memory.
> >
> 
> Is the purpose of migrating system pages to the device that the device can
> read the system memory?
> If the CPU/programs want to read the device data, do they need to pin/map
> the device memory into the process address space?
> If multiple applications want to read the same device memory region
> concurrently, how is that done?

The purpose of migrating to device memory is to leverage device memory
bandwidth. PCIe bandwidth is 32GB/s, while device memory bandwidth is between
256GB/s and 1TB/s; device memory also has lower latency.

The CPU cannot access device memory. It can, but only in a limited way over
PCIe, and doing so would violate the memory model programmers get for regular
system memory; hence for all intents and purposes it is better to say that the
CPU cannot access any of the device memory.

A shared VMA will just work, so if a VMA is shared between 2 processes then
both processes can access the same memory. All the semantics that are valid on
the CPU are also valid on the GPU. Nothing changes there.
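
The migration rules from the quoted description (CPU access pulls a page back
to system memory, a page in device memory cannot be pinned, a pinned page is
not migrated) can be modeled as a toy state machine.  This is purely
illustrative: every type and function name below is hypothetical, and nothing
here reflects the actual HMM API or data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Where the toy page currently lives. */
enum toy_location { TOY_SYSTEM_MEMORY, TOY_DEVICE_MEMORY };

struct toy_page {
	enum toy_location loc;
	int pinned;
};

/* A pinned page is never migrated to device memory. Returns 0 or -1. */
static int toy_migrate_to_device(struct toy_page *p)
{
	assert(p != NULL);
	if (p->pinned)
		return -1;
	p->loc = TOY_DEVICE_MEMORY;
	return 0;
}

/* Pinning is refused while the page sits in device memory. Returns 0 or -1. */
static int toy_pin(struct toy_page *p)
{
	assert(p != NULL);
	if (p->loc == TOY_DEVICE_MEMORY)
		return -1;
	p->pinned = 1;
	return 0;
}

/*
 * CPU access: a page in device memory is migrated back first, much like
 * a fault on a swapped-out page. Returns 1 if a migration happened.
 */
static int toy_cpu_access(struct toy_page *p)
{
	assert(p != NULL);
	if (p->loc != TOY_DEVICE_MEMORY)
		return 0;
	p->loc = TOY_SYSTEM_MEMORY;
	return 1;
}
```

The two refusals are what keep "can always be migrated back" true: a page is
never both pinned and resident in device memory.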


> A graph showing how the CPU and GPU share the address space would be helpful.

I am not good at making ASCII graphs, nor would I know how to graph this.
Any valid address on the CPU is valid on the GPU, that's it really. The
migration to device memory is orthogonal to the shared address space.

Cheers,
Jérôme


RE: [STABLE REGRESSION] iio: hid-sensor-trigger: Change get poll value function order to avoid sensor properties losing after resume from S3

2017-04-05 Thread Song, Hongyan
Hi Srinivas,
I have checked; the patch does not meet my requirement for ISH.
With this patch, sensor properties are still lost after resume from S3.

BR
Song Hongyan

-----Original Message-----
From: Srinivas Pandruvada [mailto:srinivas.pandruv...@linux.intel.com] 
Sent: Wednesday, April 5, 2017 11:36 PM
To: r...@researchut.com; Song, Hongyan ; linux-iio 

Cc: sta...@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: Re: [STABLE REGRESSION] iio: hid-sensor-trigger: Change get poll value 
function order to avoid sensor properties losing after resume from S3

Hi Hongyan,

Can you check if the patch meets your requirement/needs for ISH?

Thanks,
Srinivas

On Wed, 2017-04-05 at 16:21 +0530, Ritesh Raj Sarraf wrote:
> On Tue, 2017-04-04 at 17:44 -0700, Srinivas Pandruvada wrote:
> > 
> > Hi Ritesh,
> > 
> > Does the attached patch help?
>
> Thank you Srinivas. I tested your patch on top of 4.10.8 and it is
> working perfectly.
> 
> Ritesh
> 


Re: [PATCH v9 3/3] printk: fix double printing with earlycon

2017-04-05 Thread Aleksey Makarov



On 04/06/2017 12:57 AM, Andy Shevchenko wrote:

On Wed, Apr 5, 2017 at 11:20 PM, Aleksey Makarov
 wrote:

If a console was specified by ACPI SPCR table _and_ command line
parameters like "console=ttyAMA0" _and_ "earlycon" were specified,
then log messages appear twice.

The root cause is that the code traverses the list of specified
consoles (the `console_cmdline` array) and stops at the first match.
But it may happen that the same console is referred to twice by the
elements of this array:
pl011,mmio,0x87e02400,115200 -- from SPCR
ttyAMA0 -- from command line

but in this case `preferred_console` points to the second entry and
the flag CON_CONSDEV is not set, so bootconsole is not deregistered.

To fix that, introduce an invariant "The last non-braille console
is always the preferred one" on the entries of the console_cmdline
array.  Then traverse it in reverse order to be sure that if
the console is preferred then it will be the first matching entry.
Introduce variable console_cmdline_cnt that keeps the number
of elements of the console_cmdline array (Petr Mladek).  It helps
to get rid of the loop that searches for the end of this array.



 #define MAX_CMDLINECONSOLES 8

 static struct console_cmdline console_cmdline[MAX_CMDLINECONSOLES];
+static int console_cmdline_cnt;


This should be equal to -1 at the beginning, am I right?


No, this is not an index of the last element, this is count of
elements of cmdline_console array.  So it is 0 initially.


 static int preferred_console = -1;
 int console_set_on_cmdline;
@@ -1905,12 +1906,26 @@ static int __add_preferred_console(char *name, int idx, 
char *options,
 *  See if this tty is not yet registered, and
 *  if we have a slot free.
 */
-   for (i = 0, c = console_cmdline;
-i < MAX_CMDLINECONSOLES && c->name[0];
-i++, c++) {
+   for (i = 0, c = console_cmdline; i < console_cmdline_cnt; i++, c++) {
if (strcmp(c->name, name) == 0 && c->index == idx) {
-   if (!brl_options)
-   preferred_console = i;
+



+   if (brl_options)
+   return 0;


Is it invariant or brl_options may appear while looping?


I am not sure I understand your question.
If we find that we are registering a braille console that
has already been registered, we just return without updating
preferred console (it is only about regular consoles) and
without swapping it with the last element of the array (because it
is explicitly mentioned in the invariant:  The last
*non-braille* console is always the preferred one)


+
+   /*
+* Maintain an invariant that will help to find if
+* the matching console is preferred, see
+* register_console():
+*
+* The last non-braille console is always
+* the preferred one.
+*/
+   if (i != console_cmdline_cnt - 1)
+   swap(console_cmdline[i],
+console_cmdline[console_cmdline_cnt - 1]);


i'm wondering if you can iterate from the end to beginning as you do below.
It would simplify things.


You mean iterate to find the last element?
Yes I can and it is how this was implemented in v8,
Petr Mladek asked to introduce console_cmdline_cnt.

Thank you for review
Aleksey Makarov


+
+   preferred_console = console_cmdline_cnt - 1;
+
return 0;
}
}
@@ -1923,6 +1938,7 @@ static int __add_preferred_console(char *name, int idx, 
char *options,
braille_set_options(c, brl_options);

c->index = idx;
+   console_cmdline_cnt++;
return 0;
 }
 /*
@@ -2457,12 +2473,24 @@ void register_console(struct console *newcon)
}

/*
-*  See if this console matches one we selected on
-*  the command line.
+* See if this console matches one we selected on the command line.
+*
+* There may be several entries in the console_cmdline array matching
+* with the same console, one with newcon->match(), another by
+* name/index:
+*
+*  pl011,mmio,0x87e02400,115200 -- added from SPCR
+*  ttyAMA0 -- added from command line
+*
+* Traverse the console_cmdline array in reverse order to be
+* sure that if this console is preferred then it will be the first
+* matching entry.  We use the invariant that is maintained in
+* __add_preferred_console().
 */
-   for (i = 0, c = console_cmdline;
-i < MAX_CMDLINECONSOLES && c->name[0];
-i++, c++) {



+   for (i = console_cmdline_cnt - 1; i >= 0; i--) {





+
+   c = 

Re: [PATCH v9 3/3] printk: fix double printing with earlycon

2017-04-05 Thread Aleksey Makarov



On 04/06/2017 12:57 AM, Andy Shevchenko wrote:
> On Wed, Apr 5, 2017 at 11:20 PM, Aleksey Makarov wrote:
>> If a console was specified by ACPI SPCR table _and_ command line
>> parameters like "console=ttyAMA0" _and_ "earlycon" were specified,
>> then log messages appear twice.
>>
>> The root cause is that the code traverses the list of specified
>> consoles (the `console_cmdline` array) and stops at the first match.
>> But it may happen that the same console is referred by the elements
>> of this array twice:
>>
>> pl011,mmio,0x87e02400,115200 -- from SPCR
>> ttyAMA0 -- from command line
>>
>> but in this case `preferred_console` points to the second entry and
>> the flag CON_CONSDEV is not set, so bootconsole is not deregistered.
>>
>> To fix that, introduce an invariant "The last non-braille console
>> is always the preferred one" on the entries of the console_cmdline
>> array.  Then traverse it in reverse order to be sure that if
>> the console is preferred then it will be the first matching entry.
>> Introduce variable console_cmdline_cnt that keeps the number
>> of elements of the console_cmdline array (Petr Mladek).  It helps
>> to get rid of the loop that searches for the end of this array.
>>
>>  #define MAX_CMDLINECONSOLES 8
>>
>>  static struct console_cmdline console_cmdline[MAX_CMDLINECONSOLES];
>> +static int console_cmdline_cnt;


> This should be equal to -1 at the beginning, am I right?

No, this is not the index of the last element; it is the count of
elements in the console_cmdline array.  So it is 0 initially.


>>  static int preferred_console = -1;
>>  int console_set_on_cmdline;
>> @@ -1905,12 +1906,26 @@ static int __add_preferred_console(char *name, int idx, char *options,
>>  	 *	See if this tty is not yet registered, and
>>  	 *	if we have a slot free.
>>  	 */
>> -	for (i = 0, c = console_cmdline;
>> -	     i < MAX_CMDLINECONSOLES && c->name[0];
>> -	     i++, c++) {
>> +	for (i = 0, c = console_cmdline; i < console_cmdline_cnt; i++, c++) {
>>  		if (strcmp(c->name, name) == 0 && c->index == idx) {
>> -			if (!brl_options)
>> -				preferred_console = i;
>> +
>> +			if (brl_options)
>> +				return 0;


> Is it invariant, or may brl_options appear while looping?


I am not sure I understand your question.
If we find that we are registering a braille console that
has already been registered, we just return without updating
preferred console (it is only about regular consoles) and
without swapping it with the last element of the array (because it
is explicitly mentioned in the invariant:  The last
*non-braille* console is always the preferred one)
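
The invariant described above can be sketched in userspace C. This is a minimal sketch under stated assumptions, not the kernel code: braille handling and console options are left out, the structure is a simplified stand-in, and `same_uart()`, `match_forward()` and `match_reverse()` are hypothetical helpers added only to show why the reverse scan finds the preferred entry first.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_CMDLINECONSOLES 8

/* Simplified stand-in for the kernel structure (assumption). */
struct console_cmdline {
	char name[16];
	int index;
};

static struct console_cmdline console_cmdline[MAX_CMDLINECONSOLES];
static int console_cmdline_cnt;		/* count of used entries, 0 initially */
static int preferred_console = -1;

/* Register a regular (non-braille) console; returns 0, or -1 if full. */
static int add_preferred_console(const char *name, int idx)
{
	int last = console_cmdline_cnt - 1;
	int i;

	for (i = 0; i < console_cmdline_cnt; i++) {
		struct console_cmdline *c = &console_cmdline[i];

		if (strcmp(c->name, name) == 0 && c->index == idx) {
			/* Already known: move it to the end to keep the invariant. */
			if (i != last) {
				struct console_cmdline tmp = *c;

				*c = console_cmdline[last];
				console_cmdline[last] = tmp;
			}
			preferred_console = last;
			return 0;
		}
	}
	if (console_cmdline_cnt == MAX_CMDLINECONSOLES)
		return -1;

	snprintf(console_cmdline[console_cmdline_cnt].name,
		 sizeof(console_cmdline[0].name), "%s", name);
	console_cmdline[console_cmdline_cnt].index = idx;
	preferred_console = console_cmdline_cnt++;

	/* Invariant: the last entry is the preferred one. */
	assert(preferred_console == console_cmdline_cnt - 1);
	return 0;
}

/* Hypothetical matcher: both entries describe the same UART. */
static int same_uart(const struct console_cmdline *c)
{
	return strcmp(c->name, "pl011") == 0 || strcmp(c->name, "ttyAMA") == 0;
}

/* Index of the first match when scanning forward, or -1. */
static int match_forward(int (*match)(const struct console_cmdline *))
{
	int i;

	for (i = 0; i < console_cmdline_cnt; i++)
		if (match(&console_cmdline[i]))
			return i;
	return -1;
}

/* Index of the first match when scanning backward, or -1. */
static int match_reverse(int (*match)(const struct console_cmdline *))
{
	int i;

	for (i = console_cmdline_cnt - 1; i >= 0; i--)
		if (match(&console_cmdline[i]))
			return i;
	return -1;
}
```

With "pl011" added first and "ttyAMA" second, the forward scan stops at the non-preferred entry, while the reverse scan stops at the entry that `preferred_console` points to.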


>> +
>> +			/*
>> +			 * Maintain an invariant that will help to find if
>> +			 * the matching console is preferred, see
>> +			 * register_console():
>> +			 *
>> +			 * The last non-braille console is always
>> +			 * the preferred one.
>> +			 */
>> +			if (i != console_cmdline_cnt - 1)
>> +				swap(console_cmdline[i],
>> +				     console_cmdline[console_cmdline_cnt - 1]);


> I'm wondering if you can iterate from the end to the beginning, as you do below.
> It would simplify things.

You mean iterate to find the last element?
Yes, I can, and that is how this was implemented in v8;
Petr Mladek asked me to introduce console_cmdline_cnt.

Thank you for the review
Aleksey Makarov


>> +
>> +			preferred_console = console_cmdline_cnt - 1;
>> +
>>  			return 0;
>>  		}
>>  	}
>> @@ -1923,6 +1938,7 @@ static int __add_preferred_console(char *name, int idx, char *options,
>>  	braille_set_options(c, brl_options);
>>
>>  	c->index = idx;
>> +	console_cmdline_cnt++;
>>  	return 0;
>>  }
>>  /*
>> @@ -2457,12 +2473,24 @@ void register_console(struct console *newcon)
>>  	}
>>
>>  	/*
>> -	 *	See if this console matches one we selected on
>> -	 *	the command line.
>> +	 * See if this console matches one we selected on the command line.
>> +	 *
>> +	 * There may be several entries in the console_cmdline array matching
>> +	 * with the same console, one with newcon->match(), another by
>> +	 * name/index:
>> +	 *
>> +	 *	pl011,mmio,0x87e02400,115200 -- added from SPCR
>> +	 *	ttyAMA0 -- added from command line
>> +	 *
>> +	 * Traverse the console_cmdline array in reverse order to be
>> +	 * sure that if this console is preferred then it will be the first
>> +	 * matching entry.  We use the invariant that is maintained in
>> +	 * __add_preferred_console().
>>  	 */
>> -	for (i = 0, c = console_cmdline;
>> -	     i < MAX_CMDLINECONSOLES && c->name[0];
>> -	     i++, c++) {
>> +	for (i = console_cmdline_cnt - 1; i >= 0; i--) {
>> +
>> +		c = console_cmdline + i;
>> +

Re: [PATCH v3 1/2] soc: qcom: Add support of scm call for mss rproc to share access of ddr

2017-04-05 Thread Bjorn Andersson
On Wed 08 Mar 10:03 PST 2017, Avaneesh Kumar Dwivedi wrote:

> This patch add scm call support to make hypervisor call to enable access
> of fw regions in ddr to mss subsystem on arm-v8 arch soc's.
> 
> Signed-off-by: Avaneesh Kumar Dwivedi 
> ---
>  drivers/firmware/qcom_scm-64.c |  25 +++
>  drivers/firmware/qcom_scm.c|  93 ++
>  drivers/firmware/qcom_scm.h|   3 +
>  drivers/remoteproc/qcom_q6v5_pil.c | 129 
> -
>  include/linux/qcom_scm.h   |  14 
>  5 files changed, 262 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/firmware/qcom_scm-64.c b/drivers/firmware/qcom_scm-64.c
> index 4a0f5ea..187fc00 100644
> --- a/drivers/firmware/qcom_scm-64.c
> +++ b/drivers/firmware/qcom_scm-64.c
> @@ -358,3 +358,28 @@ int __qcom_scm_pas_mss_reset(struct device *dev, bool reset)
>  
>   return ret ? : res.a1;
>  }
> +
> +int __qcom_scm_assign_mem(struct device *dev, struct vmid_detail vmid)

Rather than packing these parameters up in a struct I think it's cleaner
to just pass them directly.

> +{
> + int ret;
> + struct qcom_scm_desc desc = {0};
> + struct arm_smccc_res res;
> +
> + desc.args[0] = vmid.fw_phy;
> + desc.args[1] = vmid.size_fw;
> + desc.args[2] = vmid.from_phy;
> + desc.args[3] = vmid.size_from;
> + desc.args[4] = vmid.to_phy;
> + desc.args[5] = vmid.size_to;
> + desc.args[6] = 0;
> +
> + desc.arginfo = QCOM_SCM_ARGS(7, QCOM_SCM_RO, QCOM_SCM_VAL,
> + QCOM_SCM_RO, QCOM_SCM_VAL, QCOM_SCM_RO,
> + QCOM_SCM_VAL, QCOM_SCM_VAL);
> +
> + ret = qcom_scm_call(dev, QCOM_SCM_SVC_MP,
> + QCOM_MEM_PROT_ASSIGN_ID,
> + &desc, &res);
> +
> + return ret ? : res.a1;

If I understand the downstream code correctly, we only care about "ret"
here: zero on success, negative on error.

> +}
> diff --git a/drivers/firmware/qcom_scm.c b/drivers/firmware/qcom_scm.c
> index 893f953ea..f137f34 100644
> --- a/drivers/firmware/qcom_scm.c
> +++ b/drivers/firmware/qcom_scm.c
> @@ -42,6 +42,18 @@ struct qcom_scm {
>  
>  static struct qcom_scm *__scm;
>  
> +struct dest_vm_and_perm_info {
> + __le32 vm;
> + __le32 perm;
> + __le32 *ctx;

Please be explicit about the fact that this is 64 bit.

> + __le32 ctx_size;
> +};

This should be __packed
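
Both review comments on this structure can be illustrated with a small userspace sketch. It is an illustration under assumptions, not the driver code: `le32` and `le64` are stand-ins for the kernel's `__le32` and `__le64`, and `__attribute__((packed))` is what the kernel's `__packed` expands to on GCC and Clang. The pointer member becomes an explicit 64-bit field, and packing removes the tail padding the compiler would otherwise add.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's __le32/__le64 types (assumption). */
typedef uint32_t le32;
typedef uint64_t le64;

struct dest_vm_and_perm_info {
	le32 vm;
	le32 perm;
	le64 ctx;	/* address, explicitly 64 bits wide */
	le32 ctx_size;
} __attribute__((packed));

/* Packed: 4 + 4 + 8 + 4 bytes, with no padding after ctx_size. */
_Static_assert(sizeof(struct dest_vm_and_perm_info) == 20,
	       "unexpected padding in dest_vm_and_perm_info");
_Static_assert(offsetof(struct dest_vm_and_perm_info, ctx) == 8,
	       "ctx must directly follow perm");
```

Without the packed attribute, a typical 64-bit ABI would round the structure up to 24 bytes, so an array of entries would no longer match a 20-byte layout expected on the other side.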

> +
> +struct fw_region_info {
> + __le64 addr;
> + __le64 size;
> +};
> +
>  static int qcom_scm_clk_enable(void)
>  {
>   int ret;
> @@ -292,6 +304,87 @@ int qcom_scm_pas_shutdown(u32 peripheral)
>  }
>  EXPORT_SYMBOL(qcom_scm_pas_shutdown);
>  
> +/**
> + * qcom_scm_assign_mem() - Allocate and fill vmid detail of old
> + * new owners of memory region for fw and metadata etc, Which is
> + * further passed to hypervisor, which does translate intermediate
> + * physical address used by subsystems.
> + * @vmid: structure with pointers and size detail of old and new
> + * owners vmid detail.
> + * Return 0 on success.
> + */
> +int qcom_scm_assign_mem(struct vmid_detail vmid)

After a long discussion with Stephen I now think that I understand
what's actually going on here.

So this function will request TZ to remove all permissions for the
memory region in the tables specified in "from" and then for each vmid
in "to" it will set up the specified "permission".

So the "to" and "permissions" are actually a tuple, rather than
independent lists of values. So I think this should be exposed in the
prototype, as a list of (vmid, permission) entries.

To make the function prototype more convenient I think you should turn
"from" into a bitmap (e.g. BIT(VMID_HLOS) | BIT(VMID_MSS_MSA)).

If you then make the function, on success, return "to" as a bitmap the
caller can simply store that in a state variable and pass it as "from"
in the next call.

So you would have:

  struct qcom_scm_mem_perm new_perms[] = {
          { VMID_HLOS, PERM_READ },
          { VMID_MSS_MSA, PERM_READ | PERM_WRITE },
  };

  current_perms = qcom_scm_assign_mem(ptr, size, current_perms, new_perms, 2);


And I believe something like "curr_perm" and "new_perm" are even better
names than "from" and "to"
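
The calling convention proposed here can be sketched as a userspace mock. Everything below is an assumption made for illustration: the function name `qcom_scm_assign_mem_sketch()`, the numeric VMID and permission values, and the struct field names are hypothetical, and the mock only computes the returned bitmap instead of making a secure-world call.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define BIT(n) (1u << (n))

/* Hypothetical IDs and permission flags, chosen only for this sketch. */
enum { VMID_HLOS = 3, VMID_MSS_MSA = 15 };
enum { PERM_WRITE = 2, PERM_READ = 4 };

/* One (vmid, permission) tuple describing a new owner. */
struct qcom_scm_mem_perm {
	int vmid;
	int perm;
};

/*
 * Mock of the proposed prototype: curr_perms is a bitmap of the current
 * owners; the return value is the bitmap of the new owners, or a negative
 * errno on invalid input.
 */
static int qcom_scm_assign_mem_sketch(uintptr_t addr, size_t size,
				      unsigned int curr_perms,
				      const struct qcom_scm_mem_perm *new_perms,
				      size_t count)
{
	unsigned int next = 0;
	size_t i;

	(void)addr;	/* the mock does not touch memory */

	if (!size || !curr_perms || !new_perms || !count)
		return -EINVAL;

	for (i = 0; i < count; i++) {
		/* Keep the bitmap within the positive range of an int. */
		if (new_perms[i].vmid < 0 || new_perms[i].vmid > 30)
			return -EINVAL;
		next |= BIT(new_perms[i].vmid);
	}

	return (int)next;
}
```

A caller would keep the returned bitmap in a state variable and pass it as `curr_perms` in the next call, which is the round trip suggested above.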

> +{
> + unsigned long dma_attrs = DMA_ATTR_FORCE_CONTIGUOUS;
> + struct dest_vm_and_perm_info *to;
> + struct fw_region_info *fw_info;
> + __le64 fw_phy;
> + __le32 *from;
> + int ret = -ENOMEM;
> + int i;
> +
> + from = dma_alloc_attrs(__scm->dev, vmid.size_from,
> + _phy, GFP_KERNEL, dma_attrs);
> + if (!from) {
> + dev_err(__scm->dev,
> + "failed to allocate buffer to pass source vmid detail\n");
> + return -ENOMEM;
> + }
> + to = dma_alloc_attrs(__scm->dev, vmid.size_to,
> + _phy, GFP_KERNEL, dma_attrs);
> + if (!to) {
> + dev_err(__scm->dev,
> + "failed to allocate buffer to pass 

Re: [PATCH v5 3/5] drm/exynos: dsi: Fix the parse_dt function

2017-04-05 Thread Inki Dae


On 2017-04-05 at 00:38, Krzysztof Kozlowski wrote:
> On Tue, Mar 28, 2017 at 11:38 AM, Krzysztof Kozlowski  wrote:
>> On Tue, Mar 28, 2017 at 11:26 AM, Inki Dae  wrote:
>>> Merged.
>>
>> Hi,
>>
>> I do not see the tag (with DT patches) merged by you which I provided
>> to you before. These are essential for bisectability. Without them,
>> kernel bisectability is broken. Did you merge the tag somewhere?
>>
>> Best regards,
>> Krzysztof
>>
>>> Thanks,
>>> Inki Dae
> 
> Inki,
> 
> I still do not see the DTS tag [1] merged in your tree but you applied
> patches breaking the display. I looked at exynos-drm-next branch.
> 
> We already talked about bisectability, and with Hoegeun we provided
> a proper solution. Hoegeun split the patchset and I sent you a stable
> tag to merge. Be aware not to apply the DTS patch because you would
> effectively duplicate it. Instead, deal like with any pull request -
> merge the tag as dependency *before* applying DRM DSI patch.

Krzysztof,

I think merging the DTS tag is not necessary because the DT and DRM patches
will go to -next separately.
Anyway, I have just read your email. Your approach seems better, so I did what you asked.


Thanks,
Inki Dae

> 
> I saw also a branch like this:
> https://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos.git/log/?h=exynos-drm-next-tm2
> but this is something obviously wrong. I do not know what are your
> plans to do with it, but please drop it as it brings only confusion.
> 
> Best regards,
> Krzysztof
> 
> [1] https://www.spinics.net/lists/arm-kernel/msg567053.html
> 
>>> On 2017-03-22 at 10:36, Hoegeun Kwon wrote:
>>>> Hi Inki,
>>>>
>>>> Could you check this patch?
>>>> For reference, patches 1/5 and 2/5 have already been applied to
>>>> Krzysztof's tree.
>>>>
>>>> Best regards,
>>>> Hoegeun

> --
> To unsubscribe from this list: send the line "unsubscribe devicetree" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> 


Re: [RFC PATCH] Revert "mwifiex: fix system hang problem after resume"

2017-04-05 Thread amit karwar
On Sat, Apr 1, 2017 at 1:51 AM, Brian Norris  wrote:
> This reverts commit 437322ea2a36d112e20aa7282c869bf924b3a836.
>
> This above-mentioned "fix" does not actually do anything to prevent a
> race condition. It simply papers over it so that the issue doesn't
> appear.
>
> If this is a real problem, it should be explained better than the above
> commit does, and an alternative, non-racy solution should be found.
>
> For further reason to revert this: there's no reason we can't try
> resetting the card when it's *actually* stuck in host-sleep mode. So
> instead, this is unnecessarily creating scenarios where we can't recover
> Wifi.
>
> Cc: Amitkumar Karwar 
> Signed-off-by: Brian Norris 
> ---
> Amit, please take a look. AIUI, your "fix" is wrong, and quite racy. If you
> still think it's needed, can you please propose an alternative? Or at least
> explain more why this is needed? Thanks.
>

I agree. The fix just covers up the issue. We need to investigate why the
system hangs when a card reset is attempted while host sleep is activated.

Acked-by: Amitkumar Karwar 

Regards,
Amitkumar Karwar


Re: [PATCH 5/5] fpga-region: separate out common code from dt specific code

2017-04-05 Thread Moritz Fischer
Hi Alan,

first pass ... need to get back to it.

On Mon, Mar 13, 2017 at 04:53:33PM -0500, Alan Tull wrote:
> FPGA region is a layer above the FPGA manager and FPGA bridge
> frameworks.  Currently, FPGA region is dependent on device tree.
> This commit separates the device tree specific code from the
> common code, separating fpga-region.c into fpga-region.c,
> of-fpga-region.c, and fpga-region.h.
> 
> Functions exported from fpga-region.c:
> * fpga_region_register
> * fpga_region_unregister
>   Create/remove a FPGA region.  Caller will supply the region
>   struct initialized with a pointer to a FPGA manager and
>   a method to get the FPGA bridges.
> 
> * of_fpga_region_find
>   Find a fpga region, given the node pointer
> 
> * fpga_region_alloc_image_info
> * fpga_region_free_image_info
>   Alloc/free fpga_image_info struct
> 
> * fpga_region_program_fpga
>   Program an FPGA region
> 
> Signed-off-by: Alan Tull 
> ---
>  drivers/fpga/Kconfig  |  12 +-
>  drivers/fpga/Makefile |   1 +
>  drivers/fpga/fpga-region.c| 475 
> +++---
>  drivers/fpga/fpga-region.h|  50 +
>  drivers/fpga/of-fpga-region.c | 453 
>  include/linux/fpga/fpga-mgr.h |   6 +-
>  6 files changed, 599 insertions(+), 398 deletions(-)
>  create mode 100644 drivers/fpga/fpga-region.h
>  create mode 100644 drivers/fpga/of-fpga-region.c
> 
> diff --git a/drivers/fpga/Kconfig b/drivers/fpga/Kconfig
> index ce861a2..be9c23d 100644
> --- a/drivers/fpga/Kconfig
> +++ b/drivers/fpga/Kconfig
> @@ -15,10 +15,18 @@ if FPGA
>  
>  config FPGA_REGION
>   tristate "FPGA Region"
> - depends on OF && FPGA_BRIDGE
> + depends on FPGA_BRIDGE
> + help
> +   FPGA Region common code.  A FPGA Region controls a FPGA Manager
> +   and the FPGA Bridges associated with either a reconfigurable
> +   region of an FPGA or a whole FPGA.
> +
> +config OF_FPGA_REGION
> + tristate "FPGA Region Device Tree Overlay Support"
> + depends on FPGA_REGION

Doesn't this one now need "depends on FPGA_REGION && OF"? FPGA_REGION
no longer depends on OF. Or does FPGA_BRIDGE pull it in?

>   help
> FPGA Regions allow loading FPGA images under control of
> -   the Device Tree.
> +   Device Tree Overlays.
>  
>  config FPGA_MGR_SOCFPGA
>   tristate "Altera SOCFPGA FPGA Manager"
> diff --git a/drivers/fpga/Makefile b/drivers/fpga/Makefile
> index 8df07bc..fb88fb0 100644
> --- a/drivers/fpga/Makefile
> +++ b/drivers/fpga/Makefile
> @@ -17,3 +17,4 @@ obj-$(CONFIG_ALTERA_FREEZE_BRIDGE)  += altera-freeze-bridge.o
>  
>  # High Level Interfaces
>  obj-$(CONFIG_FPGA_REGION)+= fpga-region.o
> +obj-$(CONFIG_OF_FPGA_REGION) += of-fpga-region.o
> diff --git a/drivers/fpga/fpga-region.c b/drivers/fpga/fpga-region.c
> index 815f178..c06f2f7 100644
> --- a/drivers/fpga/fpga-region.c
> +++ b/drivers/fpga/fpga-region.c
> @@ -16,53 +16,64 @@
>   * this program.  If not, see .
>   */
>  
> +/* todo: prevent programming if region has child regions or overlay applied */
> +
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
> +#include "fpga-region.h"
>  
> -/**
> - * struct fpga_region - FPGA Region structure
> - * @dev: FPGA Region device
> - * @mutex: enforces exclusive reference to region
> - * @bridge_list: list of FPGA bridges specified in region
> - * @info: fpga image specific information
> - */
> -struct fpga_region {
> - struct device dev;
> - struct mutex mutex; /* for exclusive reference to region */
> - struct list_head bridge_list;
> +static DEFINE_IDA(fpga_region_ida);
> +struct class *fpga_region_class;
> +
> +struct fpga_image_info *fpga_region_alloc_image_info(struct fpga_region *region)
> +{
> + struct device *dev = &region->dev;
>   struct fpga_image_info *info;
> -};
>  
> -#define to_fpga_region(d) container_of(d, struct fpga_region, dev)
> + info = devm_kzalloc(dev, sizeof(*info), GFP_KERNEL);
> + if (!info)
> + return ERR_PTR(-ENOMEM);
>  
> -static DEFINE_IDA(fpga_region_ida);
> -static struct class *fpga_region_class;
> + return info;
> +}
> +EXPORT_SYMBOL_GPL(fpga_region_alloc_image_info);
>  
> -static const struct of_device_id fpga_region_of_match[] = {
> - { .compatible = "fpga-region", },
> - {},
> -};
> -MODULE_DEVICE_TABLE(of, fpga_region_of_match);
> +void fpga_region_free_image_info(struct fpga_region *region,
> +  struct fpga_image_info *info)
> +{
> + struct device *dev = &region->dev;
> +
> + if (!info)
> + return;
> +
> + if (info->firmware_name)
> + devm_kfree(dev, info->firmware_name);
>  
> + devm_kfree(dev, info);
> +}
> +EXPORT_SYMBOL_GPL(fpga_region_free_image_info);
> +
> +#if IS_ENABLED(CONFIG_OF_FPGA_REGION)
>  static int fpga_region_of_node_match(struct device *dev, 

RE: [PATCH] block-mq: set both block queue and hardware queue restart bit for restart

2017-04-05 Thread KY Srinivasan


> -Original Message-
> From: Bart Van Assche [mailto:bart.vanass...@sandisk.com]
> Sent: Wednesday, April 5, 2017 8:46 PM
> To: linux-kernel@vger.kernel.org; linux-bl...@vger.kernel.org; Long Li
> ; ax...@kernel.dk
> Cc: Stephen Hemminger ; KY Srinivasan
> 
> Subject: Re: [PATCH] block-mq: set both block queue and hardware queue
> restart bit for restart
> 
> On Thu, 2017-04-06 at 03:38 +, Long Li wrote:
> > > -Original Message-
> > > From: Bart Van Assche [mailto:bart.vanass...@sandisk.com]
> > >
> > > Please drop this patch. I'm working on a better solution.
> >
> > Thank you. Looking forward to your patch.
> 
> Hello Long,
> 
> It would help if you could share the name of the block or SCSI driver with
> which you ran into that lockup and also if you could share the name of the
> I/O scheduler used in your test.

The tests that indicated the issue were run on Hyper-V. The driver is storvsc_drv.c.
The I/O scheduler was, I think, noop.

K. Y
> 
> Thanks,
> 
> Bart.


Re: [PATCH 3/3] perf tool, arm64, thunderx2: Add implementation defined events for ThunderX2

2017-04-05 Thread Ganapatrao Kulkarni
On Wed, Apr 5, 2017 at 3:35 PM, Mark Rutland  wrote:
> On Wed, Apr 05, 2017 at 02:42:39PM +0530, Ganapatrao Kulkarni wrote:
>> On Tue, Apr 4, 2017 at 5:58 PM, Mark Rutland  wrote:
>> > On Tue, Apr 04, 2017 at 01:06:43PM +0530, Ganapatrao Kulkarni wrote:
>> >> This is not a full event list, but a short list of useful events.
>> >>
>> >> Signed-off-by: Ganapatrao Kulkarni 
>> >> ---
>> >>  tools/perf/pmu-events/arch/arm64/mapfile.csv   |  2 +
>> >>  .../arm64/thunderx2/implementation-defined.json| 72 
>> >> ++
>> >>  2 files changed, 74 insertions(+)
>> >>  create mode 100644 tools/perf/pmu-events/arch/arm64/mapfile.csv
>> >>  create mode 100644 
>> >> tools/perf/pmu-events/arch/arm64/thunderx2/implementation-defined.json
>> >>
>> >> diff --git a/tools/perf/pmu-events/arch/arm64/mapfile.csv 
>> >> b/tools/perf/pmu-events/arch/arm64/mapfile.csv
>> >> new file mode 100644
>> >> index 000..ba30e43
>> >> --- /dev/null
>> >> +++ b/tools/perf/pmu-events/arch/arm64/mapfile.csv
>> >> @@ -0,0 +1,2 @@
>> >> +Family-model,Version,Filename,EventType
>> >> +0x420f5161,v1,thunderx2,core
>> >> diff --git 
>> >> a/tools/perf/pmu-events/arch/arm64/thunderx2/implementation-defined.json 
>> >> b/tools/perf/pmu-events/arch/arm64/thunderx2/implementation-defined.json
>> >> new file mode 100644
>> >> index 000..360e084
>> >> --- /dev/null
>> >> +++ 
>> >> b/tools/perf/pmu-events/arch/arm64/thunderx2/implementation-defined.json
>> >> @@ -0,0 +1,72 @@
>> >> +[
>> >> +{
>> >> +"PublicDescription": "Attributable Level 1 data cache access, 
>> >> read",
>> >> +"EventCode": "0x40",
>> >> +"EventName": "l1d_cache_access_read",
>> >> +"BriefDescription": "l1d cache access, read",
>> >> + "CPU" :"armv8_pmuv3_0"
>> >
>> > Please let's not hard-code the name like this. Surely we can get rid of 
>> > this?
>> >
>> > The kernel doesn't currently name PMUs as armv8_pmuv3_*, and as that can
>> > differ across DT/ACPI and in big.LITTLE, I don't think it makes sense to
>> > try to rely on one particular string regardless.
>>
>> This string/name is fixed for a platform. having name here is essential to
>> know which devices among pmu (armv8_pmuv3_0, breakpoint, software)
>> devices, these jevents to be added.
>> also this json file is specific to a arch/soc/board, it is not a
>> generic file to be common.
>
> This file describes the events of a CPU PMU, and CPUs are not specific to
> a platform in general. There are many systems using Cortex-A57, for
> example.
>
> Across big.LITTLE SoCs with Cortex-A57, there's no guarantee as to
> whether the Cortex-A57 cores would be named armv8_pmuv3_0, or
> armv8_pmuv3_1, etc. This would depend on the boot CPU, probe order of
> secondaries, etc.

OK, we may not have the complete name; however, the common part can be used to
recognize the core PMU devices under /sys/bus/event_source/devices,
i.e. we can have the CPU id as "armv8_pmuv3".

The same can be extended to uncore PMUs as well.

mapfile.csv will have entries for the event files of both big and LITTLE
processors. jevents creates a pmu_events_map table covering every entry
present in mapfile.csv. During lookup, whichever PMU matches the cpuid of a
pmu_events_map entry gets the table created from the corresponding json
file, and those jevents are added to that PMU.
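
To make the "common part" matching concrete, here is a rough user-space
sketch. It is not perf code: pmu_name_matches() is a made-up helper, and
the rule that an instance suffix looks like "_<digits>" is an assumption
about how the kernel numbers the devices.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not part of perf): return true if a PMU device
 * name, as it would appear under /sys/bus/event_source/devices, is either
 * exactly `base` or `base` followed by "_<digits>", e.g. "armv8_pmuv3_0".
 */
static bool pmu_name_matches(const char *name, const char *base)
{
	size_t len = strlen(base);
	const char *p;

	if (strncmp(name, base, len) != 0)
		return false;
	if (name[len] == '\0')
		return true;		/* exact match, no instance suffix */
	if (name[len] != '_' || name[len + 1] == '\0')
		return false;		/* something other than "_<digits>" */
	for (p = name + len + 1; *p; p++)
		if (!isdigit((unsigned char)*p))
			return false;
	return true;
}
```

With this, "armv8_pmuv3_0" and "armv8_pmuv3_1" both match the base
"armv8_pmuv3", while "breakpoint" and "software" do not.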

>
> I appreciate that your platform is homogeneous, and you may not expect
> the core to be reused in any heterogeneous system. However, I think that
> if we're going to make this work for arm64 we should handle the general
> case, rather than only having it support a limited set of platforms.
>
> Currently, we don't have an "official" way of identifying which PMUs are
> CPU PMUs, but one way we could identify them would be to check whether
> they have a "cpus" attribute under sysfs (rather than a "cpumask"
> attribute).
>
> Thanks,
> Mark.

thanks
Ganapat
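
For reference, Mark's alternative of telling CPU PMUs apart by a "cpus"
sysfs attribute might look roughly like the sketch below. pmu_is_cpu_pmu()
is a hypothetical helper, not an existing perf function, and the base
directory is passed in (normally /sys/bus/event_source/devices) so the
check can be tried against any directory tree.

```c
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: report whether PMU `name` under `base` exposes a
 * "cpus" attribute. Per the suggestion quoted above, CPU PMUs would have
 * "cpus", while other PMUs have a "cpumask" attribute instead.
 */
static bool pmu_is_cpu_pmu(const char *base, const char *name)
{
	char path[512];
	int n;

	n = snprintf(path, sizeof(path), "%s/%s/cpus", base, name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return false;		/* path did not fit; treat as "no" */
	return access(path, F_OK) == 0;	/* only tests that the file exists */
}
```

This only checks for the attribute's existence; it does not read which
CPUs the PMU covers.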


[PATCH] PM / devfreq: Move struct devfreq_governor to devfreq directory

2017-04-05 Thread Chanwoo Choi
This patch moves struct devfreq_governor from the public header file
(include/linux/devfreq.h) to the devfreq directory (drivers/devfreq/governor.h),
because this structure is private data and must be accessed only by the
devfreq core.

Signed-off-by: Chanwoo Choi 
---
 drivers/devfreq/governor.h | 29 +
 include/linux/devfreq.h| 30 +-
 2 files changed, 30 insertions(+), 29 deletions(-)

diff --git a/drivers/devfreq/governor.h b/drivers/devfreq/governor.h
index 71576b8bdfef..a4f2fa1091e4 100644
--- a/drivers/devfreq/governor.h
+++ b/drivers/devfreq/governor.h
@@ -25,6 +25,35 @@
 #define DEVFREQ_GOV_SUSPEND0x4
 #define DEVFREQ_GOV_RESUME 0x5
 
+/**
+ * struct devfreq_governor - Devfreq policy governor
+ * @node:  list node - contains registered devfreq governors
+ * @name:  Governor's name
+ * @immutable: Immutable flag for governor. If the value is 1,
+ * this govenror is never changeable to other governor.
+ * @get_target_freq:   Returns desired operating frequency for the device.
+ * Basically, get_target_freq will run
+ * devfreq_dev_profile.get_dev_status() to get the
+ * status of the device (load = busy_time / total_time).
+ * If no_central_polling is set, this callback is called
+ * only with update_devfreq() notified by OPP.
+ * @event_handler:  Callback for devfreq core framework to notify events
+ *  to governors. Events include per device governor
+ *  init and exit, opp changes out of devfreq, suspend
+ *  and resume of per device devfreq during device idle.
+ *
+ * Note that the callbacks are called with devfreq->lock locked by devfreq.
+ */
+struct devfreq_governor {
+   struct list_head node;
+
+   const char name[DEVFREQ_NAME_LEN];
+   const unsigned int immutable;
+   int (*get_target_freq)(struct devfreq *this, unsigned long *freq);
+   int (*event_handler)(struct devfreq *devfreq,
+   unsigned int event, void *data);
+};
+
 /* Caution: devfreq->lock must be locked before calling update_devfreq */
 extern int update_devfreq(struct devfreq *devfreq);
 
diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
index e0acb0e5243b..6c220e4ebb6b 100644
--- a/include/linux/devfreq.h
+++ b/include/linux/devfreq.h
@@ -27,6 +27,7 @@
 #define DEVFREQ_POSTCHANGE (1)
 
 struct devfreq;
+struct devfreq_governor;
 
 /**
  * struct devfreq_dev_status - Data given from devfreq user device to
@@ -101,35 +102,6 @@ struct devfreq_dev_profile {
 };
 
 /**
- * struct devfreq_governor - Devfreq policy governor
- * @node:  list node - contains registered devfreq governors
- * @name:  Governor's name
- * @immutable: Immutable flag for governor. If the value is 1,
- * this govenror is never changeable to other governor.
- * @get_target_freq:   Returns desired operating frequency for the device.
- * Basically, get_target_freq will run
- * devfreq_dev_profile.get_dev_status() to get the
- * status of the device (load = busy_time / total_time).
- * If no_central_polling is set, this callback is called
- * only with update_devfreq() notified by OPP.
- * @event_handler:  Callback for devfreq core framework to notify events
- *  to governors. Events include per device governor
- *  init and exit, opp changes out of devfreq, suspend
- *  and resume of per device devfreq during device idle.
- *
- * Note that the callbacks are called with devfreq->lock locked by devfreq.
- */
-struct devfreq_governor {
-   struct list_head node;
-
-   const char name[DEVFREQ_NAME_LEN];
-   const unsigned int immutable;
-   int (*get_target_freq)(struct devfreq *this, unsigned long *freq);
-   int (*event_handler)(struct devfreq *devfreq,
-   unsigned int event, void *data);
-};
-
-/**
  * struct devfreq - Device devfreq structure
  * @node:  list node - contains the devices with devfreq that have been
  * registered.
-- 
1.9.1



Re: [BUG] stack tracing causes: kernel/module.c:271 module_assert_mutex_or_preempt

2017-04-05 Thread Paul E. McKenney
On Wed, Apr 05, 2017 at 10:12:24PM -0400, Steven Rostedt wrote:
> On Wed, 5 Apr 2017 10:59:25 -0700
> "Paul E. McKenney"  wrote:
> 
> > > > Could you please let me know if tracing happens in NMI handlers?
> > > > If so, a bit of additional code will be needed.
> > > > 
> > > > Thanx, Paul
> > > > 
> > > > PS.  Which reminds me, any short-term uses of RCU_TASKS?  This 
> > > > represents
> > > >  3 of my 16 test scenarios, which is getting hard to justify for
> > > >  something that isn't used.  Especially given that I will need to
> > > >  add more scenarios for parallel-callbacks SRCU...  
> > > 
> > > The RCU_TASK implementation is next on my todo list. Yes, there's going
> > > to be plenty of users very soon. Not for 4.12 but definitely for 4.13.
> > > 
> > > Sorry for the delay in implementing that :-/  
> > 
> > OK, I will wait a few months before checking again...
> > 
> 
> Actually, I took a quick look at what needs to be done, and I think it
> is *really* easy, and may be available in 4.12! Here's the current
> patch.

Cool!!!

> I can probably do a patch to allow optimized kprobes on PREEMPT kernels
> as well.
> 
> -- Steve
> 
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 8efd9fe..28e3019 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -2808,18 +2808,28 @@ static int ftrace_shutdown(struct ftrace_ops *ops, 
> int command)
>* callers are done before leaving this function.
>* The same goes for freeing the per_cpu data of the per_cpu
>* ops.
> -  *
> -  * Again, normal synchronize_sched() is not good enough.
> -  * We need to do a hard force of sched synchronization.
> -  * This is because we use preempt_disable() to do RCU, but
> -  * the function tracers can be called where RCU is not watching
> -  * (like before user_exit()). We can not rely on the RCU
> -  * infrastructure to do the synchronization, thus we must do it
> -  * ourselves.
>*/
>   if (ops->flags & (FTRACE_OPS_FL_DYNAMIC | FTRACE_OPS_FL_PER_CPU)) {
> + /*
> +  * We need to do a hard force of sched synchronization.
> +  * This is because we use preempt_disable() to do RCU, but
> +  * the function tracers can be called where RCU is not watching
> +  * (like before user_exit()). We can not rely on the RCU
> +  * infrastructure to do the synchronization, thus we must do it
> +  * ourselves.
> +  */
>   schedule_on_each_cpu(ftrace_sync);

Great header comment on ftrace_sync(): "Yes, function tracing is rude."
And schedule_on_each_cpu() looks like a great workqueue gatling gun!  ;-)

> +#ifdef CONFIG_PREEMPT
> + /*
> +  * When the kernel is preemptive, tasks can be preempted
> +  * while on a ftrace trampoline. Just scheduling a task on
> +  * a CPU is not good enough to flush them. Calling
> +  * synchronize_rcu_tasks() will wait for those tasks to
> +  * execute and either schedule voluntarily or enter user space.
> +  */
> + synchronize_rcu_tasks();
> +#endif

How about this to save a line?

	if (IS_ENABLED(CONFIG_PREEMPT))
		synchronize_rcu_tasks();
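
A user-space sketch of why the if (IS_ENABLED(...)) form works: the macro
expands to a constant 1 or 0, so the compiler still parses and type-checks
the call in every configuration and then drops the dead branch.
DEMO_IS_ENABLED() below is a simplified stand-in modelled on the kernel
macro (it ignores the =m case), and the CONFIG_DEMO_* symbols and the stub
functions are invented for the example.

```c
/* Simplified stand-in for the kernel's IS_ENABLED(); built-in case only. */
#define DEMO_ARG_PLACEHOLDER_1 0,
#define DEMO_TAKE_SECOND(ignored, val, ...) val
#define DEMO_IS_DEFINED3(arg1_or_junk) DEMO_TAKE_SECOND(arg1_or_junk 1, 0, 0)
#define DEMO_IS_DEFINED2(val) DEMO_IS_DEFINED3(DEMO_ARG_PLACEHOLDER_##val)
#define DEMO_IS_DEFINED(x) DEMO_IS_DEFINED2(x)
#define DEMO_IS_ENABLED(option) DEMO_IS_DEFINED(option)

/* Pretend this option was selected; CONFIG_DEMO_NOT_SET stays undefined. */
#define CONFIG_DEMO_PREEMPT 1

static int demo_rcu_tasks_syncs;	/* counts calls to the stub below */

/* Stub standing in for synchronize_rcu_tasks(). */
static void demo_synchronize_rcu_tasks(void)
{
	demo_rcu_tasks_syncs++;
}

/* Mirrors the shape of the suggested change, without any #ifdef. */
static void demo_ftrace_shutdown(void)
{
	if (DEMO_IS_ENABLED(CONFIG_DEMO_PREEMPT))
		demo_synchronize_rcu_tasks();
}
```

The idiom only works if the called function is declared in every
configuration, since the call is compiled even when the condition is 0.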

One thing that might speed this up a bit (or might not) would be to
do the schedule_on_each_cpu() from a delayed workqueue.  That way,
if any of the activity from schedule_on_each_cpu() involved a voluntary
context switch (from a cond_resched() or some such), then
synchronize_rcu_tasks() would get the benefit of that context switch.

You would need a flush_work() to wait for that delayed workqueue
as well, of course.

Not sure whether it is worth it, but figured I should pass it along.

>   arch_ftrace_trampoline_free(ops);
> 
>   if (ops->flags & FTRACE_OPS_FL_PER_CPU)
> @@ -5366,22 +5376,6 @@ void __weak arch_ftrace_update_trampoline(struct 
> ftrace_ops *ops)
> 
>  static void ftrace_update_trampoline(struct ftrace_ops *ops)
>  {
> -
> -/*
> - * Currently there's no safe way to free a trampoline when the kernel
> - * is configured with PREEMPT. That is because a task could be preempted
> - * when it jumped to the trampoline, it may be preempted for a long time
> - * depending on the system load, and currently there's no way to know
> - * when it will be off the trampoline. If the trampoline is freed
> - * too early, when the task runs again, it will be executing on freed
> - * memory and crash.
> - */
> -#ifdef CONFIG_PREEMPT
> - /* Currently, only non dynamic ops can have a trampoline */
> - if (ops->flags & FTRACE_OPS_FL_DYNAMIC)
> - return;
> -#endif
> -
>   arch_ftrace_update_trampoline(ops);
>  }

Agreed, straightforward patch!

Thanx, Paul



Re: [PATCH] arm: dma: fix sharing of coherent DMA memory without struct page

2017-04-05 Thread kbuild test robot
Hi Shuah,

[auto build test ERROR on linus/master]
[also build test ERROR on v4.11-rc5]
[cannot apply to next-20170405]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Shuah-Khan/arm-dma-fix-sharing-of-coherent-DMA-memory-without-struct-page/20170406-114717
config: x86_64-randconfig-x009-201714 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All error/warnings (new ones prefixed by >>):

   In file included from include/linux/file.h:8:0,
from include/linux/dma-buf.h:27,
from drivers/media/v4l2-core/videobuf2-dma-contig.c:13:
   drivers/media/v4l2-core/videobuf2-dma-contig.c: In function 'vb2_dc_alloc':
>> drivers/media/v4l2-core/videobuf2-dma-contig.c:164:6: error: implicit 
>> declaration of function 'dma_check_dev_coherent' 
>> [-Werror=implicit-function-declaration]
 if (dma_check_dev_coherent(dev, buf->dma_addr, buf->cookie))
 ^
   include/linux/compiler.h:160:30: note: in definition of macro '__trace_if'
 if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
 ^~~~
>> drivers/media/v4l2-core/videobuf2-dma-contig.c:164:2: note: in expansion of 
>> macro 'if'
 if (dma_check_dev_coherent(dev, buf->dma_addr, buf->cookie))
 ^~
   cc1: some warnings being treated as errors
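
The error above says that dma_check_dev_coherent() has no visible
declaration in this x86_64 build, presumably because the patch only
provides it for ARM. One common way to avoid this class of failure is a
header that supplies a static inline fallback when the architecture does
not implement the function. The sketch below shows that pattern in user
space; every demo_ name, the DEMO_ARCH_HAS_COHERENT_CHECK switch and the
choice of returning false are assumptions for illustration, not the actual
fix.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_device;				/* stand-in for struct device */
typedef unsigned long long demo_dma_addr_t;	/* stand-in for dma_addr_t */

#ifdef DEMO_ARCH_HAS_COHERENT_CHECK
/* An architecture that implements the check would declare it here. */
bool demo_dma_check_dev_coherent(struct demo_device *dev,
				 demo_dma_addr_t dma_addr, void *cookie);
#else
/* Fallback stub so that generic callers still compile and link. */
static inline bool demo_dma_check_dev_coherent(struct demo_device *dev,
					       demo_dma_addr_t dma_addr,
					       void *cookie)
{
	(void)dev;
	(void)dma_addr;
	(void)cookie;
	return false;	/* assumed default; the report does not say */
}
#endif
```

Whether a constant false is the right answer on other architectures is a
question for the patch author; the report itself only shows that the
symbol is missing.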

vim +/dma_check_dev_coherent +164 drivers/media/v4l2-core/videobuf2-dma-contig.c

 7   *
 8   * This program is free software; you can redistribute it and/or modify
 9   * it under the terms of the GNU General Public License as published by
10   * the Free Software Foundation.
11   */
12  
  > 13  #include 
14  #include 
15  #include 
16  #include 
17  #include 
18  #include 
19  
20  #include 
21  #include 
22  #include 
23  
24  struct vb2_dc_buf {
25  struct device   *dev;
 26  void    *vaddr;
27  unsigned long   size;
 28  void    *cookie;
29  dma_addr_t  dma_addr;
30  unsigned long   attrs;
31  enum dma_data_direction dma_dir;
32  struct sg_table *dma_sgt;
33  struct frame_vector *vec;
34  
35  /* MMAP related */
36  struct vb2_vmarea_handler   handler;
 37  atomic_t    refcount;
38  struct sg_table *sgt_base;
39  
40  /* DMABUF related */
41  struct dma_buf_attachment   *db_attach;
42  };
43  
 44  /*********************************************/
 45  /*        scatterlist table functions        */
 46  /*********************************************/
47  
48  static unsigned long vb2_dc_get_contiguous_size(struct sg_table *sgt)
49  {
50  struct scatterlist *s;
51  dma_addr_t expected = sg_dma_address(sgt->sgl);
52  unsigned int i;
53  unsigned long size = 0;
54  
55  for_each_sg(sgt->sgl, s, sgt->nents, i) {
56  if (sg_dma_address(s) != expected)
57  break;
58  expected = sg_dma_address(s) + sg_dma_len(s);
59  size += sg_dma_len(s);
60  }
61  return size;
62  }
63  
 64  /*********************************************/
 65  /*         callbacks for all buffers         */
 66  /*********************************************/
67  
68  static void *vb2_dc_cookie(void *buf_priv)
69  {
70  struct vb2_dc_buf *buf = buf_priv;
71  
 72  return &buf->dma_addr;
73  }
74  
75  static void *vb2_dc_vaddr(void *buf_priv)
76  {
77  struct vb2_dc_buf *buf = buf_priv;
78  
79  if (!buf->vaddr && buf->db_attach)
80  buf->vaddr = dma_buf_vmap(buf->db_attach->dmabuf);
81  
82  return buf->vaddr;
83  }
84  
85  static unsigned int vb2_dc_num_users(void *buf_priv)
86  {
87  struct vb2_dc_buf *buf = buf_priv;
88  
 89  return atomic_read(&buf->refcount);
90  }
91  
92  static void vb2_dc_prepare(void *buf_priv)
93  {
94  struct vb2_dc_buf *buf = buf_priv;
95  struct sg_table *sgt = buf->dma_sgt;
96  
97  /* DMABUF exporter will flush the cache for us */
98  if (!sgt || buf->db_attach)
99  return;
   100  
   101  dma_sync_sg_for_device(buf->d

47  
48  static unsigned long vb2_dc_get_contiguous_size(struct sg_table *sgt)
49  {
50  struct scatterlist *s;
51  dma_addr_t expected = sg_dma_address(sgt->sgl);
52  unsigned int i;
53  unsigned long size = 0;
54  
55  for_each_sg(sgt->sgl, s, sgt->nents, i) {
56  if (sg_dma_address(s) != expected)
57  break;
58  expected = sg_dma_address(s) + sg_dma_len(s);
59  size += sg_dma_len(s);
60  }
61  return size;
62  }
63  
64  /*/
65  /* callbacks for all buffers */
66  /*/
67  
68  static void *vb2_dc_cookie(void *buf_priv)
69  {
70  struct vb2_dc_buf *buf = buf_priv;
71  
72  return >dma_addr;
73  }
74  
75  static void *vb2_dc_vaddr(void *buf_priv)
76  {
77  struct vb2_dc_buf *buf = buf_priv;
78  
79  if (!buf->vaddr && buf->db_attach)
80  buf->vaddr = dma_buf_vmap(buf->db_attach->dmabuf);
81  
82  return buf->vaddr;
83  }
84  
85  static unsigned int vb2_dc_num_users(void *buf_priv)
86  {
87  struct vb2_dc_buf *buf = buf_priv;
88  
89  return atomic_read(>refcount);
90  }
91  
92  static void vb2_dc_prepare(void *buf_priv)
93  {
94  struct vb2_dc_buf *buf = buf_priv;
95  struct sg_table *sgt = buf->dma_sgt;
96  
97  /* DMABUF exporter will flush the cache for us */
98  if (!sgt || buf->db_attach)
99  return;
   100  
   101  dma_sync_sg_for_device(buf->d

Re: [BUG] stack tracing causes: kernel/module.c:271 module_assert_mutex_or_preempt

2017-04-05 Thread Paul E. McKenney
On Wed, Apr 05, 2017 at 09:31:45PM -0400, Steven Rostedt wrote:
> On Wed, 5 Apr 2017 13:42:29 -0700
> "Paul E. McKenney"  wrote:
> 
> > > OK, do you want me to send you a patch, or should I take your patch
> > > into my tree and add this?  
> > 
> > Given that I don't seem to have disable_stack_tracer() and
> > enable_stack_tracer(), could you please add them in?
> 
> That's what I meant about "send you a patch" :-)

Reading/comprehension-challenged today, aren't I?

This doesn't conflict with anything in my tree, so either your tree
or my tree is fine by me.  The advantage of your tree is that you
could easily sync it up with the other changes needed.

Thanx, Paul



Re: [PATCH 4/5] fpga-mgr: separate getting/locking FPGA manager

2017-04-05 Thread Moritz Fischer
Hi Alan,

minor nits, inline

On Mon, Mar 13, 2017 at 04:53:32PM -0500, Alan Tull wrote:
> Add fpga_mgr_lock/unlock functions that get a mutex for
> exclusive use.
> 
> of_fpga_mgr_get, fpga_mgr_get, and fpga_mgr_put no longer lock
> the FPGA manager mutex.
> 
> This makes it more straightforward to save a reference to
> a FPGA manager and only attempting to lock it when programming
> the FPGA.
> 
> Signed-off-by: Alan Tull 
> ---
>  drivers/fpga/fpga-mgr.c   | 44 
> +++
>  drivers/fpga/fpga-region.c| 13 +++--
>  include/linux/fpga/fpga-mgr.h |  3 +++
>  3 files changed, 46 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/fpga/fpga-mgr.c b/drivers/fpga/fpga-mgr.c
> index fde605b..f7c3648 100644
> --- a/drivers/fpga/fpga-mgr.c
> +++ b/drivers/fpga/fpga-mgr.c
> @@ -376,28 +376,19 @@ ATTRIBUTE_GROUPS(fpga_mgr);
>  struct fpga_manager *__fpga_mgr_get(struct device *dev)
>  {
>   struct fpga_manager *mgr;
> - int ret = -ENODEV;
>  
>   mgr = to_fpga_manager(dev);
>   if (!mgr)
>   goto err_dev;
>  
> - /* Get exclusive use of fpga manager */
> - if (!mutex_trylock(&mgr->ref_mutex)) {
> - ret = -EBUSY;
> - goto err_dev;
> - }
> -
>   if (!try_module_get(dev->parent->driver->owner))
> - goto err_ll_mod;
> + goto err_dev;
>  
>   return mgr;
>  
> -err_ll_mod:
> - mutex_unlock(&mgr->ref_mutex);
>  err_dev:
>   put_device(dev);
> - return ERR_PTR(ret);
> + return ERR_PTR(-ENODEV);
>  }
>  
>  static int fpga_mgr_dev_match(struct device *dev, const void *data)
> @@ -457,12 +448,41 @@ EXPORT_SYMBOL_GPL(of_fpga_mgr_get);
>  void fpga_mgr_put(struct fpga_manager *mgr)
>  {
>   module_put(mgr->dev.parent->driver->owner);
> - mutex_unlock(&mgr->ref_mutex);
>   put_device(&mgr->dev);
>  }
>  EXPORT_SYMBOL_GPL(fpga_mgr_put);
>  
>  /**
> + * fpga_mgr_lock - Lock FPGA manager for exclusive use
> + * @mgr: fpga manager
> + *
> + * Given a pointer to FPGA Manager (from fpga_mgr_get() or
> + * of_fpga_mgr_put()) attempt to get the mutex.
> + *
> + * Return: 0 for success or -EBUSY
> + */
> +int fpga_mgr_lock(struct fpga_manager *mgr)
> +{
> + if (!mutex_trylock(&mgr->ref_mutex)) {
> + dev_err(&mgr->dev, "FPGA manager is in use.\n");
> + return -EBUSY;
> + }
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(fpga_mgr_lock);
> +
> +/**
> + * fpga_mgr_unlock - Unlock FPGA manager
> + * @mgr: fpga manager
> + */
> +void fpga_mgr_unlock(struct fpga_manager *mgr)
> +{
> + mutex_unlock(&mgr->ref_mutex);
> +}
> +EXPORT_SYMBOL_GPL(fpga_mgr_unlock);
> +
> +/**
>   * fpga_mgr_register - register a low level fpga manager driver
>   * @dev: fpga manager device from pdev
>   * @name:fpga manager name
> diff --git a/drivers/fpga/fpga-region.c b/drivers/fpga/fpga-region.c
> index 294556e..815f178 100644
> --- a/drivers/fpga/fpga-region.c
> +++ b/drivers/fpga/fpga-region.c
> @@ -125,7 +125,7 @@ static void fpga_region_put(struct fpga_region *region)
>  }
>  
>  /**
> - * fpga_region_get_manager - get exclusive reference for FPGA manager
> + * fpga_region_get_manager - get reference for FPGA manager
>   * @region: FPGA region
>   *
>   * Get FPGA Manager from "fpga-mgr" property or from ancestor region.
> @@ -245,10 +245,16 @@ static int fpga_region_program_fpga(struct fpga_region 
> *region,
>   return PTR_ERR(mgr);
>   }
>  
> + ret = fpga_mgr_lock(mgr);
> + if (ret) {
> + pr_err("FPGA manager is busy\n");

Am I missing something here, or could you use dev_err(>dev, ...)
here?

> + goto err_put_mgr;
> + }
> +
>   ret = fpga_region_get_bridges(region, overlay);
>   if (ret) {
>   pr_err("failed to get fpga region bridges\n");

Same here, (I know this is not part of this patch), maybe the above is
for consistency reasons then. Maybe I'm missing something.
> - goto err_put_mgr;
> + goto err_unlock_mgr;
>   }
>  
>   ret = fpga_bridges_disable(&region->bridge_list);
> @@ -269,6 +275,7 @@ static int fpga_region_program_fpga(struct fpga_region 
> *region,
>   goto err_put_br;
>   }
>  
> + fpga_mgr_unlock(mgr);
>   fpga_mgr_put(mgr);
>   fpga_region_put(region);
>  
> @@ -276,6 +283,8 @@ static int fpga_region_program_fpga(struct fpga_region 
> *region,
>  
>  err_put_br:
>   fpga_bridges_put(&region->bridge_list);
> +err_unlock_mgr:
> + fpga_mgr_unlock(mgr);
>  err_put_mgr:
>   fpga_mgr_put(mgr);
>   fpga_region_put(region);
> diff --git a/include/linux/fpga/fpga-mgr.h b/include/linux/fpga/fpga-mgr.h
> index 45df05a..ae970ca 100644
> --- a/include/linux/fpga/fpga-mgr.h
> +++ b/include/linux/fpga/fpga-mgr.h
> @@ -149,6 +149,9 @@ int fpga_mgr_firmware_load(struct fpga_manager *mgr,
>  
>  int fpga_mgr_load(struct fpga_manager *mgr, struct fpga_image_info *info);
>  
> +int fpga_mgr_lock(struct fpga_manager *mgr);
> +void 
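
The split described in the quoted patch, where taking a reference no longer implies exclusive use, can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct fake_mgr` and the `fake_mgr_*()` helpers are hypothetical names, the reference count is not thread-safe, and a C11 `atomic_flag` stands in for the kernel mutex and `mutex_trylock()`.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-in for struct fpga_manager; not the kernel type. */
struct fake_mgr {
	int refs;          /* references held; single-threaded sketch only */
	atomic_flag busy;  /* set while one user has exclusive access */
};

/* Take a reference only; no exclusive access is implied. */
static void fake_mgr_get(struct fake_mgr *mgr)
{
	mgr->refs++;
}

/* Drop a reference; does not touch the exclusive-use flag. */
static void fake_mgr_put(struct fake_mgr *mgr)
{
	mgr->refs--;
}

/* Try to get exclusive use; returns 0 on success or -EBUSY. */
static int fake_mgr_lock(struct fake_mgr *mgr)
{
	if (atomic_flag_test_and_set(&mgr->busy))
		return -EBUSY;
	return 0;
}

/* Release exclusive use so another holder of a reference can lock. */
static void fake_mgr_unlock(struct fake_mgr *mgr)
{
	atomic_flag_clear(&mgr->busy);
}
```

With this shape, several users can hold a reference at once, and only the one that is about to program the device needs to win the lock.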

linux-next: build warning after merge of the scsi tree

2017-04-05 Thread Stephen Rothwell
Hi James,

After merging the scsi tree, today's linux-next build (arm
multi_v7_defconfig) produced this warning:

In file included from include/linux/list.h:8:0,
 from include/linux/module.h:9,
 from drivers/scsi/sd.c:35:
drivers/scsi/sd.c: In function 'sd_revalidate_disk':
include/linux/kernel.h:755:16: warning: comparison of distinct pointer types 
lacks a cast
  (void) (&min1 == &min2);   \
^
include/linux/kernel.h:758:2: note: in expansion of macro '__min'
  __min(typeof(x), typeof(y),   \
  ^
include/linux/kernel.h:783:39: note: in expansion of macro 'min'
  __x == 0 ? __y : ((__y == 0) ? __x : min(__x, __y)); })
   ^
drivers/scsi/sd.c:2959:12: note: in expansion of macro 'min_not_zero'
   rw_max = min_not_zero(logical_to_sectors(sdp, dev_max),
^

Introduced by commit

  c3e62673ee20 ("scsi: sd: Consider max_xfer_blocks if opt_xfer_blocks is 
unusable")

logical_to_sectors() is a sector_t and BLK_DEF_MAX_SECTORS is an "enum
blk_default_limits" (i.e. int).

-- 
Cheers,
Stephen Rothwell
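
The warning above comes from the pointer comparison that min() uses as a compile-time type check: the two operands have different types, so the pointers being compared are to distinct types. One common way to avoid this class of warning is to convert both operands to a single type before comparing. The following is a minimal userspace sketch of that idea, not the fix that was applied to sd.c; `min_not_zero_u64()` is a hypothetical helper name.

```c
#include <stdint.h>

/*
 * Hypothetical helper: smaller of two limits, where 0 means "no limit".
 * Both arguments are converted to uint64_t by the call itself, so the
 * comparison never mixes types.
 */
static uint64_t min_not_zero_u64(uint64_t x, uint64_t y)
{
	if (x == 0)
		return y;
	if (y == 0)
		return x;
	return x < y ? x : y;
}
```

Passing a sector count and an int constant to such a helper converts both at the call site, which is roughly what an explicit cast or a typed minimum would do in the kernel.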


Re: [PATCH] mwifiex: MAC randomization should not be persistent

2017-04-05 Thread Kalle Valo
Brian Norris  writes:

> nl80211 provides the NL80211_SCAN_FLAG_RANDOM_ADDR for every scan
> request that should be randomized; the absence of such a flag means we
> should not randomize. However, mwifiex was stashing the latest
> randomization request and *always* using it for future scans, even those
> that didn't set the flag.
>
> Let's zero out the randomization info whenever we get a scan request
> without NL80211_SCAN_FLAG_RANDOM_ADDR. I'd prefer to remove
> priv->random_mac entirely (and plumb the randomization MAC properly
> through the call sequence), but the spaghetti is a little difficult to
> unravel here for me.
>
> Fixes: c2a8f0ff9c6c ("mwifiex: support random MAC address for scanning")

So the first release with this was v4.9.

> Signed-off-by: Brian Norris 
> ---
> Should this be tagged for -stable?

IMHO yes.

-- 
Kalle Valo
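
The behaviour described in the quoted commit message, clearing the stored randomization state whenever a scan request arrives without the flag, might look roughly like this in isolation. This is a userspace sketch under stated assumptions: `struct fake_priv`, `fake_prepare_scan()` and `SCAN_FLAG_RANDOM_ADDR_SKETCH` are hypothetical stand-ins and do not reproduce the mwifiex code or the real nl80211 flag value.

```c
#include <stdint.h>
#include <string.h>

#define MAC_LEN_SKETCH 6
/* Hypothetical flag bit; stands in for NL80211_SCAN_FLAG_RANDOM_ADDR. */
#define SCAN_FLAG_RANDOM_ADDR_SKETCH (1u << 0)

/* Hypothetical per-interface state that remembers a random MAC. */
struct fake_priv {
	uint8_t random_mac[MAC_LEN_SKETCH];
};

/*
 * Store the requested random MAC when the flag is set; otherwise zero
 * the stored value so an earlier request cannot leak into this scan.
 */
static void fake_prepare_scan(struct fake_priv *priv, uint32_t flags,
			      const uint8_t *mac_addr)
{
	if (flags & SCAN_FLAG_RANDOM_ADDR_SKETCH)
		memcpy(priv->random_mac, mac_addr, MAC_LEN_SKETCH);
	else
		memset(priv->random_mac, 0, MAC_LEN_SKETCH);
}
```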


[PATCH] ARM: domains: Extract common USER domain init

2017-04-05 Thread Kees Cook
Everything but the USER domain is the same with CONFIG_CPU_SW_DOMAIN_PAN
or not. This extracts the differences for a common DACR_INIT macro so it
is easier to make future changes (like adding the WR_RARE domain).

Signed-off-by: Kees Cook 
---
 arch/arm/include/asm/domain.h | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/arm/include/asm/domain.h b/arch/arm/include/asm/domain.h
index 99d9f630d6b6..8b33bd7f6bf9 100644
--- a/arch/arm/include/asm/domain.h
+++ b/arch/arm/include/asm/domain.h
@@ -59,18 +59,18 @@
 #define domain_val(dom,type)   ((type) << (2 * (dom)))
 
 #ifdef CONFIG_CPU_SW_DOMAIN_PAN
-#define DACR_INIT \
-   (domain_val(DOMAIN_USER, DOMAIN_NOACCESS) | \
-domain_val(DOMAIN_KERNEL, DOMAIN_MANAGER) | \
-domain_val(DOMAIN_IO, DOMAIN_CLIENT) | \
-domain_val(DOMAIN_VECTORS, DOMAIN_CLIENT))
+#define __DACR_INIT_USER \
+   domain_val(DOMAIN_USER, DOMAIN_NOACCESS)
 #else
+#define __DACR_INIT_USER \
+   domain_val(DOMAIN_USER, DOMAIN_CLIENT)
+#endif
+
 #define DACR_INIT \
-   (domain_val(DOMAIN_USER, DOMAIN_CLIENT) | \
+   (__DACR_INIT_USER | \
 domain_val(DOMAIN_KERNEL, DOMAIN_MANAGER) | \
 domain_val(DOMAIN_IO, DOMAIN_CLIENT) | \
 domain_val(DOMAIN_VECTORS, DOMAIN_CLIENT))
-#endif
 
 #define __DACR_DEFAULT \
domain_val(DOMAIN_KERNEL, DOMAIN_CLIENT) | \
-- 
2.7.4


-- 
Kees Cook
Pixel Security
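
To see what the refactored macro computes, the composition can be reproduced in a small userspace program. This is a sketch only: `domain_val()` has the same shape as in the patch, but the domain numbers and access-type values below are assumptions chosen for illustration (the real ones depend on kernel configuration), and `dacr_init()` is a hypothetical function that replaces the `#ifdef` with a runtime argument.

```c
/* Same shape as the macro in the patch. */
#define domain_val(dom, type)	((type) << (2 * (dom)))

/* Assumed numbering, for illustration only. */
enum { DOMAIN_KERNEL = 0, DOMAIN_USER = 1, DOMAIN_IO = 2, DOMAIN_VECTORS = 3 };
enum { DOMAIN_NOACCESS = 0, DOMAIN_CLIENT = 1, DOMAIN_MANAGER = 1 };

/*
 * Hypothetical runtime version of DACR_INIT: only the USER field
 * depends on whether software domain PAN is enabled.
 */
static unsigned int dacr_init(int sw_domain_pan)
{
	unsigned int user = sw_domain_pan ?
		domain_val(DOMAIN_USER, DOMAIN_NOACCESS) :
		domain_val(DOMAIN_USER, DOMAIN_CLIENT);

	return user |
	       domain_val(DOMAIN_KERNEL, DOMAIN_MANAGER) |
	       domain_val(DOMAIN_IO, DOMAIN_CLIENT) |
	       domain_val(DOMAIN_VECTORS, DOMAIN_CLIENT);
}
```

With these assumed numbers the two results differ only in the two bits of the USER field, which is the point of factoring that field out.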


[PATCH] mailmap: update Yakir Yang email address

2017-04-05 Thread Jeffy Chen
Set current email address to replace previous employer's email
addresses.

Signed-off-by: Jeffy Chen 
---

 .mailmap | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.mailmap b/.mailmap
index e775f79..de0fc5b 100644
--- a/.mailmap
+++ b/.mailmap
@@ -172,6 +172,7 @@ Vlad Dogaru  
 Vladimir Davydov  
 Vladimir Davydov  
 Takashi YOSHII 
+Yakir Yang  
 Yusuke Goda 
 Gustavo Padovan 
 Gustavo Padovan 
-- 
2.1.4




Re: [PATCH] block-mq: set both block queue and hardware queue restart bit for restart

2017-04-05 Thread Bart Van Assche
On Thu, 2017-04-06 at 03:38 +, Long Li wrote:
> > -----Original Message-----
> > From: Bart Van Assche [mailto:bart.vanass...@sandisk.com]
> > 
> > Please drop this patch. I'm working on a better solution.
> 
> Thank you. Looking forward to your patch.

Hello Long,

It would help if you could share the name of the block or SCSI driver with
which you ran into that lockup and also if you could share the name of the
I/O scheduler used in your test.

Thanks,

Bart.

Re: [PATCH v3] Allow user probes on versioned symbols

2017-04-05 Thread Paul Clarke

On 04/04/2017 09:26 AM, Arnaldo Carvalho de Melo wrote:

Em Tue, Apr 04, 2017 at 11:18:02PM +0900, Masami Hiramatsu escreveu:

On Mon, 3 Apr 2017 11:46:58 -0300
Arnaldo Carvalho de Melo  wrote:
> > > But apart from those problems, I think that one should be able to ask
> > > for a versioned symbol, to probe just apps using that specific version,
>
> > I agree, but wasn't trying to tackle that at the moment.  I can look into it, though.
>
> > > for instance, we should consider the whole name as two functions, which
> > > in fact, they are, no?
>
> > I'm not sure I understand what you mean here.  Do you mean we should set a probe at every version of a given symbol name?  For example, if there are symbols:
> > a@@V2
> > a@V1.1
> > a@V1
>
> > ...for a request to set a probe at "a", we'd actually set a probe at all 3?
>
> I think that we should just probe the default for that symbol and have a
> way to probe all of them, perhaps using the wildcard, i.e.:
>
> [root@jouet linux]# nm /lib64/libpthread-2.24.so  | grep ' pthread_cond_timedwait'
> dd90 T pthread_cond_timedwait@GLIBC_2.2.5
> d6e0 T pthread_cond_timedwait@@GLIBC_2.3.2
> [root@jouet linux]#
>
>   # perf probe -x /lib64/libpthread-2.24.so pthread_cond_timedwait
>
> should be equivalent to:
>
>   # perf probe -x /lib64/libpthread-2.24.so pthread_cond_timedwait@@GLIBC_2.3.2
>
> Which matches how these versioned symbols are resolved by the linker,
> no?
>
> I.e. when 'pthread_cond_timedwait' is specified and the symbol table
> lookup fails, I think we should re-lookup for
> 'pthread_cond_timedwait@@*', i.e. we should have a
> symbol__find_default_by_name(), which will take the
> "pthread_cond_timedwait" and use a symbol comparison using
> strncmp(strlen(key)), matching, should then look at right after the
> common part looking for the double @@.



Hm, this 'fallback' process sounds like a good idea to me.


I just sent a patch that does what you suggest, above.  To avoid duplicating 
the code in symbols_find_by_name, I added a parameter to tell it whether to 
ignore default symbol tags or not.


This is just trying to keep the semantics used by the original user of
this syntax, i.e. the linker.


BTW, how would we support other SYMBOL@VERSION, since we already
use '@' for specifying source code?
One possible way is to support it directly in perf-probe. If it
failed to find probe point from dwarf, try to find from symbol
map by using '@VERSION' suffix.


Right, we would be overloading that @ symbol, since version numbers
usually are very different of file source names :-)


There's not a lot of syntactic difference between "file" and "tag".  I don't think there 
is any standard for what either can be.  One might expect a "file" to be name.extension, where 
extension is a finite set (possibly fairly large).  A tag can be almost anything, I think.  One might expect 
it to end with a number.  I don't think there's a guarantee of either case, though.

It would seem one way to determine whether it's a file or a tag is to try to 
find it in the symbol tables.  If it's not there, assume it's a file.  (Or 
vice-versa.)

PC
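
The fallback discussed above, matching a bare name against the default ("@@") version of a symbol, comes down to a prefix comparison followed by a check of what comes right after the common part. A minimal sketch, assuming a hypothetical helper name `matches_default_version()` that is not part of perf:

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true when sym is exactly key, or key followed by "@@",
 * which marks the default version of a versioned symbol.
 */
static bool matches_default_version(const char *sym, const char *key)
{
	size_t len = strlen(key);

	/* The symbol must start with the requested name. */
	if (strncmp(sym, key, len) != 0)
		return false;
	/* Unversioned symbol: exact match. */
	if (sym[len] == '\0')
		return true;
	/* Versioned symbol: accept only the default ("@@") version. */
	return sym[len] == '@' && sym[len + 1] == '@';
}
```

Under this rule a request for "pthread_cond_timedwait" would select the "@@GLIBC_2.3.2" entry from the nm output quoted above and skip the "@GLIBC_2.2.5" one.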



RE: [PATCH] block-mq: set both block queue and hardware queue restart bit for restart

2017-04-05 Thread Long Li


> -----Original Message-----
> From: Bart Van Assche [mailto:bart.vanass...@sandisk.com]
> Sent: Wednesday, April 5, 2017 5:32 PM
> To: linux-kernel@vger.kernel.org; linux-bl...@vger.kernel.org; Long Li
> ; ax...@kernel.dk
> Cc: Stephen Hemminger ; KY Srinivasan
> ; Long Li 
> Subject: Re: [PATCH] block-mq: set both block queue and hardware queue
> restart bit for restart
> 
> On Wed, 2017-04-05 at 17:16 -0700, Long Li wrote:
> > Under heavy I/O, one hardware queue may be unable to dispatch any I/O
> > to the device layer. This poses a problem with restarting this
> > hardware queue on I/O finish in blk_mq_sched_restart_queues(), because
> > there is nothing pending that will finish in future on this hardware queue.
> > This will result in deadlock.
> >
> > With this patch, we check for all possible stalled hardware queues
> > when I/O finishes on any hardware queues. This prevents this deadlock.
> >
> > Signed-off-by: Long Li 
> > ---
> >  block/blk-mq-sched.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> > index 09af8ff..f7f3d44 100644
> > --- a/block/blk-mq-sched.c
> > +++ b/block/blk-mq-sched.c
> > @@ -202,7 +202,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> >  * needing a restart in that case.
> >  */
> > if (!list_empty(&rq_list)) {
> > -   blk_mq_sched_mark_restart_hctx(hctx);
> > +   blk_mq_sched_mark_restart_queue(hctx);
> > did_work = blk_mq_dispatch_rq_list(hctx, &rq_list);
> > } else if (!has_sched_dispatch) {
> > blk_mq_flush_busy_ctxs(hctx, &rq_list);
> 
> Please drop this patch. I'm working on a better solution.

Thank you. Looking forward to your patch.

> 
> Thanks,
> 
> Bart.


[PATCH v5 1/2] module: verify address is read-only

2017-04-05 Thread Eddie Kovsky
Implement a mechanism to check if a module's address is in
the rodata or ro_after_init sections. It mimics the existing functions
that test if an address is inside a module's text section.

Functions that take a module as an argument will be able to verify that the
module address is in a read-only section. The idea is to prevent structures
(including modules) that are not read-only from being passed to functions.

This implements the first half of a suggestion made by Kees Cook for
the Kernel Self Protection Project:

- provide mechanism to check for ro_after_init memory areas, and
  reject structures not marked ro_after_init in vmbus_register()

Suggested-by: Kees Cook 
Signed-off-by: Eddie Kovsky 
---
Changes in v4:
 - Rename function __module_ro_address() to __module_rodata_address()
 - Move module_rodata_address() stub below is_module_address()
 - Minor comment fixes
 - Verify that mod is not NULL before setting up layout variables
Changes in v3:
 - Change function name is_module_ro_address() to
is_module_rodata_address().
 - Improve comments on is_module_rodata_address().
 - Add a !CONFIG_MODULES stub for is_module_rodata_address.
 - Correct and simplify the check for the read-only memory regions in
the module address.
---
 include/linux/module.h | 12 
 kernel/module.c| 53 ++
 2 files changed, 65 insertions(+)

diff --git a/include/linux/module.h b/include/linux/module.h
index 9ad68561d8c2..a3d17b081de3 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -492,7 +492,9 @@ static inline int module_is_live(struct module *mod)
 
 struct module *__module_text_address(unsigned long addr);
 struct module *__module_address(unsigned long addr);
+struct module *__module_rodata_address(unsigned long addr);
 bool is_module_address(unsigned long addr);
+bool is_module_rodata_address(unsigned long addr);
 bool __is_module_percpu_address(unsigned long addr, unsigned long *can_addr);
 bool is_module_percpu_address(unsigned long addr);
 bool is_module_text_address(unsigned long addr);
@@ -646,6 +648,11 @@ static inline struct module *__module_address(unsigned 
long addr)
return NULL;
 }
 
+static inline struct module *__module_rodata_address(unsigned long addr)
+{
+   return NULL;
+}
+
 static inline struct module *__module_text_address(unsigned long addr)
 {
return NULL;
@@ -656,6 +663,11 @@ static inline bool is_module_address(unsigned long addr)
return false;
 }
 
+static inline bool is_module_rodata_address(unsigned long addr)
+{
+   return false;
+}
+
 static inline bool is_module_percpu_address(unsigned long addr)
 {
return false;
diff --git a/kernel/module.c b/kernel/module.c
index f953df992a11..d5753210cf34 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -4296,6 +4296,59 @@ struct module *__module_text_address(unsigned long addr)
 }
 EXPORT_SYMBOL_GPL(__module_text_address);
 
+/**
+ * is_module_rodata_address - is this address inside read-only module data?
+ * @addr: the address to check.
+ *
+ */
+bool is_module_rodata_address(unsigned long addr)
+{
+   bool ret;
+
+   preempt_disable();
+   ret = __module_rodata_address(addr) != NULL;
+   preempt_enable();
+
+   return ret;
+}
+
+/*
+ * __module_rodata_address - get the module whose rodata/ro_after_init sections
+ * contain the given address.
+ * @addr: the address.
+ *
+ * Must be called with preempt disabled or module mutex held so that
+ * module doesn't get freed during this.
+ */
+struct module *__module_rodata_address(unsigned long addr)
+{
+   struct module *mod = __module_address(addr);
+
+   /*
+* Make sure module is within the read-only section.
+* range(base + text_size, base + ro_after_init_size)
+* encompasses both the rodata and ro_after_init regions.
+* See comment above frob_text() for the layout diagram.
+*/
+   if (mod) {
+   void *init_base = mod->init_layout.base;
+   unsigned int init_text_size = mod->init_layout.text_size;
+   unsigned int init_ro_after_init_size = 
mod->init_layout.ro_after_init_size;
+
+   void *core_base = mod->core_layout.base;
+   unsigned int core_text_size = mod->core_layout.text_size;
+   unsigned int core_ro_after_init_size = 
mod->core_layout.ro_after_init_size;
+
+   if (!within(addr, init_base + init_text_size,
+   init_ro_after_init_size - init_text_size)
+   && !within(addr, core_base + core_text_size,
+  core_ro_after_init_size - core_text_size))
+   mod = NULL;
+   }
+   return mod;
+}
+EXPORT_SYMBOL_GPL(__module_rodata_address);
+
 /* Don't grab lock, we're oopsing. */
 void print_modules(void)
 {
-- 
2.12.2
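
The range check in __module_rodata_address() can be illustrated outside the
kernel. The following is a minimal userspace sketch, not the kernel code:
struct layout_sketch, within_sketch() and layout_rodata_address() are
hypothetical names, and the sketch assumes (as the comment in the patch
suggests) that text_size and ro_after_init_size are cumulative offsets from
the layout base.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the kernel's module layout. */
struct layout_sketch {
	unsigned long base;			/* start of the mapping */
	unsigned int text_size;			/* offset where text ends */
	unsigned int ro_after_init_size;	/* offset where ro_after_init ends */
};

/* Half-open range test: true if start <= addr < start + size. */
bool within_sketch(unsigned long addr, unsigned long start, unsigned long size)
{
	return addr >= start && addr < start + size;
}

/*
 * True if addr lies after the text region and before the end of the
 * ro_after_init region, i.e. in rodata or ro_after_init.
 */
bool layout_rodata_address(const struct layout_sketch *l, unsigned long addr)
{
	/* Assumption: sizes are cumulative, so this cannot underflow. */
	assert(l->ro_after_init_size >= l->text_size);

	return within_sketch(addr, l->base + l->text_size,
			     l->ro_after_init_size - l->text_size);
}
```

With base 0x1000, text_size 0x100 and ro_after_init_size 0x300, addresses
0x1100 through 0x12ff are accepted, while text addresses and anything at or
beyond 0x1300 are rejected.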



[PATCH v5 2/2] extable: verify address is read-only

2017-04-05 Thread Eddie Kovsky
Provide a mechanism to check if the address of a variable is
const or ro_after_init. It mimics the existing functions that test if an
address is inside the kernel's text section.

The idea is to prevent structures that are not read-only from being
passed to functions. Other functions inside the kernel could then use
this capability to verify that their arguments are read-only.

This implements the first half of a suggestion made by Kees Cook for
the Kernel Self Protection Project:

- provide mechanism to check for ro_after_init memory areas, and
reject structures not marked ro_after_init in vmbus_register()

Suggested-by: Kees Cook 
Signed-off-by: Eddie Kovsky 
---
Changes in v5:
 - Replace __start_data_ro_after_init with __start_ro_after_init
 - Replace __end_data_ro_after_init with __end_ro_after_init
Changes in v4:
 - Rename function core_kernel_ro_data() to core_kernel_rodata().
Changes in v3:
 - Fix missing declaration of is_module_rodata_address()
---
 include/linux/kernel.h |  2 ++
 kernel/extable.c   | 29 +
 2 files changed, 31 insertions(+)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 4c26dc3a8295..5748784ca209 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -444,6 +444,8 @@ extern int core_kernel_data(unsigned long addr);
 extern int __kernel_text_address(unsigned long addr);
 extern int kernel_text_address(unsigned long addr);
 extern int func_ptr_is_kernel_text(void *ptr);
+extern int core_kernel_rodata(unsigned long addr);
+extern int kernel_ro_address(unsigned long addr);
 
 unsigned long int_sqrt(unsigned long);
 
diff --git a/kernel/extable.c b/kernel/extable.c
index 2676d7f8baf6..18c5e4dbe0fc 100644
--- a/kernel/extable.c
+++ b/kernel/extable.c
@@ -154,3 +154,32 @@ int func_ptr_is_kernel_text(void *ptr)
return 1;
return is_module_text_address(addr);
 }
+
+/**
+ * core_kernel_rodata - Verify address points to read-only section
+ * @addr: address to test
+ *
+ */
+int core_kernel_rodata(unsigned long addr)
+{
+   if (addr >= (unsigned long)__start_rodata &&
+   addr < (unsigned long)__end_rodata)
+   return 1;
+
+   if (addr >= (unsigned long)__start_ro_after_init &&
+   addr < (unsigned long)__end_ro_after_init)
+   return 1;
+
+   return 0;
+}
+
+/* Verify that address is const or ro_after_init. */
+int kernel_ro_address(unsigned long addr)
+{
+   if (core_kernel_rodata(addr))
+   return 1;
+   if (is_module_rodata_address(addr))
+   return 1;
+
+   return 0;
+}
-- 
2.12.2
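
As a rough illustration of how a caller might use such a check, here is a
userspace sketch under stated assumptions. It is not part of the series:
fake_rodata, fake_data, in_fake_rodata() and register_ops_sketch() are
hypothetical, and a single array stands in for the linker-provided section
bounds that core_kernel_rodata() compares against.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for a read-only section and a writable one. */
const char fake_rodata[64] = { 0 };
char fake_data[64];

/* Half-open range test, as in core_kernel_rodata(): start <= addr < end. */
int in_fake_rodata(uintptr_t addr)
{
	uintptr_t start = (uintptr_t)fake_rodata;
	uintptr_t end = start + sizeof(fake_rodata);

	return addr >= start && addr < end;
}

/* Hypothetical registration function that rejects writable structures. */
int register_ops_sketch(const void *ops)
{
	assert(ops);

	if (!in_fake_rodata((uintptr_t)ops))
		return -EINVAL;

	return 0;
}
```

A pointer into fake_rodata is accepted, while fake_data and the address one
past the end of fake_rodata are rejected with -EINVAL.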



[PATCH v5 0/2] provide check for ro_after_init memory sections

2017-04-05 Thread Eddie Kovsky
Provide a mechanism for other functions to verify that their arguments
are read-only.

This implements the first half of a suggestion made by Kees Cook for
the Kernel Self Protection Project:

- provide mechanism to check for ro_after_init memory areas, and
  reject structures not marked ro_after_init in vmbus_register()

  http://www.openwall.com/lists/kernel-hardening/2017/02/04/1

The idea is to prevent structures (including modules) that are not
read-only from being passed to functions. It builds upon the functions
in kernel/extable.c that test if an address is in the text section.

A build failure on the Blackfin architecture led to the discovery of
an incomplete definition of the RO_DATA macro used in this series. The
fixes are in linux-next:

commit 906f2a51c941 ("mm: fix section name for .data..ro_after_init")

commit 939897e2d736 ("vmlinux.lds: add missing VMLINUX_SYMBOL macros")

The latest version of this series uses new symbols provided in these
fixes. The series now cross compiles on Blackfin without errors. I have
also test compiled this series on next-20170405 for x86.

I have dropped the third patch that uses these features to check the
arguments to vmbus_register() because the maintainers have not been
receptive to using it. My goal right now is to get the API right.

Eddie Kovsky (2):
  module: verify address is read-only
  extable: verify address is read-only

 include/linux/kernel.h |  2 ++
 include/linux/module.h | 12 
 kernel/extable.c   | 29 +++
 kernel/module.c| 53 ++
 4 files changed, 96 insertions(+)

-- 
2.12.2



linux-next: manual merge of the staging tree with the v4l-dvb tree

2017-04-05 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the staging tree got conflicts in:

  drivers/staging/media/lirc/lirc_sasem.c
  drivers/staging/media/lirc/lirc_sir.c

between commits:

  e66267161971 ("[media] rc: promote lirc_sir out of staging")
  51bb3fd788cb ("[media] staging: lirc_sasem: remove")

from the v4l-dvb tree and commit:

  87ddb91067b9 ("Staging: media: lirc - style fix")

from the staging tree.

I fixed it up (I removed both files - the updates to lirc_sir.c were
also done in the v4l-dvb tree) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell


[PATCH v4] tools/perf: Allow user probes on versioned symbols

2017-04-05 Thread Paul Clarke

Symbol versioning, as in glibc, results in symbols being defined as:
<symbol>@[@]<version>
(Note that "@@" identifies a default symbol, if the symbol name
is repeated.)

perf is currently unable to deal with this, and is unable to create
user probes at such symbols:
--
$ nm /lib/powerpc64le-linux-gnu/libpthread.so.0 | grep pthread_create
8d30 t __pthread_create_2_1
8d30 T pthread_create@@GLIBC_2.17
$ /usr/bin/sudo perf probe -v -x /lib/powerpc64le-linux-gnu/libpthread.so.0 
pthread_create
probe-definition(0): pthread_create
symbol:pthread_create file:(null) line:0 offset:0 return:0 lazy:(null)
0 arguments
Open Debuginfo file: /usr/lib/debug/lib/powerpc64le-linux-gnu/libpthread-2.19.so
Try to find probe point from debuginfo.
Probe point 'pthread_create' not found.
   Error: Failed to add events. Reason: No such file or directory (Code: -2)
--

One is not able to specify the fully versioned symbol, either, due to
syntactic conflicts with other uses of "@" by perf:
--
$ /usr/bin/sudo perf probe -v -x /lib/powerpc64le-linux-gnu/libpthread.so.0 
pthread_create@@GLIBC_2.17
probe-definition(0): pthread_create@@GLIBC_2.17
Semantic error :SRC@SRC is not allowed.
0 arguments
   Error: Command Parse Error. Reason: Invalid argument (Code: -22)
--

This patch ignores the versioning tag for default symbols, thus
allowing probes to be created for these symbols:
--
$ /usr/bin/sudo ./perf probe -x /lib/powerpc64le-linux-gnu/libpthread.so.0 
pthread_create
Added new event:
   probe_libpthread:pthread_create (on pthread_create in 
/lib/powerpc64le-linux-gnu/libpthread-2.19.so)

You can now use it in all perf tools, such as:

 perf record -e probe_libpthread:pthread_create -aR sleep 1

$ /usr/bin/sudo ./perf record -e probe_libpthread:pthread_create -aR ./test 2
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.052 MB perf.data (2 samples) ]
$ /usr/bin/sudo ./perf script
 test  2915 [000] 19124.260729: probe_libpthread:pthread_create: 
(3fff99248d38)
 test  2916 [000] 19124.260962: probe_libpthread:pthread_create: 
(3fff99248d38)
$ /usr/bin/sudo ./perf probe --del=probe_libpthread:pthread_create
Removed event: probe_libpthread:pthread_create
--

Signed-off-by: Paul A. Clarke 
---
v4:
- rebased to acme/perf/core
- moved changes from "map" namespace to "symbol" namespace,
- rewrote symbol__compare (now *match) to avoid need for strdup
- new symbol__match_symbol_name to support versioned symbols, ignoring default
  tags
- new arch__compare_symbol_names_n to map to strncmp
- dso__find_symbol_by_name: now tries exact match (as before), then tries
   also adding symbols tagged as default (@@)
- symbols__find_by_name: new parameter to support finding with or without 
default
  tags
- does NOT handle setting probes using the tagged symbol name (symbol@[@]tag)

v3:
- code style fixes per David Ahern

v2:
- move logic from arch__compare_symbol_names to new map__compare_symbol_names
- call through from map__compare_symbol_names to arch__compare_symbol_names
- redirect uses of arch__compare_symbol_names
- send patch to LKML
 tools/perf/arch/powerpc/util/sym-handling.c | 12 ++
 tools/perf/util/map.c   |  5 ---
 tools/perf/util/map.h   |  5 ++-
 tools/perf/util/symbol.c| 62 +++--
 tools/perf/util/symbol.h|  9 +
 5 files changed, 73 insertions(+), 20 deletions(-)

diff --git a/tools/perf/arch/powerpc/util/sym-handling.c 
b/tools/perf/arch/powerpc/util/sym-handling.c
index 39dbe51..0d40e17 100644
--- a/tools/perf/arch/powerpc/util/sym-handling.c
+++ b/tools/perf/arch/powerpc/util/sym-handling.c
@@ -52,6 +52,18 @@ int arch__compare_symbol_names(const char *namea, const char *nameb)
 
 	return strcmp(namea, nameb);
 }
+
+int arch__compare_symbol_names_n(const char *namea, const char *nameb,
+   unsigned int n)
+{
+   /* Skip over initial dot */
+   if (*namea == '.')
+   namea++;
+   if (*nameb == '.')
+   nameb++;
+
+   return strncmp(namea, nameb, n);
+}
 #endif
 
 #if defined(_CALL_ELF) && _CALL_ELF == 2

diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index c1870ac..f4d8272 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -325,11 +325,6 @@ int map__load(struct map *map)
return 0;
 }
 
-int __weak arch__compare_symbol_names(const char *namea, const char *nameb)
-{
-   return strcmp(namea, nameb);
-}
-
 struct symbol *map__find_symbol(struct map *map, u64 addr)
 {
if (map__load(map) < 0)
diff --git a/tools/perf/util/map.h b/tools/perf/util/map.h
index c8a5a64..325bbc8 100644
--- a/tools/perf/util/map.h
+++ b/tools/perf/util/map.h
@@ -130,13 +130,14 @@ struct thread;
  */
 #define __map__for_each_symbol_by_name(map, sym_name, pos) \
for (pos = map__find_symbol_by_name(map, sym_name); \
-pos && arch__compare_symbol_names(pos->name, 
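
The matching rule described in the commit message (an exact match, or the
requested name followed by a default "@@" version tag) can be sketched as
follows. This is an illustrative userspace helper, not perf's actual
symbol__match_symbol_name(); the name match_default_versioned() is
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if 'sym' equals 'name' exactly, or
 * if 'sym' is 'name' followed by a default version tag ("@@VERSION").
 * Non-default tags ("name@VERSION") do not match.
 */
bool match_default_versioned(const char *sym, const char *name)
{
	size_t len;

	assert(sym && name);
	len = strlen(name);

	/* The first len characters must agree. */
	if (strncmp(sym, name, len) != 0)
		return false;

	/* Exact match: nothing follows the name. */
	if (sym[len] == '\0')
		return true;

	/* Default version: the name is followed by "@@". */
	return sym[len] == '@' && sym[len + 1] == '@';
}
```

Here "pthread_create@@GLIBC_2.17" matches a request for "pthread_create",
while "__pthread_create_2_1" and a non-default "pthread_create@GLIBC_2.1"
do not.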

Re: [PATCH] get_maintainer: Add Yakir Yang to .get_maintainer.ignore

2017-04-05 Thread jeffy

Hi Joe,

On 04/06/2017 10:39 AM, Joe Perches wrote:
> On Thu, 2017-04-06 at 10:01 +0800, Jeffy Chen wrote:
> > The mail account "Yakir Yang " is no longer
> > valid.
>
> Does this person still want to be involved with the kernel?
> If so, perhaps a .mailmap entry would be more appropriate.

I don't think he will continue working on drm (or even the kernel), but
you're right. I'll try to contact him and ask whether he wants to map to
his new mail address.

> > diff --git a/.get_maintainer.ignore b/.get_maintainer.ignore
>
> []
>
> > @@ -1 +1,2 @@
> >   Christoph Hellwig 
> > +Yakir Yang 

Re: [PATCH v2 00/11] fujitsu-laptop: backlight cleanup

2017-04-05 Thread Darren Hart
On Thu, Apr 06, 2017 at 10:15:14AM +0930, Jonathan Woithe wrote:
> Hi Michael
> 
> On Wed, Apr 05, 2017 at 09:55:34PM +0200, Michał Kępień wrote:
> > > On Wed, Apr 05, 2017 at 08:48:59AM +0200, Michał Kępień wrote:
> > > > This series introduces further changes to the way LCD backlight is
> > > > handled by fujitsu-laptop.  These changes include fixing a bug in code
> > > > responsible for generating brightness-related input events, cleaning up
> > > > handling of module parameters, reducing code duplication, removing
> > > > superfluous debug messages and other fixes.
> > > > 
> > > > This series was tested on a Lifebook S7020 and a Lifebook E744.
> > > > 
> > > > This series is based on the testing branch as it requires earlier patch
> > > > series I submitted in order to apply cleanly.
> > > > 
> > > > Changes from v1:
> > > > 
> > > >   - Update debug message logged by set_lcd_level() to reflect code flow
> > > > changes introduced by patch 04/11.
> > > 
> > > Queued to testing, thanks! 
> > 
> > Jonathan, I would still love to hear your opinion on this series.  Take
> > your time, though, I do not see any reason to rush things.  I will only
> > send the next series once you ack this one.
> 
> Sure, no problem.  As mentioned earlier, this week has been busy.  I am
> hoping I might find the time to complete my review this evening.  If not, it
> will be some time over the weekend.

Eeek, I jumped the gun on that. I've moved this from testing to fujitsu, and
it'll wait there until Jonathan gets a chance to review. Apologies Jonathan.

-- 
Darren Hart
VMware Open Source Technology Center


Re: [RFC net-next] bpf: taint loading !is_gpl programs

2017-04-05 Thread Aaron Conole
Hi Daniel,

Daniel Borkmann  writes:

> On 04/04/2017 08:33 PM, Aaron Conole wrote:
>> The eBPF framework is used for more than just socket level filtering.  It
>> can also provide tracing, and even change the way packets coming into the
>> system look.  Most of the eBPF callable symbols are available to non-gpl
>> programs, and this includes helper functions which modify packets.  This
>> allows proprietary eBPF code to link to the kernel and make decisions
>> which can negatively impact network performance.
>>
>> Since the sources for these programs are only available under a proprietary
>> license, it seems better to treat them the same as other proprietary
>> modules: set the system taint flag.  An exemption is made for socket-level
>> filters, since they do not really impact networking for the whole kernel.
>>
>> Signed-off-by: Aaron Conole 
>> ---
>
> Nacked-by: Daniel Borkmann 

Thanks so much for looking at the patch!

> This proposal is completely unreasonable; the purpose of the .gpl_only
> flags, agreed upon since the beginning, is that some of the helpers
> are only available if the program is loaded as gpl, e.g. bpf_ktime_get_ns(),
> bpf_probe_read(), bpf_probe_write_user(), bpf_trace_printk(),
> bpf_skb_event_output(), etc.

This behavior isn't changing with this patch.

> Now, suddenly switching from one kernel
> version to another, existing programs would all of a sudden taint the
> kernel, which by itself is unacceptable.

I'm not sure what you mean here.  The kernel should still be usable.  This
basically says that if someone runs non-GPL eBPF code, they are tainting
the kernel.  More below.

> There are also many other
> subsystems that can modify packets, or affect system performance
> negatively if configured wrongly and which in addition *don't require* a
> hard capable(CAP_SYS_ADMIN) restriction like such eBPF programs already
> do; perhaps we should taint them as well?

This is a good point that there are other methods of doing damage to the
network.  I think it means my commit message wasn't clear enough to
describe why I wanted the change.  The reason I propose this isn't
because someone can theoretically damage things.  It's because eBPF
really is a way of writing specialized kernel modules.

I am really making the distinction here that eBPF code (except for the
case of user-space socket filter) is a kernel module.  I realize that
may not be something folks have considered.  Nevertheless, it is code
which runs in the context of the kernel, outlives the lifetime of user
space, and modifies kernel behavior.  These are the main reasons I
believe this is a kernel module.  And since it is a kernel module, it
shouldn't bypass the existing taint flag that says 'someone is running
non-gpl code in kernel space.'  Do you disagree?  This is also why I
exempt socket filter code.  That really is something which I would
consider running as part of a user-space application.

> Plus tracing programs are
> attached to passively monitor systems performance, not even modifying
> data structures

Tracing code, afaict, must be gpl_only = true to be useful.  So I don't see
how it enters into the equation.  Did I misread something?  Most, if not
all of the networking functions, are gpl_only = false.  This means the
community will have a difficult time supporting reports from this
system.  After all, there's no way to know exactly how this eBPF program
has changed packets in the network without a license to the code.

> ... The current purpose of .gpl_only is fine as-is, and
> there's work in progress for a generic dump mechanism that works with
> all program types to improve the introspection aspect, if that's what
> you're after.  Starting to taint is, in a way, breaking existing
> applications, and this is not acceptable.

I don't see how this breaks applications.  They continue to run fine.
This patch does not restrict functionality.  Again, did I misunderstand
something?

I'm not sure how a dumping mechanism changes anything either.  I agree
such a utility is very useful.  However, if the poor user who is running
a non-gpl eBPF program is asked to provide a dump of that eBPF program,
they may be barred from doing so by licensing.  How can those cases be
supported?
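
The .gpl_only gating that both sides refer to can be pictured with a small user-space sketch. It is not the kernel's verifier code: the struct names, field layout, and the check_helper_call() function below are hypothetical stand-ins, and only the idea (a helper marked gpl_only is refused to a program that was not loaded with a GPL-compatible license) comes from the discussion above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, reduced model of a helper prototype. */
struct helper_proto {
	const char *name;
	bool gpl_only;		/* helper restricted to GPL-compatible programs */
};

/* Hypothetical, reduced model of a loaded program. */
struct prog {
	bool gpl_compatible;	/* assumed to be derived from the license string */
};

/*
 * Load-time check: a non-GPL program may not call a gpl_only helper.
 * Helpers with gpl_only == false stay available to every program.
 */
static int check_helper_call(const struct prog *prog,
			     const struct helper_proto *fn)
{
	if (!prog->gpl_compatible && fn->gpl_only)
		return -EINVAL;
	return 0;
}
```

In this model, a proprietary program calling a tracing helper such as bpf_trace_printk() is rejected at load time, while a helper marked gpl_only = false is accepted. The proposed patch would leave this check alone and only add a taint when such a program is loaded.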


Re: [RFC PATCH 0/4] fs: introduce new writeback error tracking infrastructure and convert ext4 to use it

2017-04-05 Thread Matthew Wilcox
On Thu, Apr 06, 2017 at 10:02:48AM +1000, NeilBrown wrote:
> If you are concerned about space in 'struct address_space', just prune
> some wastage.

I'm trying to (via wlists).  Still buggy, though.

> The "host" field brings no value.  It is only ever assigned in
> inode_init_always():
> 
>   struct address_space *const mapping = &inode->i_data;
> ..
>   mapping->host = inode;
> 
> So you could change all references to use
>    container_of(mapping, struct inode, i_data)

Alas, no:

drivers/dax/dax.c:  inode->i_mapping->host = dax_dev->inode;
fs/gfs2/glock.c:mapping->host = s->s_bdev->bd_inode;
fs/gfs2/ops_fstype.c:   mapping->host = sb->s_bdev->bd_inode;
fs/nilfs2/page.c:   mapping->host = inode;
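
A minimal user-space sketch of why those assignments matter, assuming heavily reduced versions of the two structures: the field subset, inode_init_sketch(), and host_matches_container() are illustrative names, not kernel code. container_of() can only recover the inode that embeds the mapping, so it stops agreeing with ->host as soon as a caller points host somewhere else.

```c
#include <assert.h>
#include <stddef.h>

struct inode;

/* Hypothetical, reduced stand-in for the kernel structure. */
struct address_space {
	struct inode *host;			/* owner, assigned explicitly */
};

struct inode {
	unsigned long i_ino;
	struct address_space *i_mapping;	/* mapping in use, usually &i_data */
	struct address_space i_data;		/* mapping embedded in this inode */
};

/* Simplified container_of(): recover the enclosing struct from a member. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* The common case quoted above: host is the embedding inode. */
static void inode_init_sketch(struct inode *inode, unsigned long ino)
{
	inode->i_ino = ino;
	inode->i_mapping = &inode->i_data;
	inode->i_data.host = inode;
}

/* Returns 1 when container_of() and ->host name the same inode. */
static int host_matches_container(struct address_space *mapping)
{
	return container_of(mapping, struct inode, i_data) == mapping->host;
}
```

After inode_init_sketch() the two agree. Once host is reassigned to a different inode, as in the dax line above, host_matches_container() returns 0, and replacing ->host with container_of() would silently return the wrong inode.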



Re: [PATCH v3 8/9] drm/rockchip: gem: Don't alloc/free gem buf when dev_private is invalid

2017-04-05 Thread jeffy

Hi Sean,

On 04/06/2017 12:28 AM, Sean Paul wrote:

On Wed, Apr 05, 2017 at 04:29:26PM +0800, Jeffy Chen wrote:

After unbinding drm, userspace may still have a chance to access the
gem buf.

Add a sanity check for a NULL dev_private to prevent that from
happening.


I still don't understand how this is happening. You're saying that these hooks
can be called after rockchip_drm_unbind() has finished?

Yes, tested on a Chromebook rk3399 kevin with kernel 4.4. It happens if
unbind is triggered without killing the display service (ui or frecon):


[   31.276889] [] dump_backtrace+0x0/0x164
[   31.282288] [] show_stack+0x24/0x30
[   31.287338] [] dump_stack+0x98/0xb8
[   31.292389] [] rockchip_gem_create_object+0x6c/0x2ec
[   31.298910] [] rockchip_gem_create_with_handle+0x38/0x10c
[   31.305868] [] rockchip_gem_create_ioctl+0x38/0x50
[   31.312221] [] drm_ioctl+0x2bc/0x438
[   31.317359] [] drm_compat_ioctl+0x3c/0x70
[   31.322935] [] compat_SyS_ioctl+0x134/0x1048
[   31.328766] [] __sys_trace_return+0x0/0x4


Sean



Signed-off-by: Jeffy Chen 
---

Changes in v3:
Address Daniel Vetter's comments.
Update commit message.

Changes in v2: None

  drivers/gpu/drm/rockchip/rockchip_drm_gem.c | 8 ++++++++
  1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
index df9e570..205a3dc 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
@@ -184,6 +184,9 @@ static int rockchip_gem_alloc_buf(struct rockchip_gem_object *rk_obj,
 	struct drm_device *drm = obj->dev;
 	struct rockchip_drm_private *private = drm->dev_private;
 
+	if (!private)
+		return -ENODEV;
+
 	if (private->domain)
 		return rockchip_gem_alloc_iommu(rk_obj, alloc_kmap);
 	else
@@ -208,6 +211,11 @@ static void rockchip_gem_free_dma(struct rockchip_gem_object *rk_obj)
 
 static void rockchip_gem_free_buf(struct rockchip_gem_object *rk_obj)
 {
+	struct drm_device *drm = rk_obj->base.dev;
+
+	if (!drm->dev_private)
+		return;
+
 	if (rk_obj->pages)
 		rockchip_gem_free_iommu(rk_obj);
 	else
--
2.1.4
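
The guard added by the patch can be exercised outside the kernel with a small sketch. The types and alloc_buf_sketch() below are hypothetical stand-ins for the DRM and Rockchip structures, and the sketch assumes that dev_private reads as NULL once the device has been unbound.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real types live in the DRM and Rockchip code. */
struct private_sketch {
	void *domain;				/* non-NULL when an IOMMU domain exists */
};

struct device_sketch {
	struct private_sketch *dev_private;	/* assumed NULL after unbind */
};

enum alloc_path { ALLOC_IOMMU = 1, ALLOC_DMA = 2 };

/* Mirrors the shape of the patched rockchip_gem_alloc_buf(). */
static int alloc_buf_sketch(const struct device_sketch *drm)
{
	const struct private_sketch *private = drm->dev_private;

	if (!private)
		return -ENODEV;			/* device already unbound */

	return private->domain ? ALLOC_IOMMU : ALLOC_DMA;
}
```

Without the early return, the unbound case would dereference a NULL pointer when reading private->domain. The check only helps for calls that arrive after the pointer has been cleared; it does not by itself address a call racing with unbind.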








[lkp-robot] [debug] 19d436268d: BUG:unable_to_handle_kernel

2017-04-05 Thread kernel test robot

FYI, we noticed the following commit:

commit: 19d436268dde95389c616bb3819da73f0a8b28a8 ("debug: Add _ONCE() logic to report_bug()")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu host -smp 2 -m 1G

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


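+------------------------------------------------------+------------+------------+
|                                                      | 70579a86e3 | 19d436268d |
+------------------------------------------------------+------------+------------+
| boot_successes                                       | 4          | 2          |
| boot_failures                                        | 4          | 6          |
| WARNING:at_mm/page_alloc.c:#__alloc_pages_nodemask   | 4          |            |
| BUG:unable_to_handle_kernel                          | 0          | 6          |
| Oops:#[##]                                           | 0          | 6          |
| Kernel_panic-not_syncing:Fatal_exception             | 0          | 6          |
+------------------------------------------------------+------------+------------+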



[   82.299704] BUG: unable to handle kernel paging request at 81eec014
[   82.299719] IP: report_bug+0xff/0x1d0
[   82.299720] PGD 200a067 
[   82.299721] PUD 200b063 
[   82.299722] PMD 81e001e1 
[   82.299723] 
[   82.299725] Oops: 0003 [#1] PREEMPT SMP
[   82.299726] Modules linked in: crc32c_intel(+) intel_uncore(-) evbug serio_raw tpm_tis tpm_tis_core ide_pci_generic tpm parport_pc parport
[   82.299736] CPU: 0 PID: 177 Comm: systemd-udevd Not tainted 4.11.0-rc3-00048-g19d4362 #2
[   82.299738] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-20161025_171302-gandalf 04/01/2014
[   82.299740] task: 88003f0fb840 task.stack: 88003c3a
[   82.299742] RIP: 0010:report_bug+0xff/0x1d0
[   82.299743] RSP: 0018:88003c3a37f8 EFLAGS: 00010202
[   82.299745] RAX: 0907 RBX: 88003c3a3948 RCX: 81eec00a
[   82.299746] RDX: 0001 RSI: 0e45 RDI: 0001
[   82.299747] RBP: 88003c3a3818 R08: 88003c3a37e0 R09: 88003c3a
[   82.299748] R10: 88003c3a4000 R11:  R12: 81209bfd
[   82.299749] R13: 81e34520 R14: 88003c3a3880 R15: 0004
[   82.299751] FS:  7f56d11168c0() GS:88003520() 
knlGS:
[   82.299752] CS:  0010 DS:  ES:  CR0: 80050033
[   82.299752] CR2: 81eec014 CR3: 3ea91000 CR4: 06b0
[   82.299756] Call Trace:
[   82.299764]  do_trap+0x21d/0x250
[   82.299767]  do_error_trap+0x9b/0x130
[   82.299772]  ? __alloc_pages_nodemask+0xc0d/0x1840
[   82.299776]  ? __this_cpu_preempt_check+0x1b/0x30
[   82.299781]  ? preempt_count_add+0x120/0x150
[   82.299785]  ? _raw_spin_unlock+0x2e/0x60
[   82.299790]  ? get_partial_node+0x1a4/0x400
[   82.299793]  do_invalid_op+0x28/0x40
[   82.299795]  invalid_op+0x1e/0x30
[   82.299798] RIP: 0010:__alloc_pages_nodemask+0xc0d/0x1840
[   82.299799] RSP: 0018:88003c3a39f0 EFLAGS: 00010202
[   82.299801] RAX: fffcad61 RBX:  RCX: 0017
[   82.299803] RDX:  RSI: 014040c0 RDI: 8800355cb000
[   82.299804] RBP: 88003c3a3b38 R08:  R09: ff800f8a
[   82.299805] R10: 0001 R11:  R12: 0001
[   82.299806] R13: 0017 R14: 82700ee0 R15: 0017
[   82.299809]  ? preempt_count_add+0x120/0x150
[   82.299814]  ? update_curr+0x8a/0x2b0
[   82.299816]  ? __dequeue_entity+0x2e/0x60
[   82.299819]  ? _raw_spin_unlock_irq+0x3d/0x70
[   82.299821]  ? finish_task_switch+0xc2/0x320
[   82.299825]  alloc_pages_current+0xd3/0x250
[   82.299829]  kmalloc_order_trace+0x36/0x190
[   82.299831]  __kmalloc_track_caller+0x2f1/0x3b0
[   82.299835]  kmemdup+0x30/0x70
[   82.299838]  gcov_info_dup+0x141/0x220
[   82.299839]  gcov_event+0x4bd/0x540
[   82.299841]  ? __might_sleep+0x62/0xc0
[   82.299844]  gcov_module_notifier+0x145/0x150
[   82.299846]  notifier_call_chain+0x62/0xa0
[   82.299849]  __blocking_notifier_call_chain+0x5f/0x90
[   82.299851]  blocking_notifier_call_chain+0x1e/0x30
[   82.299853]  do_init_module+0x2a2/0x640
[   82.299855]  load_module+0x1d81/0x2130
[   82.299857]  ? free_modinfo_version+0x40/0x40
[   82.299861]  ? kernel_read_file+0x26f/0x2a0
[   82.299864]  ? kernel_read_file_from_fd+0x61/0xc0
[   82.299866]  SyS_finit_module+0x108/0x120
[   82.299870]  entry_SYSCALL_64_fastpath+0x1e/0xa9
[   82.299872] RIP: 0033:0x7f56cff91d49
[   82.299872] RSP: 002b:7ffc11a4f718 EFLAGS: 0206 ORIG_RAX: 
0139
[   82.299875] RAX: ffda RBX: 7f56d11061dd RCX: 7f56cff91d49
[   82.299877] RDX:  RSI: 7f56d08ad525 RDI: 0007
[   82.299878] RBP: 56441e9137c0 R08: 
