Re: [PATCH RESEND] kvm: make kvm_vcpu_(un)map dependency on CONFIG_HAS_IOMEM explicit

2019-05-20 Thread Michal Kubecek
On Mon, May 20, 2019 at 07:23:43PM +0200, Paolo Bonzini wrote:
> On 20/05/19 18:44, Michal Kubecek wrote:
> > Recently introduced functions kvm_vcpu_map() and kvm_vcpu_unmap() call
> > memremap() and memunmap() which are only available if HAS_IOMEM is enabled
> > but this dependency is not explicit, so that the build fails with HAS_IOMEM
> > disabled.
> > 
> > As both functions are only used on x86 where HAS_IOMEM is always enabled,
> > the easiest fix seems to be to only provide them when HAS_IOMEM is enabled.
> > 
> > Fixes: e45adf665a53 ("KVM: Introduce a new guest mapping API")
> > Signed-off-by: Michal Kubecek 
> > ---
> 
> Thank you very much.  However, it's better if only the memremap part is
> hidden behind CONFIG_HAS_IOMEM.  I'll send a patch tomorrow and have it
> reach Linus at most on Wednesday.

That sounds like a better solution. As I'm not familiar with the code,
I didn't want to take any risks and suggested the easiest way around it.

Michal

> There is actually nothing specific to CONFIG_HAS_IOMEM in them,
> basically the functionality we want is remap_pfn_range but without a
> VMA.  However, it's for a niche use case where KVM guest memory is
> mmap-ed from /dev/mem and it's okay if for now that part remains
> disabled on s390.
> 
> Paolo
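
For illustration, the approach Paolo describes, fencing off only the
memremap()/memunmap() calls rather than the whole mapping API, could look
roughly like the sketch below. This is only a sketch, not the fix that was
eventually merged; the helper name is made up to show the idea.

#include <linux/kvm_host.h>
#include <linux/highmem.h>
#include <linux/io.h>

/* Sketch: keep the mapping helper buildable everywhere, fence off MMIO remap. */
static void *example_map_pfn(kvm_pfn_t pfn)
{
        if (pfn_valid(pfn))
                return kmap(pfn_to_page(pfn));

#ifdef CONFIG_HAS_IOMEM
        return memremap(pfn_to_hpa(pfn), PAGE_SIZE, MEMREMAP_WB);
#else
        /* No MMIO remapping without HAS_IOMEM; caller treats NULL as failure. */
        return NULL;
#endif
}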


Re: [PATCH 2/4] md: raid0: Remove return statement from void function

2019-05-20 Thread Song Liu
On Mon, May 20, 2019 at 2:45 PM Marcos Paulo de Souza
 wrote:
>
> This return statement was introduced in commit
> 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 ("Linux-2.6.12-rc2") and can be
> safely removed.

Wow, that's a really old commit. :)

I think 3/4 and 4/4 of the set make git-blame more difficult to
follow. Let's not apply them.

Thanks,
Song

>
> Signed-off-by: Marcos Paulo de Souza 
> ---
>  drivers/md/raid0.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> index f3fb5bb8c82a..42b0287104bd 100644
> --- a/drivers/md/raid0.c
> +++ b/drivers/md/raid0.c
> @@ -609,7 +609,6 @@ static bool raid0_make_request(struct mddev *mddev, 
> struct bio *bio)
>  static void raid0_status(struct seq_file *seq, struct mddev *mddev)
>  {
> seq_printf(seq, " %dk chunks", mddev->chunk_sectors / 2);
> -   return;
>  }
>
>  static void *raid0_takeover_raid45(struct mddev *mddev)
> --
> 2.21.0
>


Re: [PATCH RESEND] kvm: make kvm_vcpu_(un)map dependency on CONFIG_HAS_IOMEM explicit

2019-05-20 Thread Michal Kubecek
On Mon, May 20, 2019 at 03:45:29PM -0700, Bjorn Andersson wrote:
> On Mon, May 20, 2019 at 9:44 AM Michal Kubecek  wrote:
> >
> > Recently introduced functions kvm_vcpu_map() and kvm_vcpu_unmap() call
> > memremap() and memunmap() which are only available if HAS_IOMEM is enabled
> > but this dependency is not explicit, so that the build fails with HAS_IOMEM
> > disabled.
> >
> > As both functions are only used on x86 where HAS_IOMEM is always enabled,
> > the easiest fix seems to be to only provide them when HAS_IOMEM is enabled.
> >
> > Fixes: e45adf665a53 ("KVM: Introduce a new guest mapping API")
> > Signed-off-by: Michal Kubecek 
> 
> Hi Michal,
> 
> I see the same build issue on arm64 and as CONFIG_HAS_IOMEM is set
> there this patch has no effect on solving that. Instead I had to
> include linux/io.h in kvm_main.c to make it compile.

This sounds like a different problem which was already resolved in
mainline by commit c011d23ba046 ("kvm: fix compilation on aarch64")
which is present in v5.2-rc1. The issue I'm trying to address is link
time failure (unresolved reference to memremap()/memunmap()) when
CONFIG_HAS_IOMEM is disabled (in our case it affects a special
minimalistic s390x config for zfcpdump).

Michal


Re: linux-next: Tree for May 21

2019-05-20 Thread Masahiro Yamada
Hi Stephen, Andrew,


On Tue, May 21, 2019 at 2:15 PM Stephen Rothwell  wrote:
>
> Hi all,

FYI.
Commit 15e57a12d4df3c662f6cceaec6d1efa98a3d70f8
is equivalent to commit ecebc5ce59a003163eb608ace38a01d7ffeb0a95,
which is already in mainline.

The former should be dropped, shouldn't it?

Thanks.



>
> Changes since 20190520:
>
> New trees: soc-fsl, soc-fsl-fixes
>
> Removed trees: (not updated for more than a year)
> alpine, samsung, sh, befs, kconfig, dwmw2-iommu, trivial,
> target-updates, target-bva, init_task
>
> The imx-mxs tree gained a build failure so I used the version from
> next-20190520.
>
> The sunxi tree gained a conflict against the imx-mxs tree.
>
> The drm-misc tree gained conflicts against Linus' and the amdgpu trees.
>
> Non-merge commits (relative to Linus' tree): 991
>  998 files changed, 29912 insertions(+), 14691 deletions(-)
>
> 
>
> I have created today's linux-next tree at
> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
> (patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
> are tracking the linux-next tree using git, you should not use "git pull"
> to do so as that will try to merge the new linux-next release with the
> old one.  You should use "git fetch" and checkout or reset to the new
> master.
>
> You can see which trees have been included by looking in the Next/Trees
> file in the source.  There are also quilt-import.log and merge.log
> files in the Next directory.  Between each merge, the tree was built
> with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
> multi_v7_defconfig for arm and a native build of tools/perf. After
> the final fixups (if any), I do an x86_64 modules_install followed by
> builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
> ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
> and sparc64 defconfig. And finally, a simple boot test of the powerpc
> pseries_le_defconfig kernel in qemu (with and without kvm enabled).
>
> Below is a summary of the state of the merge.
>
> I am currently merging 290 trees (counting Linus' and 70 trees of bug
> fix patches pending for the current merge release).
>
> Stats about the size of the tree over time can be seen at
> http://neuling.org/linux-next-size.html .
>
> Status of my local build tests will be at
> http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
> advice about cross compilers/configs that work, we are always open to add
> more builds.
>
> Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
> Gortmaker for triage and bug fixes.
>
> --
> Cheers,
> Stephen Rothwell
>
> $ git checkout master
> $ git reset --hard stable
> Merging origin/master (f49aa1de9836 Merge tag 'for-5.2-rc1-tag' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux)
> Merging fixes/master (2bbacd1a9278 Merge tag 'kconfig-v5.2' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild)
> Merging kspp-gustavo/for-next/kspp (b324f1b28dc0 afs: yfsclient: Mark 
> expected switch fall-throughs)
> Merging kbuild-current/fixes (a2d635decbfa Merge tag 'drm-next-2019-05-09' of 
> git://anongit.freedesktop.org/drm/drm)
> Merging arc-current/for-curr (c5a1726d7383 ARC: entry: EV_Trap expects r10 
> (vs. r9) to have exception cause)
> Merging arm-current/fixes (e17b1af96b2a ARM: 8857/1: efi: enable CP15 DMB 
> instructions before cleaning the cache)
> Merging arm64-fixes/for-next/fixes (7a0a93c51799 arm64: vdso: Explicitly add 
> build-id option)
> Merging m68k-current/for-linus (fdd20ec8786a Documentation/features/time: 
> Mark m68k having modern-timekeeping)
> Merging powerpc-fixes/fixes (672eaf37db9f powerpc/cacheinfo: Remove double 
> free)
> Merging sparc/master (f49aa1de9836 Merge tag 'for-5.2-rc1-tag' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux)
> Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2)
> Merging net/master (fa2c52be7129 vlan: Mark expected switch fall-through)
> Merging bpf/master (6a0a923dfa14 of_net: fix of_get_mac_address retval if 
> compiled without CONFIG_OF)
> Merging ipsec/master (9b3040a6aafd ipv4: Define __ipv4_neigh_lookup_noref 
> when CONFIG_INET is disabled)
> Merging netfilter/master (2c82c7e724ff netfilter: nf_tables: fix oops during 
> rule dump)
> Merging ipvs/master (b2e3d68d1251 netfilter: nft_compat: destroy function 
> must not have side effects)
> Merging wireless-drivers/master (7a0f8ad5ff63 Merge ath-current from 
> git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git)
> Merging mac80211/master (933b40530b4b mac80211: remove set but not used variable 'old')

RE: [PATCH v4] clk: qoriq: add support for lx2160a

2019-05-20 Thread Vabhav Sharma
Hello Stephen, 
I have incorporated review comments from 
https://patchwork.kernel.org/patch/10917171/

A gentle reminder to apply the patch 
https://patchwork.kernel.org/patch/10918407/.

Regards,
Vabhav

> -Original Message-
> From: Vabhav Sharma 
> Sent: Friday, April 26, 2019 12:24 PM
> To: linux-kernel@vger.kernel.org; linux-...@vger.kernel.org
> Cc: sb...@kernel.org; mturque...@baylibre.com; Vabhav Sharma
> ; Andy Tang ; Yogesh
> Narayan Gaur 
> Subject: [PATCH v4] clk: qoriq: add support for lx2160a
> 
> Add clockgen support and configuration for NXP SoC lx2160a with
> compatible property as "fsl,lx2160a-clockgen".
> 
> Signed-off-by: Tang Yuantian 
> Signed-off-by: Yogesh Gaur 
> Signed-off-by: Vabhav Sharma 
> Acked-by: Scott Wood 
> Acked-by: Stephen Boyd 
> Acked-by: Viresh Kumar 
> ---
> Changes for v4:
> - Incorporated review comments from Stephen Boyd
> 
> Changes for v3:
> - Incorporated review comments of Rafael J. Wysocki
> - Updated commit message
> 
> Changes for v2:
> - Subject line updated
> 
>  drivers/clk/clk-qoriq.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/clk/clk-qoriq.c b/drivers/clk/clk-qoriq.c
> index 3d51d7c..1a15201 100644
> --- a/drivers/clk/clk-qoriq.c
> +++ b/drivers/clk/clk-qoriq.c
> @@ -570,6 +570,17 @@ static const struct clockgen_chipinfo chipinfo[] = {
>   .flags = CG_VER3 | CG_LITTLE_ENDIAN,
>   },
>   {
> + .compat = "fsl,lx2160a-clockgen",
> + .cmux_groups = {
> + &clockgen2_cmux_cga12, &clockgen2_cmux_cgb
> + },
> + .cmux_to_group = {
> + 0, 0, 0, 0, 1, 1, 1, 1, -1
> + },
> + .pll_mask = 0x37,
> + .flags = CG_VER3 | CG_LITTLE_ENDIAN,
> + },
> + {
>   .compat = "fsl,p2041-clockgen",
>   .guts_compat = "fsl,qoriq-device-config-1.0",
>   .init_periph = p2041_init_periph,
> @@ -1427,6 +1438,7 @@ CLK_OF_DECLARE(qoriq_clockgen_ls1043a, "fsl,ls1043a-clockgen", clockgen_init);
>  CLK_OF_DECLARE(qoriq_clockgen_ls1046a, "fsl,ls1046a-clockgen", clockgen_init);
>  CLK_OF_DECLARE(qoriq_clockgen_ls1088a, "fsl,ls1088a-clockgen", clockgen_init);
>  CLK_OF_DECLARE(qoriq_clockgen_ls2080a, "fsl,ls2080a-clockgen", clockgen_init);
> +CLK_OF_DECLARE(qoriq_clockgen_lx2160a, "fsl,lx2160a-clockgen", clockgen_init);
>  CLK_OF_DECLARE(qoriq_clockgen_p2041, "fsl,p2041-clockgen", clockgen_init);
>  CLK_OF_DECLARE(qoriq_clockgen_p3041, "fsl,p3041-clockgen", clockgen_init);
>  CLK_OF_DECLARE(qoriq_clockgen_p4080, "fsl,p4080-clockgen", clockgen_init);
> --
> 2.7.4



RE: Re: Re: [PATCH v3 11/14] dmaengine: imx-sdma: fix ecspi1 rx dma not work on i.mx8mm

2019-05-20 Thread Robin Gong
> -Original Message-
> From: Vinod Koul 
> Sent: May 21, 2019 13:13
> 
> On 21-05-19, 04:58, Robin Gong wrote:
> > > -Original Message-
> > > From: Vinod Koul 
> > > Sent: May 21, 2019 12:18
> > >
> > > On 07-05-19, 09:16, Robin Gong wrote:
> > > > Because the number of the ecspi1 rx event on i.mx8mm is 0, the
> > > > condition check ignores this special case and leaves the dma channel
> > > > disabled, which causes ecspi1 rx to fail. Actually, there is no need
> > > > to check event_id0; checking event_id1 is enough for the DEV_2_DEV
> > > > case because event_id1 happens never to be 0.
> > >
> > > Well is that by chance or design that event_id1 will be never 0?
> > >
> > That's by chance. DEV_2_DEV is just for the audio case, and event_id1 is
> > non-zero on the current i.MX family.
> 
> Then it won't be good to rely on chance :)
Yes, I know that. May I create another independent patch for event_id1,
since that potential issue is not related to this ecspi patch set?
> 
> --
> ~Vinod


Re: [PATCHv1 4/8] arm64: dts: qcom: msm8916: Use more generic idle state names

2019-05-20 Thread Amit Kucheria
On Wed, May 15, 2019 at 6:33 PM Niklas Cassel  wrote:
>
> On Wed, May 15, 2019 at 03:43:19PM +0530, Amit Kucheria wrote:
> > On Tue, May 14, 2019 at 9:42 PM Niklas Cassel  
> > wrote:
> > >
> > > On Fri, May 10, 2019 at 04:59:42PM +0530, Amit Kucheria wrote:
> > > > Instead of using Qualcomm-specific terminology, use generic node names
> > > > for the idle states that are easier to understand. Move the description
> > > > into the "idle-state-name" property.
> > > >
> > > > Signed-off-by: Amit Kucheria 
> > > > ---
> > > >  arch/arm64/boot/dts/qcom/msm8916.dtsi | 11 ++-
> > > >  1 file changed, 6 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/arch/arm64/boot/dts/qcom/msm8916.dtsi 
> > > > b/arch/arm64/boot/dts/qcom/msm8916.dtsi
> > > > index ded1052e5693..400b609bb3fd 100644
> > > > --- a/arch/arm64/boot/dts/qcom/msm8916.dtsi
> > > > +++ b/arch/arm64/boot/dts/qcom/msm8916.dtsi
> > > > @@ -110,7 +110,7 @@
> > > >   reg = <0x0>;
> > > >   next-level-cache = <&L2_0>;
> > > >   enable-method = "psci";
> > > > - cpu-idle-states = <&CPU_SPC>;
> > > > + cpu-idle-states = <&CPU_SLEEP_0>;
> > > >   clocks = <&apcs>;
> > > >   operating-points-v2 = <&cpu_opp_table>;
> > > >   #cooling-cells = <2>;
> > > > @@ -122,7 +122,7 @@
> > > >   reg = <0x1>;
> > > >   next-level-cache = <&L2_0>;
> > > >   enable-method = "psci";
> > > > - cpu-idle-states = <&CPU_SPC>;
> > > > + cpu-idle-states = <&CPU_SLEEP_0>;
> > > >   clocks = <&apcs>;
> > > >   operating-points-v2 = <&cpu_opp_table>;
> > > >   #cooling-cells = <2>;
> > > > @@ -134,7 +134,7 @@
> > > >   reg = <0x2>;
> > > >   next-level-cache = <&L2_0>;
> > > >   enable-method = "psci";
> > > > - cpu-idle-states = <&CPU_SPC>;
> > > > + cpu-idle-states = <&CPU_SLEEP_0>;
> > > >   clocks = <&apcs>;
> > > >   operating-points-v2 = <&cpu_opp_table>;
> > > >   #cooling-cells = <2>;
> > > > @@ -146,7 +146,7 @@
> > > >   reg = <0x3>;
> > > >   next-level-cache = <&L2_0>;
> > > >   enable-method = "psci";
> > > > - cpu-idle-states = <&CPU_SPC>;
> > > > + cpu-idle-states = <&CPU_SLEEP_0>;
> > > >   clocks = <&apcs>;
> > > >   operating-points-v2 = <&cpu_opp_table>;
> > > >   #cooling-cells = <2>;
> > > > @@ -160,8 +160,9 @@
> > > >   idle-states {
> > > >   entry-method="psci";
> > >
> > > Please add a space before and after "=".
> > >
> > > >
> > > > - CPU_SPC: spc {
> > > > + CPU_SLEEP_0: cpu-sleep-0 {
> > >
> > > While I like your idea of using power state names from
> > > Server Base System Architecture document (SBSA) where applicable,
> > > does each qcom power state have a matching state in SBSA?
> > >
> > > These are the qcom power states:
> > > https://source.codeaurora.org/quic/la/kernel/msm-4.4/tree/Documentation/devicetree/bindings/arm/msm/lpm-levels.txt?h=msm-4.4#n53
> > >
> > > Note that qcom defines:
> > > "wfi", "retention", "gdhs", "pc", "fpc"
> > > while SBSA simply defines "idle_standby" (aka wfi), "idle_retention", 
> > > "sleep".
> > >
> > > Unless you know the equivalent name for each qcom power state
> > > (perhaps several qcom power states are really the same SBSA state?),
> > > I think that you should omit the renaming from this patch series.
> >
> > That is what SLEEP_0, SLEEP_1, SLEEP_2 could be used for.
>
> Ok, sounds good to me.
>
> >
> > IOW, all these qcom definitions are nicely represented in the
> > state-name and we could simply stick to SLEEP_0, SLEEP_1 for the node
> > names. There is wide variability in the names of the qcom idle
> > states across SoC families downstream, so I'd argue against using
> > those for the node names.
> >
> > Just for cpu states (non-wfi) I see the use of the following names
> > downstream across families. The C-state naming seems to come from the x86
> > world[1]:
> >
> >  - C4,   standalone power collapse (spc)
> >  - C4,   power collapse (fpc)
> >  - C2D, retention
> >  - C3,   power collapse (pc)
> >  - C4,   rail power collapse (rail-pc)
> >
> > [1] 
> > https://www.hardwaresecrets.com/everything-you-need-to-know-about-the-cpu-c-states-power-saving-modes/
>
> Indeed, there seem to be mixed names in use, I've also seen "fpc-def".
>
> So, you have convinced me.
>
>
> Kind regards,
> Niklas

Can I take that as a Reviewed-by?


linux-next: Signed-off-by missing for commit in the amlogic tree

2019-05-20 Thread Stephen Rothwell
Hi all,

Commit

  5d32a77c6e2e ("arm64: dts: meson-g12a: Add PWM nodes")

is missing a Signed-off-by from its committer.

-- 
Cheers,
Stephen Rothwell




Re: [RFC PATCH v2 0/4] Input: mpr121-polled: Add polled driver for MPR121

2019-05-20 Thread Dmitry Torokhov
Hi Michal,

On Fri, May 17, 2019 at 03:12:49PM +0200, Michal Vokáč wrote:
> Hi,
> 
> I have to deal with a situation where we have a custom i.MX6 based
> platform in production that uses the MPR121 touchkey controller.
> Unfortunately the chip is connected using only the I2C interface.
> The interrupt line is not used. Back in 2015 (Linux v3.14), my
> colleague modded the existing mpr121_touchkey.c driver to use polling
> instead of interrupt.
> 
> For quite some time yet I am in a process of updating the product from
> the ancient Freescale v3.14 kernel to the latest mainline and pushing
> any needed changes upstream. The DT files for our imx6dl-yapp4 platform
> already made it into v5.1-rc.
> 
> I rebased and updated our mpr121 patch to the latest mainline.
> It is created as a separate driver, similarly to gpio_keys_polled.
> 
> The I2C device is quite susceptible to ESD. An ESD test quite often
> causes a reset of the chip, or some register randomly changes its value.
> The [PATCH 3/4] adds a write-through register cache. With the cache
> this state can be detected and the device can be re-initialized.
> 
> The main question is: Is there any chance that such a polled driver
> could be accepted? Is it correct to implement it as a separate driver
> or should it be done as an option in the existing driver? I cannot
> really imagine how I would do that, though.
> 
> There are also certain worries that the MPR121 chip may no longer be
> available at some unspecified point in the future. In case of EOL I will
> need to add a polled driver for another touchkey chip, be it one already
> in mainline or a completely new one.

I think that my addition of input_polled_dev was ultimately a wrong
thing to do. I am looking into enabling polling mode for regular input
devices, so that we can then enable polling in existing drivers.

As far as gpio-keys vs gpio-keys-polled goes, I feel that the capabilities
of the polling driver are sufficiently different from the interrupt-driven
one, so we will likely keep them separate.
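
For context, the input_polled_dev API mentioned above follows roughly this
pattern (a minimal sketch; the device name, key and poll interval are made
up for illustration):

#include <linux/input.h>
#include <linux/input-polldev.h>

static void example_poll(struct input_polled_dev *poll_dev)
{
        struct input_dev *input = poll_dev->input;
        int pressed = 0;        /* read the hardware state here, e.g. over I2C */

        input_report_key(input, KEY_ENTER, pressed);
        input_sync(input);
}

static int example_register(struct device *dev)
{
        struct input_polled_dev *poll_dev;

        poll_dev = devm_input_allocate_polled_device(dev);
        if (!poll_dev)
                return -ENOMEM;

        poll_dev->poll = example_poll;
        poll_dev->poll_interval = 50;   /* ms, arbitrary for this example */
        poll_dev->input->name = "example-polled-keys";
        input_set_capability(poll_dev->input, EV_KEY, KEY_ENTER);

        return input_register_polled_device(poll_dev);
}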

Thanks.

-- 
Dmitry


Re: [PATCH 1/2] Input: atmel_mxt_ts - add wakeup support

2019-05-20 Thread Dmitry Torokhov
On Sat, May 18, 2019 at 06:55:10PM +0200, stefano.ma...@gmail.com wrote:
> Hi Dmitry,
> 
> On Fri, 2019-05-17 at 14:30 -0700, Dmitry Torokhov wrote:
> > Hi Stefano,
> > 
> > On Fri, May 17, 2019 at 11:17:40PM +0200, Stefano Manni wrote:
> > > Add wakeup support to the maxtouch driver.
> > > The device can wake up the system from suspend,
> > > mark the IRQ as wakeup capable, so that device
> > > irq is not disabled during system suspend.
> > 
> > This should already be handled by I2C core, see lines after "if
> > (client->flags & I2C_CLIENT_WAKE)" in drivers/i2c/i2c-core-base.c.
> > 
> > Unless there is a dedicated wakeup interrupt, we configure the main
> > interrupt as the wake source.
> > 
> 
> what about the other drivers (e.g. ili210x.c) that do it like this?
> Should they be purged?

They were likely done before I2C and driver core were enhanced to handle
wakeup automatically. We might want to clean them up, as long as we
verify that they keep working.
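
For reference, the I2C-core path referred to above looks roughly like the
following (a paraphrased sketch from memory, not a verbatim copy of
i2c_device_probe() in drivers/i2c/i2c-core-base.c):

#include <linux/i2c.h>
#include <linux/of_irq.h>
#include <linux/pm_wakeup.h>
#include <linux/pm_wakeirq.h>

/* Paraphrased sketch of the core's wakeup setup for I2C_CLIENT_WAKE. */
static int example_i2c_setup_wakeup(struct i2c_client *client)
{
        struct device *dev = &client->dev;
        int status = 0;

        if (client->flags & I2C_CLIENT_WAKE) {
                int wakeirq;

                /* Prefer a dedicated "wakeup" interrupt from DT, if present. */
                wakeirq = of_irq_get_byname(dev->of_node, "wakeup");
                if (wakeirq == -EPROBE_DEFER)
                        return wakeirq;

                device_init_wakeup(dev, true);

                if (wakeirq > 0 && wakeirq != client->irq)
                        status = dev_pm_set_dedicated_wake_irq(dev, wakeirq);
                else if (client->irq > 0)
                        status = dev_pm_set_wake_irq(dev, client->irq);

                if (status)
                        dev_warn(dev, "failed to set up wakeup irq\n");
        }

        return status;
}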

Thanks.

-- 
Dmitry


Re: [PATCH v2] edac: sifive: Add EDAC platform driver for SiFive SoCs

2019-05-20 Thread Yash Shah
On Mon, May 6, 2019 at 4:57 PM Yash Shah  wrote:
>
> The initial version of the EDAC driver supports:
> - ECC event monitoring and reporting through the EDAC framework for SiFive
>   L2 cache controller.
>
> The EDAC driver registers for notifier events from the L2 cache controller
> driver (arch/riscv/mm/sifive_l2_cache.c) for L2 ECC events.
>
> Signed-off-by: Yash Shah 
> Reviewed-by: James Morse 
> ---
> This patch depends on patch
> 'RISC-V: sifive_l2_cache: Add L2 cache controller driver for SiFive SoCs'
> https://lkml.org/lkml/2019/5/6/255

The prerequisite patch (the sifive_l2_cache driver) has been merged into
mainline v5.2-rc1.
It should be OK to merge this EDAC driver now.

- Yash


Re: [PATCH 0/5] firmware: Add support for loading compressed files

2019-05-20 Thread Takashi Iwai
On Mon, 20 May 2019 11:56:07 +0200,
Takashi Iwai wrote:
> 
> On Mon, 20 May 2019 11:39:29 +0200,
> Greg Kroah-Hartman wrote:
> > 
> > On Mon, May 20, 2019 at 11:26:42AM +0200, Takashi Iwai wrote:
> > > Hi,
> > > 
> > > this is a patch set to add the support for loading compressed firmware
> > > files.
> > > 
> > > The primary motivation is to reduce the storage size; e.g. currently
> > > the amount of /lib/firmware on my machine counts up to 419MB, and this
> > > can be reduced to 130MB with file compression.  No bad deal.
> > > 
> > > The feature only adds a fallback to the compressed file, so it should
> > > work as before, as long as the normal firmware file is present.  The
> > > f/w loader decompresses the content, so that there is no change needed
> > > in the caller side.
> > > 
> > > Currently only XZ format is supported.  A caveat is that the kernel XZ
> > > helper code supports only CRC32 (or none) integrity check type, so
> > > you'll have to compress the files via xz -C crc32 option.
> > > 
> > > The patch set begins with a few other improvements and refactoring,
> > > followed by the compression support.
> > > 
> > > In addition to this, dracut needs a small fix to deal with the *.xz
> > > files.
> > > 
> > > Also, the latest patchset is found in topic/fw-decompress branch of my
> > > sound.git tree:
> > >   git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git
> > 
> > After a quick review, these all look good to me, nice job.
> > 
> > One recommendation, can we add support for testing this to the
> > tools/testing/selftests/firmware/ tests?  And you did run those
> > regression tests to verify that you didn't get any of the config options
> > messed up, right? :)
> 
> Oh, do you believe I'm such a modern person who lets the computer do
> all the work? ;)  I only tested manually so far; this will be my
> homework today.

After fixing the regression in kselftest, I could verify and confirm
that no regression was introduced by my patchset.

Also, below is the patch to add tests for the compressed firmware
load.  I'll add it to the series at the next respin, if needed.


thanks,

Takashi

-- 8< --
From: Takashi Iwai 
Subject: [PATCH] selftests: firmware: Add compressed firmware tests

This patch adds the test cases for checking compressed firmware load.
Two more cases are added to fw_filesystem.sh:
- Both a plain file and an xz file are present, and load the former
- Only an xz file is present, and load without '.xz' suffix

The tests are enabled only when CONFIG_FW_LOADER_COMPRESS is enabled
and xz program is installed.

Signed-off-by: Takashi Iwai 
---
 tools/testing/selftests/firmware/fw_filesystem.sh | 73 +++
 tools/testing/selftests/firmware/fw_lib.sh|  7 +++
 tools/testing/selftests/firmware/fw_run_tests.sh  |  1 +
 3 files changed, 71 insertions(+), 10 deletions(-)

diff --git a/tools/testing/selftests/firmware/fw_filesystem.sh 
b/tools/testing/selftests/firmware/fw_filesystem.sh
index a4320c4b44dc..f901076aa2ea 100755
--- a/tools/testing/selftests/firmware/fw_filesystem.sh
+++ b/tools/testing/selftests/firmware/fw_filesystem.sh
@@ -153,13 +153,18 @@ config_set_read_fw_idx()
 
 read_firmwares()
 {
+   if [ "$1" = "xzonly" ]; then
+   fwfile="${FW}-orig"
+   else
+   fwfile="$FW"
+   fi
for i in $(seq 0 3); do
config_set_read_fw_idx $i
# Verify the contents are what we expect.
# -Z required for now -- check for yourself, md5sum
# on $FW and DIR/read_firmware will yield the same. Even
# cmp agrees, so something is off.
-   if ! diff -q -Z "$FW" $DIR/read_firmware 2>/dev/null ; then
+   if ! diff -q -Z "$fwfile" $DIR/read_firmware 2>/dev/null ; then
echo "request #$i: firmware was not loaded" >&2
exit 1
fi
@@ -246,17 +251,17 @@ test_request_firmware_nowait_custom_nofile()
 
 test_batched_request_firmware()
 {
-   echo -n "Batched request_firmware() try #$1: "
+   echo -n "Batched request_firmware() $2 try #$1: "
config_reset
config_trigger_sync
-   read_firmwares
+   read_firmwares $2
release_all_firmware
echo "OK"
 }
 
 test_batched_request_firmware_direct()
 {
-   echo -n "Batched request_firmware_direct() try #$1: "
+   echo -n "Batched request_firmware_direct() $2 try #$1: "
config_reset
config_set_sync_direct
config_trigger_sync
@@ -266,7 +271,7 @@ test_batched_request_firmware_direct()
 
 test_request_firmware_nowait_uevent()
 {
-   echo -n "Batched request_firmware_nowait(uevent=true) try #$1: "
+   echo -n "Batched request_firmware_nowait(uevent=true) $2 try #$1: "
config_reset
config_trigger_async
release_all_firmware
@@ -275,11 +280,16 @@ test_request_firmware_nowait_uevent()
 
 test_request_firmware_nowait_custom()
 {
-   echo -n 

Re: [RESEND] input: keyboard: imx: make sure keyboard can always wake up system

2019-05-20 Thread dmitry.torok...@gmail.com
Hi Anson,
On Thu, Apr 04, 2019 at 01:40:16AM +, Anson Huang wrote:
> There are several scenarios in which the keyboard can NOT wake up the
> system from suspend. E.g., if a key is depressed between the system
> device suspend phase and the device noirq suspend phase, the keyboard
> ISR will be called and both the depress and release interrupts will be
> disabled; the keyboard will then no longer be able to wake up the
> system. Another scenario would be: if a key is kept depressed and the
> system then goes into suspend, the expected behavior would be that the
> system is woken up when the key is released, but the current
> implementation can NOT achieve that, because both depress and release
> interrupts are disabled in the ISR while the event check is still in
> progress.
> 
> To fix these issues, we need to make sure the keyboard's depress or
> release interrupt is enabled after the noirq device suspend phase. This
> patch moves the suspend/resume callbacks to the noirq suspend/resume
> phase, and enables the corresponding interrupt according to the current
> keyboard status.

I believe it is possible for the IRQ to be disabled and still be enabled
as a wakeup source. What happens if you call disable_irq() before
disabling the clock?
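
A sketch of the ordering being asked about might look like this (a
hypothetical suspend callback; the struct and field names are illustrative
and do not come from the actual imx_keypad driver):

#include <linux/clk.h>
#include <linux/interrupt.h>
#include <linux/platform_device.h>
#include <linux/pm_wakeup.h>

struct example_kbd {
        struct clk *clk;
        int irq;
};

static int __maybe_unused example_kbd_suspend(struct device *dev)
{
        struct example_kbd *kbd = dev_get_drvdata(dev);

        /* Stop servicing the interrupt first... */
        disable_irq(kbd->irq);

        /* ...then it is safe to gate the peripheral clock. */
        clk_disable_unprepare(kbd->clk);

        /* A disabled IRQ can still be armed as a wakeup source. */
        if (device_may_wakeup(dev))
                enable_irq_wake(kbd->irq);

        return 0;
}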

Thanks.

-- 
Dmitry


Re: [PATCH 1/2] selftests: Remove forced unbuffering for test running

2019-05-20 Thread Takashi Iwai
On Tue, 21 May 2019 00:37:48 +0200,
Kees Cook wrote:
> 
> As it turns out, the "stdbuf" command will actually force all
> subprocesses into unbuffered output, and some implementations of "echo"
> turn into single-character writes, which utterly wrecks writes to /sys
> and /proc files.
> 
> Instead, drop the "stdbuf" usage, and for any tests that want explicit
> flushing between newlines, they'll have to add "fflush(stdout);" as
> needed.
> 
> Reported-by: Takashi Iwai 
> Fixes: 5c069b6dedef ("selftests: Move test output to diagnostic lines")
> Signed-off-by: Kees Cook 

Tested-by: Takashi Iwai 

BTW, this might be specific to the shell invocation.  As in the original
discussion thread, it starts working when I replace "echo" with
"/usr/bin/echo".

Still, it's not easy to control from the script itself, so dropping the
unbuffered mode is certainly safer, yes.
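
As an aside, the "add fflush(stdout) as needed" part applies to compiled
test programs as well; a trivial, hypothetical snippet (not from the
kselftest tree) of what that looks like:

#include <stdio.h>

int main(void)
{
        /* Make diagnostic output appear promptly even on a buffered pipe. */
        printf("# starting test step 1\n");
        fflush(stdout);

        /* ... perform the step, e.g. write to a /proc or /sys file ... */

        printf("ok 1 step completed\n");
        fflush(stdout);
        return 0;
}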

Thanks!


Takashi


> ---
>  tools/testing/selftests/kselftest/runner.sh | 12 +---
>  1 file changed, 1 insertion(+), 11 deletions(-)
> 
> diff --git a/tools/testing/selftests/kselftest/runner.sh 
> b/tools/testing/selftests/kselftest/runner.sh
> index eff3ee303d0d..00c9020bdda8 100644
> --- a/tools/testing/selftests/kselftest/runner.sh
> +++ b/tools/testing/selftests/kselftest/runner.sh
> @@ -24,16 +24,6 @@ tap_prefix()
>   fi
>  }
>  
> -# If stdbuf is unavailable, we must fall back to line-at-a-time piping.
> -tap_unbuffer()
> -{
> - if ! which stdbuf >/dev/null ; then
> - "$@"
> - else
> - stdbuf -i0 -o0 -e0 "$@"
> - fi
> -}
> -
>  run_one()
>  {
>   DIR="$1"
> @@ -54,7 +44,7 @@ run_one()
>   echo "not ok $test_num $TEST_HDR_MSG"
>   else
>   cd `dirname $TEST` > /dev/null
> - (tap_unbuffer ./$BASENAME_TEST 2>&1; echo $? >&3) |
> + (./$BASENAME_TEST 2>&1; echo $? >&3) |
>   tap_prefix >&4) 3>&1) |
>   (read xs; exit $xs)) 4>>"$logfile" &&
>   echo "ok $test_num $TEST_HDR_MSG") ||
> -- 
> 2.17.1
> 


Re: [PATCH v2 1/9] media: ov6650: Fix MODULE_DESCRIPTION

2019-05-20 Thread Greg KH
On Tue, May 21, 2019 at 12:49:59AM +0200, Janusz Krzysztofik wrote:
> Commit 23a52386fabe ("media: ov6650: convert to standalone v4l2
> subdevice") converted the driver from a soc_camera sensor to a
> standalone V4L subdevice driver.  Unfortunately, module description was
> not updated to reflect the change.  Fix it.
> 
> While being at it, update email address of the module author.
> 
> Fixes: 23a52386fabe ("media: ov6650: convert to standalone v4l2 subdevice")
> Signed-off-by: Janusz Krzysztofik 
> cc: sta...@vger.kernel.org
> ---
>  drivers/media/i2c/ov6650.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/media/i2c/ov6650.c b/drivers/media/i2c/ov6650.c
> index 1b972e591b48..a3d00afcb0c8 100644
> --- a/drivers/media/i2c/ov6650.c
> +++ b/drivers/media/i2c/ov6650.c
> @@ -1045,6 +1045,6 @@ static struct i2c_driver ov6650_i2c_driver = {
>  
>  module_i2c_driver(ov6650_i2c_driver);
>  
> -MODULE_DESCRIPTION("SoC Camera driver for OmniVision OV6650");
> -MODULE_AUTHOR("Janusz Krzysztofik ");
> +MODULE_DESCRIPTION("V4L2 subdevice driver for OmniVision OV6650 camera sensor");
> +MODULE_AUTHOR("Janusz Krzysztofik ");
>  MODULE_LICENSE("GPL v2");
> -- 
> 2.21.0
> 

is this _really_ a patch that meets the stable kernel requirements?
Same for this whole series...

thanks,

greg k-h


Re: [PATCH 11/12] powerpc/pseries/svm: Force SWIOTLB for secure guests

2019-05-20 Thread Christoph Hellwig
> diff --git a/arch/powerpc/include/asm/mem_encrypt.h 
> b/arch/powerpc/include/asm/mem_encrypt.h
> new file mode 100644
> index ..45d5e4d0e6e0
> --- /dev/null
> +++ b/arch/powerpc/include/asm/mem_encrypt.h
> @@ -0,0 +1,19 @@
> +/* SPDX-License-Identifier: GPL-2.0+ */
> +/*
> + * SVM helper functions
> + *
> + * Copyright 2019 IBM Corporation
> + */
> +
> +#ifndef _ASM_POWERPC_MEM_ENCRYPT_H
> +#define _ASM_POWERPC_MEM_ENCRYPT_H
> +
> +#define sme_me_mask  0ULL
> +
> +static inline bool sme_active(void) { return false; }
> +static inline bool sev_active(void) { return false; }
> +
> +int set_memory_encrypted(unsigned long addr, int numpages);
> +int set_memory_decrypted(unsigned long addr, int numpages);
> +
> +#endif /* _ASM_POWERPC_MEM_ENCRYPT_H */

S/390 seems to be adding a stub header just like this.  Can you please
clean up the Kconfig and generic headers bits for memory encryption so
that we don't need all this boilerplate code?

>  config PPC_SVM
>   bool "Secure virtual machine (SVM) support for POWER"
>   depends on PPC_PSERIES
> + select SWIOTLB
> + select ARCH_HAS_MEM_ENCRYPT
>   default n

n is the default default, no need to explicitly specify it.


Re: [PATCH] input: imx6ul_tsc: use devm_platform_ioremap_resource() to simplify code

2019-05-20 Thread dmitry.torok...@gmail.com
On Mon, Apr 01, 2019 at 05:19:55AM +, Anson Huang wrote:
> Use the new helper devm_platform_ioremap_resource() which wraps the
> platform_get_resource() and devm_ioremap_resource() together, to
> simplify the code.
> 
> Signed-off-by: Anson Huang 

Applied, thank you.

> ---
>  drivers/input/touchscreen/imx6ul_tsc.c | 8 ++--
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/input/touchscreen/imx6ul_tsc.c 
> b/drivers/input/touchscreen/imx6ul_tsc.c
> index c10fc59..e04eecd 100644
> --- a/drivers/input/touchscreen/imx6ul_tsc.c
> +++ b/drivers/input/touchscreen/imx6ul_tsc.c
> @@ -364,8 +364,6 @@ static int imx6ul_tsc_probe(struct platform_device *pdev)
>   struct device_node *np = pdev->dev.of_node;
>   struct imx6ul_tsc *tsc;
>   struct input_dev *input_dev;
> - struct resource *tsc_mem;
> - struct resource *adc_mem;
>   int err;
>   int tsc_irq;
>   int adc_irq;
> @@ -403,16 +401,14 @@ static int imx6ul_tsc_probe(struct platform_device 
> *pdev)
>   return err;
>   }
>  
> - tsc_mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> - tsc->tsc_regs = devm_ioremap_resource(>dev, tsc_mem);
> + tsc->tsc_regs = devm_platform_ioremap_resource(pdev, 0);
>   if (IS_ERR(tsc->tsc_regs)) {
>   err = PTR_ERR(tsc->tsc_regs);
>   dev_err(>dev, "failed to remap tsc memory: %d\n", err);
>   return err;
>   }
>  
> - adc_mem = platform_get_resource(pdev, IORESOURCE_MEM, 1);
> - tsc->adc_regs = devm_ioremap_resource(>dev, adc_mem);
> + tsc->adc_regs = devm_platform_ioremap_resource(pdev, 1);
>   if (IS_ERR(tsc->adc_regs)) {
>   err = PTR_ERR(tsc->adc_regs);
>   dev_err(>dev, "failed to remap adc memory: %d\n", err);
> -- 
> 2.7.4
> 

-- 
Dmitry


Re: [RFC PATCH 02/12] powerpc: Add support for adding an ESM blob to the zImage wrapper

2019-05-20 Thread Christoph Hellwig
On Tue, May 21, 2019 at 01:49:02AM -0300, Thiago Jung Bauermann wrote:
> From: Benjamin Herrenschmidt 
> 
> For secure VMs, the signing tool will create a ticket called the "ESM blob"
> for the Enter Secure Mode ultravisor call with the signatures of the kernel
> and initrd among other things.
> 
> This adds support to the wrapper script for adding that blob via the "-e"
> option to the zImage.pseries.
> 
> It also adds code to the zImage wrapper itself to retrieve and if necessary
> relocate the blob, and pass its address to Linux via the device-tree, to be
> later consumed by prom_init.

Where does the "BLOB" come from?  How is it licensed and how can we
satisfy the GPL with it?


Re: [RFC 1/1] Add dm verity root hash pkcs7 sig validation.

2019-05-20 Thread Singh, Balbir



On 5/21/19 7:54 AM, Jaskaran Khurana wrote:
> Adds in-kernel pkcs7 signature checking for the roothash of
> the dm-verity hash tree.
> 
> The verification is to support cases where the roothash is not secured by
> Trusted Boot, UEFI Secureboot or similar technologies.
> One of the use cases for this is dm-verity volumes mounted after boot:
> the root hash provided during the creation of the dm-verity volume has to
> be secure, and thus the in-kernel validation implemented here will be used
> before we trust the root hash and allow the block device to be created.
> 
The first patch was your cover letter; I'd suggest naming it that way in
the subject.

> The signature being provided for verification must verify the root hash and 
> must be trusted by the builtin keyring for verification to succeed.
> 
> Adds DM_VERITY_VERIFY_ROOTHASH_SIG: roothash verification against the
> roothash signature file *if* one is specified; if a signature file is
> specified, verification must succeed prior to creation of the device
> mapper block device.
> 
> Adds DM_VERITY_VERIFY_ROOTHASH_SIG_FORCE: a roothash signature *must* be
> specified for all dm-verity volumes, and verification must succeed prior
> to creation of the device mapper block device.
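
For reference, in-kernel PKCS#7 checking against the builtin keyring is
normally done through verify_pkcs7_signature(); a minimal sketch of how a
root hash could be checked is shown below (illustrative only, not the code
added by this patch):

#include <linux/types.h>
#include <linux/verification.h>

/*
 * Sketch: verify a detached PKCS#7 signature over the root-hash bytes
 * using the builtin trusted keyring (trusted_keys == NULL).
 */
static int example_verify_root_hash(const u8 *root_hash, size_t hash_len,
                                    const void *sig_data, size_t sig_len)
{
        return verify_pkcs7_signature(root_hash, hash_len,
                                      sig_data, sig_len,
                                      NULL, /* builtin trusted keys */
                                      VERIFYING_UNSPECIFIED_SIGNATURE,
                                      NULL, NULL);
}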
> 
> Signed-off-by: Jaskaran Khurana 
> ---
>  drivers/md/Kconfig|  23 ++
>  drivers/md/Makefile   |   2 +-
>  drivers/md/dm-verity-target.c |  44 --
>  drivers/md/dm-verity-verify-sig.c | 129 ++
>  drivers/md/dm-verity-verify-sig.h |  32 
>  5 files changed, 222 insertions(+), 8 deletions(-)
>  create mode 100644 drivers/md/dm-verity-verify-sig.c
>  create mode 100644 drivers/md/dm-verity-verify-sig.h
> 
> diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
> index db269a348b20..da4115753f25 100644
> --- a/drivers/md/Kconfig
> +++ b/drivers/md/Kconfig
> @@ -489,6 +489,29 @@ config DM_VERITY
>  
> If unsure, say N.
>  
> +config DM_VERITY_VERIFY_ROOTHASH_SIG
> + def_bool n
> + bool "Verity data device root hash signature verification support"
> + depends on DM_VERITY
> + select SYSTEM_DATA_VERIFICATION
> +   help
> +   The device mapper target created by DM-VERITY can be validated if the
> +   pre-generated tree of cryptographic checksums passed has a pkcs#7
> +   signature file that can validate the roothash of the tree.
> +
> +   If unsure, say N.
> +
> +config DM_VERITY_VERIFY_ROOTHASH_SIG_FORCE
> + def_bool n
> + bool "Forces all dm verity data device root hash should be signed"
> + depends on DM_VERITY_VERIFY_ROOTHASH_SIG
> +   help
> +   The device mapper target created by DM-VERITY will succeed only if the
> +   pre-generated tree of cryptographic checksums passed also has a pkcs#7
> +   signature file that can validate the roothash of the tree.
> +
> +   If unsure, say N.
> +
>  config DM_VERITY_FEC
>   bool "Verity forward error correction support"
>   depends on DM_VERITY
> diff --git a/drivers/md/Makefile b/drivers/md/Makefile
> index be7a6eb92abc..8a8c142bcfe1 100644
> --- a/drivers/md/Makefile
> +++ b/drivers/md/Makefile
> @@ -61,7 +61,7 @@ obj-$(CONFIG_DM_LOG_USERSPACE)  += dm-log-userspace.o
>  obj-$(CONFIG_DM_ZERO)+= dm-zero.o
>  obj-$(CONFIG_DM_RAID)+= dm-raid.o
>  obj-$(CONFIG_DM_THIN_PROVISIONING)   += dm-thin-pool.o
> -obj-$(CONFIG_DM_VERITY)  += dm-verity.o
> +obj-$(CONFIG_DM_VERITY)  += dm-verity.o dm-verity-verify-sig.o
>  obj-$(CONFIG_DM_CACHE)   += dm-cache.o
>  obj-$(CONFIG_DM_CACHE_SMQ)   += dm-cache-smq.o
>  obj-$(CONFIG_DM_ERA) += dm-era.o
> diff --git a/drivers/md/dm-verity-target.c b/drivers/md/dm-verity-target.c
> index f4c31ffaa88e..53aebfa8bc38 100644
> --- a/drivers/md/dm-verity-target.c
> +++ b/drivers/md/dm-verity-target.c
> @@ -16,7 +16,7 @@
>  
>  #include "dm-verity.h"
>  #include "dm-verity-fec.h"
> -
> +#include "dm-verity-verify-sig.h"
>  #include 
>  #include 
>  
> @@ -34,7 +34,11 @@
>  #define DM_VERITY_OPT_IGN_ZEROES "ignore_zero_blocks"
>  #define DM_VERITY_OPT_AT_MOST_ONCE   "check_at_most_once"
>  
> -#define DM_VERITY_OPTS_MAX   (2 + DM_VERITY_OPTS_FEC)
> +#define DM_VERITY_OPTS_MAX   (2 + DM_VERITY_OPTS_FEC + \
> +  DM_VERITY_ROOT_HASH_VERIFICATION_OPTS)
> +
> +#define DM_VERITY_MANDATORY_ARGS10
> +
>  
>  static unsigned dm_verity_prefetch_cluster = DM_VERITY_DEFAULT_PREFETCH_SIZE;
>  
> @@ -855,7 +859,8 @@ static int verity_alloc_zero_digest(struct dm_verity *v)
>   return r;
>  }
>  
> -static int verity_parse_opt_args(struct dm_arg_set *as, struct dm_verity *v)
> +static int verity_parse_opt_args(struct dm_arg_set *as, struct dm_verity *v,
> +  struct dm_verity_sig_opts *verify_args)
>  {
>   int r;
>   unsigned argc;
> @@ -904,6 +909,15 @@ static int verity_parse_opt_args(struct dm_arg_set *as, 
> struct 

Re: [RFC 0/7] introduce memory hinting API for external process

2019-05-20 Thread Minchan Kim
On Tue, May 21, 2019 at 08:25:55AM +0530, Anshuman Khandual wrote:
> 
> 
> On 05/20/2019 10:29 PM, Tim Murray wrote:
> > On Sun, May 19, 2019 at 11:37 PM Anshuman Khandual
> >  wrote:
> >>
> >> Or is the objective here to reduce the number of processes which get
> >> killed by lmkd, by triggering swapping of the unused (user-hinted)
> >> memory sooner so that they don't get picked by lmkd? Is
> >> under-utilization of the zram hardware a concern here as well?
> > 
> > The objective is to avoid some instances of memory pressure by
> > proactively swapping pages that userspace knows to be cold before
> > those pages reach the end of the LRUs, which in turn can prevent some
> > apps from being killed by lmk/lmkd. As soon as Android userspace knows
> > that an application is not being used and is only resident to improve
> > performance if the user returns to that app, we can kick off
> > process_madvise on that process's pages (or some portion of those
> > pages) in a power-efficient way to reduce memory pressure long before
> > the system hits the free page watermark. This allows the system more
> > time to put pages into zram versus waiting for the watermark to
> > trigger kswapd, which decreases the likelihood that later memory
> > allocations will cause enough pressure to trigger a kill of one of
> > these apps.
> 
> So this opens up a bit of LRU management to user-space hints. Also,
> because the app itself won't know about the memory situation of the
> entire system, the new system call needs to be invoked from an external
> process.

That's why process_madvise is introduced here.

> 
> > 
> >> Won't swapping out memory into zram increase the latency of a hot
> >> start? Or is it because it will prevent a fresh cold start, which will
> >> anyway be slower than a slow hot start? Just being curious.
> > 
> > First, not all swapped pages will be reloaded immediately once an app
> > is resumed. We've found that an app's working set post-process_madvise
> > is significantly smaller than what an app allocates when it first
> > launches (see the delta between pswpin and pswpout in Minchan's
> > results). Presumably because of this, faulting to fetch from zram does
> 
> pswpin  4176131392647 975034 233.00
> pswpout127422426617311387507 108.00
> 
> IIUC the swap-in ratio is way higher in comparison to that of swap-out.
> Is that always the case? Or does it tend to swap out from an active area
> of the working set, which then faults back in again?

I think it's because apps stay alive longer (they get killed less often),
so what used to show up as pgpgin now shows up as swapin.

> 
> > not seem to introduce a noticeable hot start penalty, not does it
> > cause an increase in performance problems later in the app's
> > lifecycle. I've measured with and without process_madvise, and the
> > differences are within our noise bounds. Second, because we're not
> 
> That is assuming that the post-process_madvise() working set for the
> application is always smaller. There is another challenge. The external
> process should ideally have knowledge of the active areas of the working
> set of the application in question, in order to invoke process_madvise()
> correctly and prevent such scenarios.

There are several ways to detect the working set more accurately at the
cost of runtime, for example with idle page tracking or clear_refs.
Accuracy is always a trade-off against the overhead of LRU aging.
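
A rough illustration of the clear_refs approach mentioned above (a
userspace sketch, error handling trimmed; writing "1" clears the
referenced bits, so a later read of /proc/<pid>/smaps shows what was
touched since):

#include <stdio.h>
#include <sys/types.h>

static int example_clear_refs(pid_t pid)
{
        char path[64];
        FILE *f;

        snprintf(path, sizeof(path), "/proc/%d/clear_refs", pid);
        f = fopen(path, "w");
        if (!f)
                return -1;
        fputs("1", f);  /* 1 = clear the referenced bits for all pages */
        fclose(f);
        return 0;
}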

> 
> > preemptively evicting file pages and only making them more likely to
> > be evicted when there's already memory pressure, we avoid the case
> > where we process_madvise an app then immediately return to the app and
> > reload all file pages in the working set even though there was no
> > intervening memory pressure. Our initial version of this work evicted
> 
> That would be the worst-case scenario, which should be avoided. Memory
> pressure must be a parameter before actually doing the swap-out. But
> pages, if known to be inactive/cold, can be marked high priority to be
> swapped out.
> 
> > file pages preemptively and did cause a noticeable slowdown (~15%) for
> > that case; this patch set avoids that slowdown. Finally, the benefit
> > from avoiding cold starts is huge. The performance improvement from
> > having a hot start instead of a cold start ranges from 3x for very
> > small apps to 50x+ for larger apps like high-fidelity games.
> 
> Is there any other real-world scenario, apart from this app-based
> ecosystem, where user-hinted LRU management might be helpful? Just being
> curious. Thanks for the detailed explanation. I will continue looking
> into this series.


linux-next: Tree for May 21

2019-05-20 Thread Stephen Rothwell
Hi all,

Changes since 20190520:

New trees: soc-fsl, soc-fsl-fixes

Removed trees: (not updated for more than a year)
alpine, samsung, sh, befs, kconfig, dwmw2-iommu, trivial,
target-updates, target-bva, init_task

The imx-mxs tree gained a build failure so I used the version from
next-20190520.

The sunxi tree gained a conflict against the imx-mxs tree.

The drm-misc tree gained conflicts against Linus' and the amdgpu trees.

Non-merge commits (relative to Linus' tree): 991
 998 files changed, 29912 insertions(+), 14691 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 290 trees (counting Linus' and 70 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (f49aa1de9836 Merge tag 'for-5.2-rc1-tag' of 
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux)
Merging fixes/master (2bbacd1a9278 Merge tag 'kconfig-v5.2' of 
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild)
Merging kspp-gustavo/for-next/kspp (b324f1b28dc0 afs: yfsclient: Mark expected 
switch fall-throughs)
Merging kbuild-current/fixes (a2d635decbfa Merge tag 'drm-next-2019-05-09' of 
git://anongit.freedesktop.org/drm/drm)
Merging arc-current/for-curr (c5a1726d7383 ARC: entry: EV_Trap expects r10 (vs. 
r9) to have exception cause)
Merging arm-current/fixes (e17b1af96b2a ARM: 8857/1: efi: enable CP15 DMB 
instructions before cleaning the cache)
Merging arm64-fixes/for-next/fixes (7a0a93c51799 arm64: vdso: Explicitly add 
build-id option)
Merging m68k-current/for-linus (fdd20ec8786a Documentation/features/time: Mark 
m68k having modern-timekeeping)
Merging powerpc-fixes/fixes (672eaf37db9f powerpc/cacheinfo: Remove double free)
Merging sparc/master (f49aa1de9836 Merge tag 'for-5.2-rc1-tag' of 
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux)
Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2)
Merging net/master (fa2c52be7129 vlan: Mark expected switch fall-through)
Merging bpf/master (6a0a923dfa14 of_net: fix of_get_mac_address retval if 
compiled without CONFIG_OF)
Merging ipsec/master (9b3040a6aafd ipv4: Define __ipv4_neigh_lookup_noref when 
CONFIG_INET is disabled)
Merging netfilter/master (2c82c7e724ff netfilter: nf_tables: fix oops during 
rule dump)
Merging ipvs/master (b2e3d68d1251 netfilter: nft_compat: destroy function must 
not have side effects)
Merging wireless-drivers/master (7a0f8ad5ff63 Merge ath-current from 
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git)
Merging mac80211/master (933b40530b4b mac80211: remove set but not used 
variable 'old')
Merging rdma-fixes/for-rc (2557fabd6e29 RDMA/hns: Bugfix for mapping user db)
Merging sound-current/for-linus (c7b55fabfa44 ALSA: hdac: fix memory release 
for SST and SOF drivers)
Merging sound-asoc-fixes/for-linus (08b9e0213aeb Merge branch 'asoc-5.1' into 
asoc-linus)
Merging regmap-fixes/for-linus (1d6106cafb37 Merge branch 'regmap-5.1' into 
regmap-linus)
Merging regulator-fixes/for-linus (0d183fc1760f Merge branch 'regulator-5.1' 
into regulator-linus)
Merging spi-fixes/for-linus (72e3b3285a43 Merge branch 'spi-5.1' into spi-linus)
Merging pci-current/for-linus (a188339ca5a3 Linux 5.2-rc1)
Merging driver-core.current/driver-core-linus (a188339ca5a3 Linux 5.2-rc1)
Merging tty.current/tty-linus (a18833

Re: Re: [PATCH v3 11/14] dmaengine: imx-sdma: fix ecspi1 rx dma not work on i.mx8mm

2019-05-20 Thread Vinod Koul
On 21-05-19, 04:58, Robin Gong wrote:
> > -Original Message-
> > From: Vinod Koul 
> > Sent: May 21, 2019 12:18
> > 
> > On 07-05-19, 09:16, Robin Gong wrote:
> > > Because the number of the ecspi1 rx event on i.mx8mm is 0, the condition
> > > check ignores this special case and leaves the dma channel disabled,
> > > which causes ecspi1 rx to fail. Actually, there is no need to check
> > > event_id0; checking event_id1 is enough for the DEV_2_DEV case because
> > > event_id1 happens never to be 0.
> > 
> > Well is that by chance or design that event_id1 will be never 0?
> > 
> That's by chance. DEV_2_DEV is just for the audio case, and event_id1 is
> non-zero on the current i.MX family.

Then it won't be good to rely on chance :)

-- 
~Vinod


Re: [PATCH] input: keyboard: imx: use devm_platform_ioremap_resource() to simplify code

2019-05-20 Thread dmitry.torok...@gmail.com
On Mon, Apr 01, 2019 at 05:28:12AM +, Anson Huang wrote:
> Use the new helper devm_platform_ioremap_resource() which wraps the
> platform_get_resource() and devm_ioremap_resource() together, to
> simplify the code.
> 
> Signed-off-by: Anson Huang 

Applied, thank you.

> ---
>  drivers/input/keyboard/imx_keypad.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/input/keyboard/imx_keypad.c 
> b/drivers/input/keyboard/imx_keypad.c
> index 539cb67..cf08f4a 100644
> --- a/drivers/input/keyboard/imx_keypad.c
> +++ b/drivers/input/keyboard/imx_keypad.c
> @@ -422,7 +422,6 @@ static int imx_keypad_probe(struct platform_device *pdev)
>   dev_get_platdata(>dev);
>   struct imx_keypad *keypad;
>   struct input_dev *input_dev;
> - struct resource *res;
>   int irq, error, i, row, col;
>  
>   if (!keymap_data && !pdev->dev.of_node) {
> @@ -455,8 +454,7 @@ static int imx_keypad_probe(struct platform_device *pdev)
>   timer_setup(>check_matrix_timer,
>   imx_keypad_check_for_events, 0);
>  
> - res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> - keypad->mmio_base = devm_ioremap_resource(>dev, res);
> + keypad->mmio_base = devm_platform_ioremap_resource(pdev, 0);
>   if (IS_ERR(keypad->mmio_base))
>   return PTR_ERR(keypad->mmio_base);
>  
> -- 
> 2.7.4
> 

-- 
Dmitry


Re: [PATCH 1/2] Input: elantech - enable middle button support on 2 ThinkPads

2019-05-20 Thread Dmitry Torokhov
Hi Aaron,

On Sun, May 19, 2019 at 03:27:10PM +0800, Aaron Ma wrote:
> Adding 2 new touchpad PNPIDs to enable middle button support.

Could you add their names in the comments please?

> 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Aaron Ma 
> ---
>  drivers/input/mouse/elantech.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/input/mouse/elantech.c b/drivers/input/mouse/elantech.c
> index a7f8b1614559..530142b5a115 100644
> --- a/drivers/input/mouse/elantech.c
> +++ b/drivers/input/mouse/elantech.c
> @@ -1189,6 +1189,8 @@ static const char * const middle_button_pnp_ids[] = {
>   "LEN2132", /* ThinkPad P52 */
>   "LEN2133", /* ThinkPad P72 w/ NFC */
>   "LEN2134", /* ThinkPad P72 */
> + "LEN0407",
> + "LEN0408",

These should come first - I'd like to keep the list sorted
alphabetically.

>   NULL
>  };
>  
> -- 
> 2.17.1
> 

Thanks.

-- 
Dmitry


[PATCH v3] kernel: fix typos and some coding style in comments

2019-05-20 Thread Weitao Hou
fix lenght to length

Signed-off-by: Weitao Hou 
---
Changes in v3:
- fix all other same typos with git grep
---
 .../devicetree/bindings/usb/s3c2410-usb.txt|  2 +-
 .../wireless/mediatek/mt76/mt76x02_usb_core.c  |  2 +-
 kernel/sysctl.c| 18 +-
 sound/soc/qcom/qdsp6/q6asm.c   |  2 +-
 4 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/Documentation/devicetree/bindings/usb/s3c2410-usb.txt 
b/Documentation/devicetree/bindings/usb/s3c2410-usb.txt
index e45b38ce2986..26c85afd0b53 100644
--- a/Documentation/devicetree/bindings/usb/s3c2410-usb.txt
+++ b/Documentation/devicetree/bindings/usb/s3c2410-usb.txt
@@ -4,7 +4,7 @@ OHCI
 
 Required properties:
  - compatible: should be "samsung,s3c2410-ohci" for USB host controller
- - reg: address and lenght of the controller memory mapped region
+ - reg: address and length of the controller memory mapped region
  - interrupts: interrupt number for the USB OHCI controller
  - clocks: Should reference the bus and host clocks
  - clock-names: Should contain two strings
diff --git a/drivers/net/wireless/mediatek/mt76/mt76x02_usb_core.c 
b/drivers/net/wireless/mediatek/mt76/mt76x02_usb_core.c
index 6b89f7eab26c..e0f5e6202a27 100644
--- a/drivers/net/wireless/mediatek/mt76/mt76x02_usb_core.c
+++ b/drivers/net/wireless/mediatek/mt76/mt76x02_usb_core.c
@@ -53,7 +53,7 @@ int mt76x02u_skb_dma_info(struct sk_buff *skb, int port, u32 
flags)
pad = round_up(skb->len, 4) + 4 - skb->len;
 
/* First packet of a A-MSDU burst keeps track of the whole burst
-* length, need to update lenght of it and the last packet.
+* length, need to update length of it and the last packet.
 */
skb_walk_frags(skb, iter) {
last = iter;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 943c89178e3d..f78f725f225e 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -187,17 +187,17 @@ extern int no_unaligned_warning;
  * enum sysctl_writes_mode - supported sysctl write modes
  *
  * @SYSCTL_WRITES_LEGACY: each write syscall must fully contain the sysctl 
value
- * to be written, and multiple writes on the same sysctl file descriptor
- * will rewrite the sysctl value, regardless of file position. No warning
- * is issued when the initial position is not 0.
+ * to be written, and multiple writes on the same sysctl file descriptor
+ * will rewrite the sysctl value, regardless of file position. No warning
+ * is issued when the initial position is not 0.
  * @SYSCTL_WRITES_WARN: same as above but warn when the initial file position 
is
- * not 0.
+ * not 0.
  * @SYSCTL_WRITES_STRICT: writes to numeric sysctl entries must always be at
- * file position 0 and the value must be fully contained in the buffer
- * sent to the write syscall. If dealing with strings respect the file
- * position, but restrict this to the max length of the buffer, anything
- * passed the max lenght will be ignored. Multiple writes will append
- * to the buffer.
+ * file position 0 and the value must be fully contained in the buffer
+ * sent to the write syscall. If dealing with strings respect the file
+ * position, but restrict this to the max length of the buffer, anything
+ * passed the max length will be ignored. Multiple writes will append
+ * to the buffer.
  *
  * These write modes control how current file position affects the behavior of
  * updating sysctl values through the proc interface on each write.
diff --git a/sound/soc/qcom/qdsp6/q6asm.c b/sound/soc/qcom/qdsp6/q6asm.c
index 4f85cb19a309..e8141a33a55e 100644
--- a/sound/soc/qcom/qdsp6/q6asm.c
+++ b/sound/soc/qcom/qdsp6/q6asm.c
@@ -1194,7 +1194,7 @@ EXPORT_SYMBOL_GPL(q6asm_open_read);
  * q6asm_write_async() - non blocking write
  *
  * @ac: audio client pointer
- * @len: lenght in bytes
+ * @len: length in bytes
  * @msw_ts: timestamp msw
  * @lsw_ts: timestamp lsw
  * @wflags: flags associated with write
-- 
2.18.0



Re: [PATCH 2/2] Input: synaptics - remove X240 from the topbuttonpad list

2019-05-20 Thread Dmitry Torokhov
Hi Aaron,

On Sun, May 19, 2019 at 03:27:11PM +0800, Aaron Ma wrote:
> Lenovo ThinkPad X240 does not have the top software button.
> With this wrong ID in the top button list, SMBus mode will fail to probe,
> so keep it working in PS/2 mode.
> 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Aaron Ma 
> ---
>  drivers/input/mouse/synaptics.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/input/mouse/synaptics.c b/drivers/input/mouse/synaptics.c
> index b6da0c1267e3..6ae7bc92476b 100644
> --- a/drivers/input/mouse/synaptics.c
> +++ b/drivers/input/mouse/synaptics.c
> @@ -140,7 +140,6 @@ static const char * const topbuttonpad_pnp_ids[] = {
>   "LEN002E",
>   "LEN0033", /* Helix */
>   "LEN0034", /* T431s, L440, L540, T540, W540, X1 Carbon 2nd */
> - "LEN0035", /* X240 */

According to the history this came from Synaptics through Hans, so I'd
like to make sure there aren't several X240 versions floating around...

>   "LEN0036", /* T440 */
>   "LEN0037", /* X1 Carbon 2nd */
>   "LEN0038",
> -- 
> 2.17.1
> 

Thanks.

-- 
Dmitry


Re: [PATCH V6 02/15] PCI/PME: Export pcie_pme_disable_msi() & pcie_pme_no_msi() APIs

2019-05-20 Thread Vidya Sagar

On 5/20/2019 11:27 PM, Bjorn Helgaas wrote:

On Sat, May 18, 2019 at 07:28:29AM +0530, Vidya Sagar wrote:

On 5/18/2019 12:25 AM, Bjorn Helgaas wrote:

On Fri, May 17, 2019 at 11:23:36PM +0530, Vidya Sagar wrote:

On 5/17/2019 6:54 PM, Bjorn Helgaas wrote:

Do you have "lspci -vvxxx" output for the root ports handy?

If there's some clue in the standard config space that would tell us
that MSI works for some events but not others, we could make the PCI
core pay attention it.  That would be the best solution because it
wouldn't require Tegra-specific code.


Here is the output of 'lspci -vvxxx' for one of Tegra194's root ports.


Thanks!

This port advertises both MSI and MSI-X, and neither one is enabled.
This particular port doesn't have a slot, so hotplug isn't applicable
to it.

But if I understand correctly, if MSI or MSI-X were enabled and the
port had a slot, the port would generate MSI/MSI-X hotplug interrupts.
But PME and AER events would still cause INTx interrupts (even with
MSI or MSI-X enabled).

Do I have that right?  I just want to make sure that the reason for
PME being INTx is a permanent hardware choice and that it's not
related to MSI and MSI-X currently being disabled.


Yes. That's right. It's a hardware choice that our hardware engineers made to
use INTx for PME instead of MSI, irrespective of MSI/MSI-X being enabled or
disabled in the root port.


Here are more spec references that seem applicable:

   - PCIe r4.0, sec 7.7.1.2 (Message Control Register for MSI) says:

   MSI Enable – If Set and the MSI-X Enable bit in the MSI-X
   Message Control register (see Section 7.9.2) is Clear, the
   Function is permitted to use MSI to request service and is
   prohibited from using INTx interrupts.

   - PCIe r4.0, sec 7.7.2.2 (Message Control Register for MSI-X) says:

   MSI-X Enable – If Set and the MSI Enable bit in the MSI Message
   Control register (see Section 6.8.1.3) is Clear, the Function is
   permitted to use MSI-X to request service and is prohibited from
   using INTx interrupts (if implemented).

I read that to mean a device is prohibited from using MSI/MSI-X for
some interrupts and INTx for others.  Since Tegra194 cannot use
MSI/MSI-X for PME, it should use INTx for *all* interrupts.  That
makes the MSI/MSI-X Capabilities superfluous, and they should be
omitted.

If we set pdev->no_msi for Tegra194, we'll avoid MSI/MSI-X completely,
so we'll assume *all* interrupts including hotplug will be INTx.  Will
that work?

Yes. We are fine with having all root-port-originated interrupts generated
through INTx instead of MSI/MSI-X.
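
For reference only: forcing the PCI core to skip MSI/MSI-X for such a port, as
suggested above with pdev->no_msi, could look roughly like the sketch below. The
fixup macro and the no_msi field are standard kernel interfaces, but the device
ID used here is a placeholder, not the real Tegra194 root port ID.

#include <linux/pci.h>

/*
 * Illustrative sketch: 0x0001 is a placeholder device ID. With no_msi set,
 * the PCI core never enables MSI/MSI-X for this device, so PME, AER and
 * hotplug events all arrive via INTx.
 */
static void quirk_no_msi(struct pci_dev *dev)
{
	dev->no_msi = 1;
}
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_NVIDIA, 0x0001, quirk_no_msi);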







Re: [RFC 0/7] introduce memory hinting API for external process

2019-05-20 Thread Minchan Kim
On Mon, May 20, 2019 at 06:44:52PM -0700, Matthew Wilcox wrote:
> On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote:
> > IMHO we should spell it out that this patchset complements MADV_DONTNEED
> > and MADV_FREE by adding non-destructive ways to gain some free memory
> > space. MADV_COLD is similar to MADV_DONTNEED in a way that it hints the
> > kernel that memory region is not currently needed and should be reclaimed
> > immediately; MADV_COOL is similar to MADV_FREE in a way that it hints the
> > kernel that memory region is not currently needed and should be reclaimed
> > when memory pressure rises.
> 
> Do we tear down page tables for these ranges?  That seems like a good

True for MADV_COLD (reclaiming) but false for MADV_COOL (deactivating) in
this implementation.

> way of reclaiming potentially a substantial amount of memory.

Given that refaults are spread out over time while reclaim occurs in bursts,
that does make sense to speed up the reclaiming. However, a concern to me is
anonymous pages, since they need swap cache insertion, which would be wasteful
if they ultimately are not reclaimed.
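
For readers following along, a userspace caller of the proposed hints might look
roughly like the sketch below. Note that MADV_COOL/MADV_COLD only exist with this
RFC's patched uapi headers; the fallback defines here are placeholders, not
official values.

#include <stdlib.h>
#include <sys/mman.h>

/* Placeholder values; the real ones come from the RFC's patched uapi headers. */
#ifndef MADV_COOL
#define MADV_COOL 5
#endif
#ifndef MADV_COLD
#define MADV_COLD 6
#endif

int main(void)
{
	size_t len = 64 << 20;
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED)
		return 1;

	/* Touch the pages so there is something to deactivate/reclaim. */
	for (size_t i = 0; i < len; i += 4096)
		buf[i] = 1;

	/* Gentle hint: deactivate now, reclaim later under memory pressure. */
	madvise(buf, len, MADV_COOL);

	/* Stronger hint: reclaim immediately, but non-destructively. */
	madvise(buf, len, MADV_COLD);

	munmap(buf, len);
	return 0;
}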


[PATCH 07/12] powerpc/pseries/svm: Use shared memory for Debug Trace Log (DTL)

2019-05-20 Thread Thiago Jung Bauermann
From: Anshuman Khandual 

Secure guests need to share the DTL buffers with the hypervisor. To that
end, use a kmem_cache constructor which converts the underlying buddy
allocated SLUB cache pages into shared memory.

Signed-off-by: Anshuman Khandual 
Signed-off-by: Thiago Jung Bauermann 
---
 arch/powerpc/include/asm/svm.h  |  5 
 arch/powerpc/platforms/pseries/Makefile |  1 +
 arch/powerpc/platforms/pseries/setup.c  |  5 +++-
 arch/powerpc/platforms/pseries/svm.c| 40 +
 4 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/svm.h b/arch/powerpc/include/asm/svm.h
index fef3740f46a6..f253116c31fc 100644
--- a/arch/powerpc/include/asm/svm.h
+++ b/arch/powerpc/include/asm/svm.h
@@ -15,6 +15,9 @@ static inline bool is_secure_guest(void)
return mfmsr() & MSR_S;
 }
 
+void dtl_cache_ctor(void *addr);
+#define get_dtl_cache_ctor()   (is_secure_guest() ? dtl_cache_ctor : NULL)
+
 #else /* CONFIG_PPC_SVM */
 
 static inline bool is_secure_guest(void)
@@ -22,5 +25,7 @@ static inline bool is_secure_guest(void)
return false;
 }
 
+#define get_dtl_cache_ctor() NULL
+
 #endif /* CONFIG_PPC_SVM */
 #endif /* _ASM_POWERPC_SVM_H */
diff --git a/arch/powerpc/platforms/pseries/Makefile 
b/arch/powerpc/platforms/pseries/Makefile
index a43ec843c8e2..b7b6e6f52bd0 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -25,6 +25,7 @@ obj-$(CONFIG_LPARCFG) += lparcfg.o
 obj-$(CONFIG_IBMVIO)   += vio.o
 obj-$(CONFIG_IBMEBUS)  += ibmebus.o
 obj-$(CONFIG_PAPR_SCM) += papr_scm.o
+obj-$(CONFIG_PPC_SVM)  += svm.o
 
 ifdef CONFIG_PPC_PSERIES
 obj-$(CONFIG_SUSPEND)  += suspend.o
diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index e4f0dfd4ae33..c928e6e8a279 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -71,6 +71,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "pseries.h"
 #include "../../../../drivers/pci/pci.h"
@@ -329,8 +330,10 @@ static inline int alloc_dispatch_logs(void)
 
 static int alloc_dispatch_log_kmem_cache(void)
 {
+   void (*ctor)(void *) = get_dtl_cache_ctor();
+
dtl_cache = kmem_cache_create("dtl", DISPATCH_LOG_BYTES,
-   DISPATCH_LOG_BYTES, 0, NULL);
+   DISPATCH_LOG_BYTES, 0, ctor);
if (!dtl_cache) {
pr_warn("Failed to create dispatch trace log buffer cache\n");
pr_warn("Stolen time statistics will be unreliable\n");
diff --git a/arch/powerpc/platforms/pseries/svm.c 
b/arch/powerpc/platforms/pseries/svm.c
new file mode 100644
index ..c508196f7c83
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/svm.c
@@ -0,0 +1,40 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Secure VM platform
+ *
+ * Copyright 2019 IBM Corporation
+ * Author: Anshuman Khandual 
+ */
+
+#include 
+#include 
+
+/* There's one dispatch log per CPU. */
+#define NR_DTL_PAGE (DISPATCH_LOG_BYTES * CONFIG_NR_CPUS / PAGE_SIZE)
+
+static struct page *dtl_page_store[NR_DTL_PAGE];
+static long dtl_nr_pages;
+
+static bool is_dtl_page_shared(struct page *page)
+{
+   long i;
+
+   for (i = 0; i < dtl_nr_pages; i++)
+   if (dtl_page_store[i] == page)
+   return true;
+
+   return false;
+}
+
+void dtl_cache_ctor(void *addr)
+{
+   unsigned long pfn = PHYS_PFN(__pa(addr));
+   struct page *page = pfn_to_page(pfn);
+
+   if (!is_dtl_page_shared(page)) {
+   dtl_page_store[dtl_nr_pages] = page;
+   dtl_nr_pages++;
+   WARN_ON(dtl_nr_pages >= NR_DTL_PAGE);
+   uv_share_page(pfn, 1);
+   }
+}


RE: Re: [PATCH v3 11/14] dmaengine: imx-sdma: fix ecspi1 rx dma not work on i.mx8mm

2019-05-20 Thread Robin Gong
> -----Original Message-----
> From: Vinod Koul 
> Sent: May 21, 2019 12:18
> 
> On 07-05-19, 09:16, Robin Gong wrote:
> > Because the number of the ecspi1 rx event on i.mx8mm is 0, the condition
> > check ignores this special case and leaves the DMA channel disabled, which
> > causes ecspi1 rx to fail. Actually, there is no need to check event_id0;
> > checking event_id1 is enough for the DEV_2_DEV case because event_id1
> > happens never to be 0.
> 
> Well is that by chance or design that event_id1 will be never 0?
> 
That's by chance. DEV_2_DEV is only used for the audio case, and event_id1 is
non-zero across the current i.MX family.


[PATCH 01/12] powerpc/pseries: Introduce option to build secure virtual machines

2019-05-20 Thread Thiago Jung Bauermann
Introduce CONFIG_PPC_SVM to control support for secure guests and include
Ultravisor-related helpers when it is selected.

Signed-off-by: Thiago Jung Bauermann 
---
 arch/powerpc/include/asm/ultravisor.h  |  2 +-
 arch/powerpc/kernel/Makefile   |  4 +++-
 arch/powerpc/platforms/pseries/Kconfig | 12 
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/ultravisor.h 
b/arch/powerpc/include/asm/ultravisor.h
index 4ffec7a36acd..09e0a615d96f 100644
--- a/arch/powerpc/include/asm/ultravisor.h
+++ b/arch/powerpc/include/asm/ultravisor.h
@@ -28,7 +28,7 @@ extern int early_init_dt_scan_ultravisor(unsigned long node, 
const char *uname,
  * This call supports up to 6 arguments and 4 return arguments. Use
  * UCALL_BUFSIZE to size the return argument buffer.
  */
-#if defined(CONFIG_PPC_UV)
+#if defined(CONFIG_PPC_UV) || defined(CONFIG_PPC_SVM)
 long ucall(unsigned long opcode, unsigned long *retbuf, ...);
 #else
 static long ucall(unsigned long opcode, unsigned long *retbuf, ...)
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 43ff4546e469..1e9b721634c8 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -154,7 +154,9 @@ endif
 
 obj-$(CONFIG_EPAPR_PARAVIRT)   += epapr_paravirt.o epapr_hcalls.o
 obj-$(CONFIG_KVM_GUEST)+= kvm.o kvm_emul.o
-obj-$(CONFIG_PPC_UV)   += ultravisor.o ucall.o
+ifneq ($(CONFIG_PPC_UV)$(CONFIG_PPC_SVM),)
+obj-y  += ultravisor.o ucall.o
+endif
 
 # Disable GCOV, KCOV & sanitizers in odd or sensitive code
 GCOV_PROFILE_prom_init.o := n
diff --git a/arch/powerpc/platforms/pseries/Kconfig 
b/arch/powerpc/platforms/pseries/Kconfig
index 9c6b3d860518..82c16aa4f1ce 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -144,3 +144,15 @@ config PAPR_SCM
tristate "Support for the PAPR Storage Class Memory interface"
help
  Enable access to hypervisor provided storage class memory.
+
+config PPC_SVM
+   bool "Secure virtual machine (SVM) support for POWER"
+   depends on PPC_PSERIES
+   default n
+   help
+Support secure guests on POWER. There are certain POWER platforms which
+support secure guests using the Protected Execution Facility, with the
+help of an Ultravisor executing below the hypervisor layer. This
+enables the support for those guests.
+
+If unsure, say "N".



Re: [PATCH] dma: dw-axi-dmac: fix null dereference when pointer first is null

2019-05-20 Thread Vinod Koul
On 08-05-19, 23:33, Colin King wrote:
> From: Colin Ian King 
> 
> In the unlikely event that axi_desc_get returns a null desc in the
> very first iteration of the while-loop the error exit path ends
> up calling axi_desc_put on a null pointer 'first' and this causes
> a null pointer dereference.  Fix this by adding a null check on
> pointer 'first' before calling axi_desc_put.

Applied, thanks

-- 
~Vinod


[RFC PATCH 02/12] powerpc: Add support for adding an ESM blob to the zImage wrapper

2019-05-20 Thread Thiago Jung Bauermann
From: Benjamin Herrenschmidt 

For secure VMs, the signing tool will create a ticket called the "ESM blob"
for the Enter Secure Mode ultravisor call with the signatures of the kernel
and initrd among other things.

This adds support to the wrapper script for adding that blob via the "-e"
option to the zImage.pseries.

It also adds code to the zImage wrapper itself to retrieve and if necessary
relocate the blob, and pass its address to Linux via the device-tree, to be
later consumed by prom_init.

Signed-off-by: Benjamin Herrenschmidt 
[ Minor adjustments to some comments. ]
Signed-off-by: Thiago Jung Bauermann 
---
 arch/powerpc/boot/main.c   | 41 ++
 arch/powerpc/boot/ops.h|  2 ++
 arch/powerpc/boot/wrapper  | 24 +---
 arch/powerpc/boot/zImage.lds.S |  8 +++
 4 files changed, 72 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/boot/main.c b/arch/powerpc/boot/main.c
index 78aaf4ffd7ab..ca612efd3e81 100644
--- a/arch/powerpc/boot/main.c
+++ b/arch/powerpc/boot/main.c
@@ -150,6 +150,46 @@ static struct addr_range prep_initrd(struct addr_range 
vmlinux, void *chosen,
return (struct addr_range){(void *)initrd_addr, initrd_size};
 }
 
+#ifdef __powerpc64__
+static void prep_esm_blob(struct addr_range vmlinux, void *chosen)
+{
+   unsigned long esm_blob_addr, esm_blob_size;
+
+   /* Do we have an ESM (Enter Secure Mode) blob? */
+   if (_esm_blob_end <= _esm_blob_start)
+   return;
+
+   printf("Attached ESM blob at 0x%p-0x%p\n\r",
+  _esm_blob_start, _esm_blob_end);
+   esm_blob_addr = (unsigned long)_esm_blob_start;
+   esm_blob_size = _esm_blob_end - _esm_blob_start;
+
+   /*
+* If the ESM blob is too low it will be clobbered when the
+* kernel relocates to its final location.  In this case,
+* allocate a safer place and move it.
+*/
+   if (esm_blob_addr < vmlinux.size) {
+   void *old_addr = (void *)esm_blob_addr;
+
+   printf("Allocating 0x%lx bytes for esm_blob ...\n\r",
+  esm_blob_size);
+   esm_blob_addr = (unsigned long)malloc(esm_blob_size);
+   if (!esm_blob_addr)
+   fatal("Can't allocate memory for ESM blob !\n\r");
+   printf("Relocating ESM blob 0x%lx <- 0x%p (0x%lx bytes)\n\r",
+  esm_blob_addr, old_addr, esm_blob_size);
+   memmove((void *)esm_blob_addr, old_addr, esm_blob_size);
+   }
+
+   /* Tell the kernel ESM blob address via device tree. */
+   setprop_val(chosen, "linux,esm-blob-start", (u32)(esm_blob_addr));
+   setprop_val(chosen, "linux,esm-blob-end", (u32)(esm_blob_addr + 
esm_blob_size));
+}
+#else
+static inline void prep_esm_blob(struct addr_range vmlinux, void *chosen) { }
+#endif
+
 /* A buffer that may be edited by tools operating on a zImage binary so as to
  * edit the command line passed to vmlinux (by setting /chosen/bootargs).
  * The buffer is put in it's own section so that tools may locate it easier.
@@ -218,6 +258,7 @@ void start(void)
vmlinux = prep_kernel();
initrd = prep_initrd(vmlinux, chosen,
 loader_info.initrd_addr, loader_info.initrd_size);
+   prep_esm_blob(vmlinux, chosen);
prep_cmdline(chosen);
 
printf("Finalizing device tree...");
diff --git a/arch/powerpc/boot/ops.h b/arch/powerpc/boot/ops.h
index cd043726ed88..e0606766480f 100644
--- a/arch/powerpc/boot/ops.h
+++ b/arch/powerpc/boot/ops.h
@@ -251,6 +251,8 @@ extern char _initrd_start[];
 extern char _initrd_end[];
 extern char _dtb_start[];
 extern char _dtb_end[];
+extern char _esm_blob_start[];
+extern char _esm_blob_end[];
 
 static inline __attribute__((const))
 int __ilog2_u32(u32 n)
diff --git a/arch/powerpc/boot/wrapper b/arch/powerpc/boot/wrapper
index f9141eaec6ff..36b2ad6cd5b7 100755
--- a/arch/powerpc/boot/wrapper
+++ b/arch/powerpc/boot/wrapper
@@ -14,6 +14,7 @@
 # -i initrdspecify initrd file
 # -d devtree   specify device-tree blob
 # -s tree.dts  specify device-tree source file (needs dtc installed)
+# -e esm_blob   specify ESM blob for secure images
 # -c   cache $kernel.strip.gz (use if present & newer, else make)
 # -C prefixspecify command prefix for cross-building tools
 #  (strip, objcopy, ld)
@@ -38,6 +39,7 @@ platform=of
 initrd=
 dtb=
 dts=
+esm_blob=
 cacheit=
 binary=
 compression=.gz
@@ -60,9 +62,9 @@ tmpdir=.
 
 usage() {
 echo 'Usage: wrapper [-o output] [-p platform] [-i initrd]' >&2
-echo '   [-d devtree] [-s tree.dts] [-c] [-C cross-prefix]' >&2
-echo '   [-D datadir] [-W workingdir] [-Z (gz|xz|none)]' >&2
-echo '   [--no-compression] [vmlinux]' >&2
+echo '   [-d devtree] [-s tree.dts] [-e esm_blob]' >&2
+echo '   [-c] [-C cross-prefix] [-D datadir] [-W workingdir]' >&2
+echo '   [-Z (gz|xz|none)] [--no-compression] 

[PATCH 09/12] powerpc/pseries/svm: Disable doorbells in SVM guests

2019-05-20 Thread Thiago Jung Bauermann
From: Sukadev Bhattiprolu 

Normally, the HV emulates some instructions like MSGSNDP, MSGCLRP
from a KVM guest. To emulate the instructions, it must first read
the instruction from the guest's memory and decode its parameters.

However, for a secure guest (aka SVM), the page containing the
instruction is in secure memory and the HV cannot access it directly.
It would need the Ultravisor (UV) to facilitate accessing the
instruction and parameters, but the UV currently does not have
the support for such accesses.

Until the UV has such support, disable doorbells in SVMs. This might
incur a performance hit but that is yet to be quantified.

With this patch applied (needed only in SVMs not needed for HV) we
are able to launch SVM guests with multi-core support. Eg:

qemu -smp sockets=2,cores=2,threads=2.

Fix suggested by Benjamin Herrenschmidt. Thanks to input from
Paul Mackerras, Ram Pai and Michael Anderson.

Signed-off-by: Sukadev Bhattiprolu 
Signed-off-by: Thiago Jung Bauermann 
---
 arch/powerpc/platforms/pseries/smp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/smp.c 
b/arch/powerpc/platforms/pseries/smp.c
index 3df46123cce3..95a5c24a1544 100644
--- a/arch/powerpc/platforms/pseries/smp.c
+++ b/arch/powerpc/platforms/pseries/smp.c
@@ -45,6 +45,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "pseries.h"
 #include "offline_states.h"
@@ -225,7 +226,7 @@ static __init void pSeries_smp_probe_xics(void)
 {
xics_smp_probe();
 
-   if (cpu_has_feature(CPU_FTR_DBELL))
+   if (cpu_has_feature(CPU_FTR_DBELL) && !is_secure_guest())
smp_ops->cause_ipi = smp_pseries_cause_ipi;
else
smp_ops->cause_ipi = icp_ops->cause_ipi;


Re: [PATCH v1] dmaengine: tegra-apb: Handle DMA_PREP_INTERRUPT flag properly

2019-05-20 Thread Vinod Koul
On 08-05-19, 10:24, Jon Hunter wrote:
> 
> On 05/05/2019 19:12, Dmitry Osipenko wrote:
> > The DMA_PREP_INTERRUPT flag means that descriptor's callback should be
> > invoked upon transfer completion and that's it. For some reason driver
> > completely disables the hardware interrupt handling, leaving channel in
> > unusable state if transfer is issued with the flag being unset. Note
> > that there are no occurrences in the relevant drivers that do not set
> > the flag, hence this patch doesn't fix any actual bug and merely fixes
> > potential problem.
> > 
> > Signed-off-by: Dmitry Osipenko 
> 
> From having a look at this, I am guessing that we have never really
> tested the case where DMA_PREP_INTERRUPT flag is not set because as you
> mentioned it does not look like this will work at all!

That is a fair argument
> 
> Is there a use-case you are looking at where you don't set the
> DMA_PREP_INTERRUPT flag?
> 
> If not I am wondering if we should even bother supporting this and warn
> if it is not set. AFAICT it does not appear to be mandatory, but maybe
> Vinod can comment more on this.

This is supposed to be used in cases where you submit a bunch of
descriptors and selectively don't want an interrupt for a few of them...

Is this such a case?

Thanks
~Vinod
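
For context, the pattern described above (a batch where only the last descriptor
asks for a completion interrupt) would look roughly like this sketch; the helper
name and buffer setup are illustrative, not taken from any driver in this thread.

#include <linux/dmaengine.h>

/* Sketch only: chan, buf0/buf1 and len are assumed to be prepared elsewhere. */
static void submit_pair(struct dma_chan *chan, dma_addr_t buf0, dma_addr_t buf1,
			size_t len, dma_async_tx_callback done, void *arg)
{
	struct dma_async_tx_descriptor *d0, *d1;

	/* First descriptor: no DMA_PREP_INTERRUPT, so no callback is requested. */
	d0 = dmaengine_prep_slave_single(chan, buf0, len, DMA_MEM_TO_DEV, 0);
	/* Last descriptor: interrupt (and callback) once the whole batch completes. */
	d1 = dmaengine_prep_slave_single(chan, buf1, len, DMA_MEM_TO_DEV,
					 DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
	if (!d0 || !d1)
		return;

	d1->callback = done;
	d1->callback_param = arg;

	dmaengine_submit(d0);
	dmaengine_submit(d1);
	dma_async_issue_pending(chan);
}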


[PATCH 11/12] powerpc/pseries/svm: Force SWIOTLB for secure guests

2019-05-20 Thread Thiago Jung Bauermann
From: Anshuman Khandual 

SWIOTLB checks the range of incoming CPU addresses to be bounced and sees if
the device can access it through its DMA window without requiring bouncing.
In such cases it just chooses to skip bouncing. But for cases like secure
guests on the powerpc platform, all addresses need to be bounced into the
shared pool of memory because the host cannot access them otherwise. Hence the
need to do the bouncing is not related to the device's DMA window, and the use
of bounce buffers is forced by setting swiotlb_force.

Also, connect the shared memory conversion functions into the
ARCH_HAS_MEM_ENCRYPT hooks and call swiotlb_update_mem_attributes() to
convert SWIOTLB's memory pool to shared memory.

Signed-off-by: Anshuman Khandual 
[ Use ARCH_HAS_MEM_ENCRYPT hooks to share swiotlb memory pool. ]
Signed-off-by: Thiago Jung Bauermann 
---
 arch/powerpc/include/asm/mem_encrypt.h | 19 +++
 arch/powerpc/platforms/pseries/Kconfig |  5 +++
 arch/powerpc/platforms/pseries/svm.c   | 45 ++
 3 files changed, 69 insertions(+)

diff --git a/arch/powerpc/include/asm/mem_encrypt.h 
b/arch/powerpc/include/asm/mem_encrypt.h
new file mode 100644
index ..45d5e4d0e6e0
--- /dev/null
+++ b/arch/powerpc/include/asm/mem_encrypt.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+/*
+ * SVM helper functions
+ *
+ * Copyright 2019 IBM Corporation
+ */
+
+#ifndef _ASM_POWERPC_MEM_ENCRYPT_H
+#define _ASM_POWERPC_MEM_ENCRYPT_H
+
+#define sme_me_mask0ULL
+
+static inline bool sme_active(void) { return false; }
+static inline bool sev_active(void) { return false; }
+
+int set_memory_encrypted(unsigned long addr, int numpages);
+int set_memory_decrypted(unsigned long addr, int numpages);
+
+#endif /* _ASM_POWERPC_MEM_ENCRYPT_H */
diff --git a/arch/powerpc/platforms/pseries/Kconfig 
b/arch/powerpc/platforms/pseries/Kconfig
index 82c16aa4f1ce..41b10f3bc729 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -145,9 +145,14 @@ config PAPR_SCM
help
  Enable access to hypervisor provided storage class memory.
 
+config ARCH_HAS_MEM_ENCRYPT
+   def_bool n
+
 config PPC_SVM
bool "Secure virtual machine (SVM) support for POWER"
depends on PPC_PSERIES
+   select SWIOTLB
+   select ARCH_HAS_MEM_ENCRYPT
default n
help
 Support secure guests on POWER. There are certain POWER platforms which
diff --git a/arch/powerpc/platforms/pseries/svm.c 
b/arch/powerpc/platforms/pseries/svm.c
index c508196f7c83..618622d636d5 100644
--- a/arch/powerpc/platforms/pseries/svm.c
+++ b/arch/powerpc/platforms/pseries/svm.c
@@ -7,8 +7,53 @@
  */
 
 #include 
+#include 
+#include 
+#include 
 #include 
 
+static int __init init_svm(void)
+{
+   if (!is_secure_guest())
+   return 0;
+
+   /* Don't release the SWIOTLB buffer. */
+   ppc_swiotlb_enable = 1;
+
+   /*
+* Since the guest memory is inaccessible to the host, devices always
+* need to use the SWIOTLB buffer for DMA even if dma_capable() says
+* otherwise.
+*/
+   swiotlb_force = SWIOTLB_FORCE;
+
+   /* Share the SWIOTLB buffer with the host. */
+   swiotlb_update_mem_attributes();
+
+   return 0;
+}
+machine_early_initcall(pseries, init_svm);
+
+int set_memory_encrypted(unsigned long addr, int numpages)
+{
+   if (!PAGE_ALIGNED(addr))
+   return -EINVAL;
+
+   uv_unshare_page(PHYS_PFN(__pa(addr)), numpages);
+
+   return 0;
+}
+
+int set_memory_decrypted(unsigned long addr, int numpages)
+{
+   if (!PAGE_ALIGNED(addr))
+   return -EINVAL;
+
+   uv_share_page(PHYS_PFN(__pa(addr)), numpages);
+
+   return 0;
+}
+
 /* There's one dispatch log per CPU. */
 #define NR_DTL_PAGE (DISPATCH_LOG_BYTES * CONFIG_NR_CPUS / PAGE_SIZE)
 



[PATCH 07/14] fs: teach the mm about range locking

2019-05-20 Thread Davidlohr Bueso
Conversion is straightforward: mmap_sem is used within the same function
context most of the time. No change in semantics.

Signed-off-by: Davidlohr Bueso 
---
 fs/aio.c  |  5 +++--
 fs/coredump.c |  5 +++--
 fs/exec.c | 19 +---
 fs/io_uring.c |  5 +++--
 fs/proc/base.c| 23 
 fs/proc/internal.h|  2 ++
 fs/proc/task_mmu.c| 32 +++
 fs/proc/task_nommu.c  | 22 +++
 fs/userfaultfd.c  | 50 ++-
 include/linux/userfaultfd_k.h |  5 +++--
 10 files changed, 100 insertions(+), 68 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 3490d1fa0e16..215d19dbbefa 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -461,6 +461,7 @@ static const struct address_space_operations aio_ctx_aops = 
{
 
 static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events)
 {
+   DEFINE_RANGE_LOCK_FULL(mmrange);
struct aio_ring *ring;
struct mm_struct *mm = current->mm;
unsigned long size, unused;
@@ -521,7 +522,7 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int 
nr_events)
ctx->mmap_size = nr_pages * PAGE_SIZE;
pr_debug("attempting mmap of %lu bytes\n", ctx->mmap_size);
 
-   if (down_write_killable(>mmap_sem)) {
+   if (mm_write_lock_killable(mm, )) {
ctx->mmap_size = 0;
aio_free_ring(ctx);
return -EINTR;
@@ -530,7 +531,7 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int 
nr_events)
ctx->mmap_base = do_mmap_pgoff(ctx->aio_ring_file, 0, ctx->mmap_size,
   PROT_READ | PROT_WRITE,
   MAP_SHARED, 0, , NULL);
-   up_write(>mmap_sem);
+   mm_write_unlock(mm, );
if (IS_ERR((void *)ctx->mmap_base)) {
ctx->mmap_size = 0;
aio_free_ring(ctx);
diff --git a/fs/coredump.c b/fs/coredump.c
index e42e17e55bfd..433713b63187 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -409,6 +409,7 @@ static int zap_threads(struct task_struct *tsk, struct 
mm_struct *mm,
 
 static int coredump_wait(int exit_code, struct core_state *core_state)
 {
+   DEFINE_RANGE_LOCK_FULL(mmrange);
struct task_struct *tsk = current;
struct mm_struct *mm = tsk->mm;
int core_waiters = -EBUSY;
@@ -417,12 +418,12 @@ static int coredump_wait(int exit_code, struct core_state 
*core_state)
core_state->dumper.task = tsk;
core_state->dumper.next = NULL;
 
-   if (down_write_killable(>mmap_sem))
+   if (mm_write_lock_killable(mm, ))
return -EINTR;
 
if (!mm->core_state)
core_waiters = zap_threads(tsk, mm, core_state, exit_code);
-   up_write(>mmap_sem);
+   mm_write_unlock(mm, );
 
if (core_waiters > 0) {
struct core_thread *ptr;
diff --git a/fs/exec.c b/fs/exec.c
index e96fd5328739..fbcb36bc4fd1 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -241,6 +241,7 @@ static void flush_arg_page(struct linux_binprm *bprm, 
unsigned long pos,
 
 static int __bprm_mm_init(struct linux_binprm *bprm)
 {
+   DEFINE_RANGE_LOCK_FULL(mmrange);
int err;
struct vm_area_struct *vma = NULL;
struct mm_struct *mm = bprm->mm;
@@ -250,7 +251,7 @@ static int __bprm_mm_init(struct linux_binprm *bprm)
return -ENOMEM;
vma_set_anonymous(vma);
 
-   if (down_write_killable(>mmap_sem)) {
+   if (mm_write_lock_killable(mm, )) {
err = -EINTR;
goto err_free;
}
@@ -273,11 +274,11 @@ static int __bprm_mm_init(struct linux_binprm *bprm)
 
mm->stack_vm = mm->total_vm = 1;
arch_bprm_mm_init(mm, vma);
-   up_write(>mmap_sem);
+   mm_write_unlock(mm, );
bprm->p = vma->vm_end - sizeof(void *);
return 0;
 err:
-   up_write(>mmap_sem);
+   mm_write_unlock(mm, );
 err_free:
bprm->vma = NULL;
vm_area_free(vma);
@@ -691,6 +692,7 @@ int setup_arg_pages(struct linux_binprm *bprm,
unsigned long stack_top,
int executable_stack)
 {
+   DEFINE_RANGE_LOCK_FULL(mmrange);
unsigned long ret;
unsigned long stack_shift;
struct mm_struct *mm = current->mm;
@@ -738,7 +740,7 @@ int setup_arg_pages(struct linux_binprm *bprm,
bprm->loader -= stack_shift;
bprm->exec -= stack_shift;
 
-   if (down_write_killable(>mmap_sem))
+   if (mm_write_lock_killable(mm, ))
return -EINTR;
 
vm_flags = VM_STACK_FLAGS;
@@ -795,7 +797,7 @@ int setup_arg_pages(struct linux_binprm *bprm,
ret = -EFAULT;
 
 out_unlock:
-   up_write(>mmap_sem);
+   mm_write_unlock(mm, );
return ret;
 }
 EXPORT_SYMBOL(setup_arg_pages);
@@ -1010,6 +1012,7 @@ static int 

[PATCH 12/12] powerpc/configs: Enable secure guest support in pseries and ppc64 defconfigs

2019-05-20 Thread Thiago Jung Bauermann
From: Ryan Grimm 

Enables running as a secure guest in platforms with an Ultravisor.

Signed-off-by: Ryan Grimm 
Signed-off-by: Ram Pai 
Signed-off-by: Thiago Jung Bauermann 
---
 arch/powerpc/configs/ppc64_defconfig   | 1 +
 arch/powerpc/configs/pseries_defconfig | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/powerpc/configs/ppc64_defconfig 
b/arch/powerpc/configs/ppc64_defconfig
index d7c381009636..725297438320 100644
--- a/arch/powerpc/configs/ppc64_defconfig
+++ b/arch/powerpc/configs/ppc64_defconfig
@@ -31,6 +31,7 @@ CONFIG_DTL=y
 CONFIG_SCANLOG=m
 CONFIG_PPC_SMLPAR=y
 CONFIG_IBMEBUS=y
+CONFIG_PPC_SVM=y
 CONFIG_PPC_MAPLE=y
 CONFIG_PPC_PASEMI=y
 CONFIG_PPC_PASEMI_IOMMU=y
diff --git a/arch/powerpc/configs/pseries_defconfig 
b/arch/powerpc/configs/pseries_defconfig
index 62e12f61a3b2..724a574fe4b2 100644
--- a/arch/powerpc/configs/pseries_defconfig
+++ b/arch/powerpc/configs/pseries_defconfig
@@ -42,6 +42,7 @@ CONFIG_DTL=y
 CONFIG_SCANLOG=m
 CONFIG_PPC_SMLPAR=y
 CONFIG_IBMEBUS=y
+CONFIG_PPC_SVM=y
 # CONFIG_PPC_PMAC is not set
 CONFIG_RTAS_FLASH=m
 CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y



[PATCH 13/14] drivers: teach the mm about range locking

2019-05-20 Thread Davidlohr Bueso
Conversion is straightforward: mmap_sem is used within the same function
context most of the time. No change in semantics.

Signed-off-by: Davidlohr Bueso 
---
 drivers/android/binder_alloc.c   |  7 ---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c |  4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c   |  7 ---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c  |  9 +
 drivers/gpu/drm/amd/amdkfd/kfd_events.c  |  5 +++--
 drivers/gpu/drm/i915/i915_gem.c  |  5 +++--
 drivers/gpu/drm/i915/i915_gem_userptr.c  | 11 +++
 drivers/gpu/drm/nouveau/nouveau_svm.c| 23 ++-
 drivers/gpu/drm/radeon/radeon_cs.c   |  5 +++--
 drivers/gpu/drm/radeon/radeon_gem.c  |  8 +---
 drivers/gpu/drm/radeon/radeon_mn.c   |  7 ---
 drivers/gpu/drm/ttm/ttm_bo_vm.c  |  4 ++--
 drivers/infiniband/core/umem.c   |  7 ---
 drivers/infiniband/core/umem_odp.c   | 12 +++-
 drivers/infiniband/core/uverbs_main.c|  5 +++--
 drivers/infiniband/hw/mlx4/mr.c  |  5 +++--
 drivers/infiniband/hw/qib/qib_user_pages.c   |  7 ---
 drivers/infiniband/hw/usnic/usnic_uiom.c |  5 +++--
 drivers/iommu/amd_iommu_v2.c |  4 ++--
 drivers/iommu/intel-svm.c|  4 ++--
 drivers/media/v4l2-core/videobuf-core.c  |  5 +++--
 drivers/media/v4l2-core/videobuf-dma-contig.c|  5 +++--
 drivers/media/v4l2-core/videobuf-dma-sg.c|  5 +++--
 drivers/misc/cxl/cxllib.c|  5 +++--
 drivers/misc/cxl/fault.c |  5 +++--
 drivers/misc/sgi-gru/grufault.c  | 20 
 drivers/misc/sgi-gru/grufile.c   |  5 +++--
 drivers/misc/sgi-gru/grukservices.c  |  4 +++-
 drivers/misc/sgi-gru/grumain.c   |  6 --
 drivers/misc/sgi-gru/grutables.h |  5 -
 drivers/oprofile/buffer_sync.c   | 12 +++-
 drivers/staging/kpc2000/kpc_dma/fileops.c|  5 +++--
 drivers/tee/optee/call.c |  5 +++--
 drivers/vfio/vfio_iommu_type1.c  |  9 +
 drivers/xen/gntdev.c |  5 +++--
 drivers/xen/privcmd.c| 17 ++---
 include/linux/hmm.h  |  7 ---
 37 files changed, 160 insertions(+), 109 deletions(-)

diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index bb929eb87116..0b9cd9becd76 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -195,6 +195,7 @@ static int binder_update_page_range(struct binder_alloc 
*alloc, int allocate,
struct vm_area_struct *vma = NULL;
struct mm_struct *mm = NULL;
bool need_mm = false;
+   DEFINE_RANGE_LOCK_FULL(mmrange);
 
binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
 "%d: %s pages %pK-%pK\n", alloc->pid,
@@ -220,7 +221,7 @@ static int binder_update_page_range(struct binder_alloc 
*alloc, int allocate,
mm = alloc->vma_vm_mm;
 
if (mm) {
-   down_read(>mmap_sem);
+   mm_read_lock(mm, );
vma = alloc->vma;
}
 
@@ -279,7 +280,7 @@ static int binder_update_page_range(struct binder_alloc 
*alloc, int allocate,
/* vm_insert_page does not seem to increment the refcount */
}
if (mm) {
-   up_read(>mmap_sem);
+   mm_read_unlock(mm, );
mmput(mm);
}
return 0;
@@ -310,7 +311,7 @@ static int binder_update_page_range(struct binder_alloc 
*alloc, int allocate,
}
 err_no_vma:
if (mm) {
-   up_read(>mmap_sem);
+   mm_read_unlock(mm, );
mmput(mm);
}
return vma ? -ENOMEM : -ESRCH;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 123eb0d7e2e9..28ddd42b27be 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1348,9 +1348,9 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 * concurrently and the queues are actually stopped
 */
if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm)) {
-   down_write(>mm->mmap_sem);
+   mm_write_lock(current->mm, );
is_invalid_userptr = atomic_read(>invalid);
-   up_write(>mm->mmap_sem);
+   mm_write_unlock(current->mm, );
}
 
mutex_lock(>lock);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
index 58ed401c5996..d002df91c7b9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
@@ -376,13 +376,14 @@ 

[PATCH 14/14] mm: convert mmap_sem to range mmap_lock

2019-05-20 Thread Davidlohr Bueso
With mmrange now in place and everyone using the mm
locking wrappers, we can convert the rwsem to the
range locking scheme. Every single user of mmap_sem
will use a full range, which means that there is no
more parallelism than what we already had. This is
the worst case scenario.

Prefetching and some lockdep stuff have been blindly
converted (for now).

This lays out the foundations for later mm address
space locking scalability.

Signed-off-by: Davidlohr Bueso 
---
 arch/x86/events/core.c |  2 +-
 arch/x86/kernel/tboot.c|  2 +-
 arch/x86/mm/fault.c|  2 +-
 drivers/firmware/efi/efi.c |  2 +-
 include/linux/mm.h | 26 +-
 include/linux/mm_types.h   |  4 ++--
 kernel/bpf/stackmap.c  |  9 +
 kernel/fork.c  |  2 +-
 mm/init-mm.c   |  2 +-
 mm/memory.c|  2 +-
 10 files changed, 27 insertions(+), 26 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index f315425d8468..45ecca077255 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2179,7 +2179,7 @@ static void x86_pmu_event_mapped(struct perf_event 
*event, struct mm_struct *mm)
 * For now, this can't happen because all callers hold mmap_sem
 * for write.  If this changes, we'll need a different solution.
 */
-   lockdep_assert_held_exclusive(>mmap_sem);
+   lockdep_assert_held_exclusive(>mmap_lock);
 
if (atomic_inc_return(>context.perf_rdpmc_allowed) == 1)
on_each_cpu_mask(mm_cpumask(mm), refresh_pce, NULL, 1);
diff --git a/arch/x86/kernel/tboot.c b/arch/x86/kernel/tboot.c
index 6e5ef8fb8a02..e5423e2451d3 100644
--- a/arch/x86/kernel/tboot.c
+++ b/arch/x86/kernel/tboot.c
@@ -104,7 +104,7 @@ static struct mm_struct tboot_mm = {
.pgd= swapper_pg_dir,
.mm_users   = ATOMIC_INIT(2),
.mm_count   = ATOMIC_INIT(1),
-   .mmap_sem   = __RWSEM_INITIALIZER(init_mm.mmap_sem),
+   .mmap_lock   = __RANGE_LOCK_TREE_INITIALIZER(init_mm.mmap_lock),
.page_table_lock =  __SPIN_LOCK_UNLOCKED(init_mm.page_table_lock),
.mmlist = LIST_HEAD_INIT(init_mm.mmlist),
 };
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index fbb060c89e7d..9f285ba76f1e 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1516,7 +1516,7 @@ static noinline void
 __do_page_fault(struct pt_regs *regs, unsigned long hw_error_code,
unsigned long address)
 {
-   prefetchw(>mm->mmap_sem);
+   prefetchw(>mm->mmap_lock);
 
if (unlikely(kmmio_fault(regs, address)))
return;
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 55b77c576c42..01e4937f3cea 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -80,7 +80,7 @@ struct mm_struct efi_mm = {
.mm_rb  = RB_ROOT,
.mm_users   = ATOMIC_INIT(2),
.mm_count   = ATOMIC_INIT(1),
-   .mmap_sem   = __RWSEM_INITIALIZER(efi_mm.mmap_sem),
+   .mmap_lock  = 
__RANGE_LOCK_TREE_INITIALIZER(efi_mm.mmap_lock),
.page_table_lock= __SPIN_LOCK_UNLOCKED(efi_mm.page_table_lock),
.mmlist = LIST_HEAD_INIT(efi_mm.mmlist),
.cpu_bitmap = { [BITS_TO_LONGS(NR_CPUS)] = 0},
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8bf3e2542047..5ac33c46679f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2899,74 +2899,74 @@ static inline void setup_nr_node_ids(void) {}
 static inline bool mm_is_locked(struct mm_struct *mm,
struct range_lock *mmrange)
 {
-   return rwsem_is_locked(>mmap_sem);
+   return range_is_locked(>mmap_lock, mmrange);
 }
 
 /* Reader wrappers */
 static inline int mm_read_trylock(struct mm_struct *mm,
  struct range_lock *mmrange)
 {
-   return down_read_trylock(>mmap_sem);
+   return range_read_trylock(>mmap_lock, mmrange);
 }
 
 static inline void mm_read_lock(struct mm_struct *mm,
struct range_lock *mmrange)
 {
-   down_read(>mmap_sem);
+   range_read_lock(>mmap_lock, mmrange);
 }
 
 static inline void mm_read_lock_nested(struct mm_struct *mm,
   struct range_lock *mmrange, int subclass)
 {
-   down_read_nested(>mmap_sem, subclass);
+   range_read_lock_nested(>mmap_lock, mmrange, subclass);
 }
 
 static inline void mm_read_unlock(struct mm_struct *mm,
  struct range_lock *mmrange)
 {
-   up_read(>mmap_sem);
+   range_read_unlock(>mmap_lock, mmrange);
 }
 
 /* Writer wrappers */
 static inline int mm_write_trylock(struct mm_struct *mm,
   struct range_lock *mmrange)
 {
-   return down_write_trylock(>mmap_sem);
+   return range_write_trylock(>mmap_lock, mmrange);
 }
 
 static 

[PATCH 06/14] mm: teach the mm about range locking

2019-05-20 Thread Davidlohr Bueso
Conversion is straightforward: mmap_sem is used within the same function
context most of the time, and we already have vmf updated. No changes in
semantics.

Signed-off-by: Davidlohr Bueso 
---
 include/linux/mm.h |  8 +++---
 mm/filemap.c   |  8 +++---
 mm/frame_vector.c  |  4 +--
 mm/gup.c   | 21 +++
 mm/hmm.c   |  3 ++-
 mm/khugepaged.c| 54 +--
 mm/ksm.c   | 42 +-
 mm/madvise.c   | 36 ++
 mm/memcontrol.c| 10 +---
 mm/memory.c| 10 +---
 mm/mempolicy.c | 25 ++
 mm/migrate.c   | 10 +---
 mm/mincore.c   |  6 +++--
 mm/mlock.c | 20 +--
 mm/mmap.c  | 69 --
 mm/mmu_notifier.c  |  9 ---
 mm/mprotect.c  | 15 ++-
 mm/mremap.c|  9 ---
 mm/msync.c |  9 ---
 mm/nommu.c | 25 ++
 mm/oom_kill.c  |  5 ++--
 mm/process_vm_access.c |  4 +--
 mm/shmem.c |  2 +-
 mm/swapfile.c  |  5 ++--
 mm/userfaultfd.c   | 21 ---
 mm/util.c  | 10 +---
 26 files changed, 252 insertions(+), 188 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 044e428b1905..8bf3e2542047 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1459,6 +1459,7 @@ void unmap_vmas(struct mmu_gather *tlb, struct 
vm_area_struct *start_vma,
  * right now." 1 means "skip the current vma."
  * @mm:mm_struct representing the target process of page table walk
  * @vma:   vma currently walked (NULL if walking outside vmas)
+ * @mmrange:   mm address space range locking
  * @private:   private data for callbacks' usage
  *
  * (see the comment on walk_page_range() for more details)
@@ -2358,8 +2359,8 @@ static inline int check_data_rlimit(unsigned long rlim,
return 0;
 }
 
-extern int mm_take_all_locks(struct mm_struct *mm);
-extern void mm_drop_all_locks(struct mm_struct *mm);
+extern int mm_take_all_locks(struct mm_struct *mm, struct range_lock *mmrange);
+extern void mm_drop_all_locks(struct mm_struct *mm, struct range_lock 
*mmrange);
 
 extern void set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file);
 extern struct file *get_mm_exe_file(struct mm_struct *mm);
@@ -2389,7 +2390,8 @@ extern unsigned long do_mmap(struct file *file, unsigned 
long addr,
vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate,
struct list_head *uf);
 extern int __do_munmap(struct mm_struct *, unsigned long, size_t,
-  struct list_head *uf, bool downgrade);
+  struct list_head *uf, bool downgrade,
+  struct range_lock *);
 extern int do_munmap(struct mm_struct *, unsigned long, size_t,
 struct list_head *uf);
 
diff --git a/mm/filemap.c b/mm/filemap.c
index 959022841bab..71f0d8a18f40 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1388,7 +1388,7 @@ int __lock_page_or_retry(struct page *page, struct 
mm_struct *mm,
if (flags & FAULT_FLAG_RETRY_NOWAIT)
return 0;
 
-   up_read(>mmap_sem);
+   mm_read_unlock(mm, mmrange);
if (flags & FAULT_FLAG_KILLABLE)
wait_on_page_locked_killable(page);
else
@@ -1400,7 +1400,7 @@ int __lock_page_or_retry(struct page *page, struct 
mm_struct *mm,
 
ret = __lock_page_killable(page);
if (ret) {
-   up_read(>mmap_sem);
+   mm_read_unlock(mm, mmrange);
return 0;
}
} else
@@ -2317,7 +2317,7 @@ static struct file *maybe_unlock_mmap_for_io(struct 
vm_fault *vmf,
if ((flags & (FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT)) ==
FAULT_FLAG_ALLOW_RETRY) {
fpin = get_file(vmf->vma->vm_file);
-   up_read(>vma->vm_mm->mmap_sem);
+   mm_read_unlock(vmf->vma->vm_mm, vmf->lockrange);
}
return fpin;
 }
@@ -2357,7 +2357,7 @@ static int lock_page_maybe_drop_mmap(struct vm_fault 
*vmf, struct page *page,
 * mmap_sem here and return 0 if we don't have a fpin.
 */
if (*fpin == NULL)
-   up_read(>vma->vm_mm->mmap_sem);
+   mm_read_unlock(vmf->vma->vm_mm, vmf->lockrange);
return 0;
}
} else
diff --git a/mm/frame_vector.c b/mm/frame_vector.c
index 4e1a577cbb79..ef33d21b3f39 100644
--- a/mm/frame_vector.c
+++ b/mm/frame_vector.c
@@ -47,7 +47,7 @@ int get_vaddr_frames(unsigned long start, unsigned int 

[PATCH 02/14] Introduce range reader/writer lock

2019-05-20 Thread Davidlohr Bueso
This implements a sleepable range rwlock, based on interval tree, serializing
conflicting/intersecting/overlapping ranges within the tree. The largest range
is given by [0, ~0] (inclusive). Unlike traditional locks, range locking
involves dealing with the tree itself and the range to be locked, normally
stack allocated and always explicitly prepared/initialized by the user as a
sorted [a0, a1] interval (a0 <= a1), before actually taking the lock.
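
Purely as an illustration of that calling convention, a full-range user of the
API might look like the sketch below; the function and macro names are those
used later in this series (DEFINE_RANGE_LOCK_FULL, range_read_lock/unlock and
the tree initializer), while the tree type name and surrounding context are
assumptions.

/* Illustrative only; error handling and real callers are omitted. */
static struct range_lock_tree example_tree =
	__RANGE_LOCK_TREE_INITIALIZER(example_tree);

static void example_reader(void)
{
	/* Stack-allocated range covering [0, ~0], prepared before locking. */
	DEFINE_RANGE_LOCK_FULL(rl);

	range_read_lock(&example_tree, &rl);
	/* ... readers of overlapping ranges run concurrently; writers are excluded ... */
	range_read_unlock(&example_tree, &rl);
}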

Interval-tree based range locking is about controlling tasks' forward
progress when adding an arbitrary interval (node) to the tree, depending
on any overlapping ranges. A task can only continue (wakeup) if there are
no intersecting ranges, thus achieving mutual exclusion. To this end, a
reference counter is kept for each intersecting range in the tree
(_before_ adding itself to it). To enable shared locking semantics,
the reader to-be-locked will not take a reference if an intersecting node
is also a reader, therefore ignoring the node altogether.
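
A toy, userspace-only model of that bookkeeping may help; it uses a plain list
instead of an interval tree and a plain counter instead of sleeping waiters, so
it only illustrates the reference-counting idea above, not the actual
implementation in this patch.

#include <stdbool.h>
#include <stddef.h>

struct toy_range {
	unsigned long start, last;	/* inclusive [start, last] */
	bool reader;
	int blocked_on;			/* conflicting ranges still ahead of us */
	struct toy_range *next;
};

static bool toy_overlap(const struct toy_range *a, const struct toy_range *b)
{
	return a->start <= b->last && b->start <= a->last;
}

/*
 * Count the already-queued ranges that conflict with 'new', then insert it.
 * Returns true if the caller may proceed immediately; otherwise it would wait
 * until each conflicting range unlocks and decrements blocked_on to zero.
 */
static bool toy_lock(struct toy_range **tree, struct toy_range *new)
{
	struct toy_range *n;

	new->blocked_on = 0;
	for (n = *tree; n; n = n->next) {
		if (!toy_overlap(n, new))
			continue;
		if (new->reader && n->reader)
			continue;	/* readers ignore other readers */
		new->blocked_on++;	/* take a "reference" on the conflict */
	}
	new->next = *tree;
	*tree = new;
	return new->blocked_on == 0;
}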

Fairness and freedom from starvation are guaranteed by the lack of lock
stealing, thus range locks depend directly on interval tree semantics.
This is particularly relevant for iterations, where the key for the rbtree is
given by the interval's low endpoint, and duplicates are walked as in an
inorder traversal of the tree.

The cost of lock and unlock of a range is O((1+R_int)log(R_all)) where
R_all is the total number of ranges and R_int is the number of ranges
intersecting the operated range.

How much does it cost:
--

The cost of lock and unlock of a range is O((1+R_int)log(R_all)) where R_all
is the total number of ranges and R_int is the number of ranges intersecting
the new range to be added.

Due to their sharable nature, full range locks can be compared with rw-semaphores,
which also serve from a mutex standpoint, as writer-only situations are
pretty similar nowadays.

The first difference is the memory footprint: tree locks are smaller than rwsems
(32 vs 40 bytes), but they require an additional 72 bytes of stack for the range
structure.

Secondly, because every range call is serialized by the tree->lock, any lock()
fastpath will at least have an interval_tree_insert() and spinlock lock+unlock
overhead compared to a single atomic insn in the case of rwsems. The same
obviously applies to the unlock() case.

The torture module was used to measure 1-1 differences in lock acquisition with
increasing core counts over a period of 10 minutes. Readers and writers are
interleaved, with a slight advantage to writers as it's the first kthread that is
created. The following shows the avg ops/minute with various thread-setups on
boxes with small and large core-counts.

** 4-core AMD Opteron **
(write-only)
rwsem-2thr: 4198.5, stddev: 7.77
range-2thr: 4199.1, stddev: 0.73

rwsem-4thr: 6036.8, stddev: 50.91
range-4thr: 6004.9, stddev: 126.57

rwsem-8thr: 6245.6, stddev: 59.39
range-8thr: 6229.3, stddev: 10.60

(read-only)
rwsem-2thr: 5930.7, stddev: 21.92
range-2thr: 5917.3, stddev: 25.45

rwsem-4thr: 9881.6, stddev: 0.70
range-4thr: 9540.2, stddev: 98.28

rwsem-8thr: 11633.2, stddev: 7.72
range-8thr: 11314.7, stddev: 62.22

For the read/write-only cases, there is very little difference between the range
lock and rwsems, with up to a 3% hit, which could very well be considered in the
noise range.

(read-write)
rwsem-write-1thr: 1744.8, stddev: 11.59
rwsem-read-1thr:  1043.1, stddev: 3.97
range-write-1thr: 1740.2, stddev: 5.99
range-read-1thr:  1022.5, stddev: 6.41

rwsem-write-2thr: 1662.5, stddev: 0.70
rwsem-read-2thr:  1278.0, stddev: 25.45
range-write-2thr: 1321.5, stddev: 51.61
range-read-2thr:  1243.5, stddev: 30.40

rwsem-write-4thr: 1761.0, stddev: 11.31
rwsem-read-4thr:  1426.0, stddev: 7.07
range-write-4thr: 1417.0, stddev: 29.69
range-read-4thr:  1398.0, stddev: 56.56

While single reader and writer threads do not show much difference, increasing
core counts shows that in reader/writer workloads, writer threads can take a hit
in raw performance of up to ~20%, while reader throughput is quite similar
between both locks.

** 240-core (ht) IvyBridge **
(write-only)
rwsem-120thr: 6844.5, stddev: 82.73
range-120thr: 6070.5, stddev: 85.55

rwsem-240thr: 6292.5, stddev: 146.3
range-240thr: 6099.0, stddev: 15.55

rwsem-480thr: 6164.8, stddev: 33.94
range-480thr: 6062.3, stddev: 19.79

(read-only)
rwsem-120thr: 136860.4, stddev: 2539.92
range-120thr: 138052.2, stddev: 327.39

rwsem-240thr: 235297.5, stddev: 2220.50
range-240thr: 232099.1, stddev: 3614.72

rwsem-480thr: 272683.0, stddev: 3924.32
range-480thr: 256539.2, stddev: 9541.69

Similar to the small box, larger machines show that range locks take only a minor
(up to ~6% for 480 threads) hit even in completely exclusive or shared scenarios.

(read-write)
rwsem-write-60thr: 4658.1, stddev: 1303.19
rwsem-read-60thr:  1108.7, stddev: 718.42
range-write-60thr: 3203.6, stddev: 139.30
range-read-60thr:  1852.8, stddev: 147.5

rwsem-write-120thr: 3971.3, 

[PATCH 12/14] kernel: teach the mm about range locking

2019-05-20 Thread Davidlohr Bueso
Conversion is straightforward: mmap_sem is used within the same function
context most of the time. No change in semantics.

Signed-off-by: Davidlohr Bueso 
---
 kernel/acct.c   |  5 +++--
 kernel/bpf/stackmap.c   |  7 +--
 kernel/events/core.c|  5 +++--
 kernel/events/uprobes.c | 20 
 kernel/exit.c   |  9 +
 kernel/fork.c   | 16 ++--
 kernel/futex.c  |  5 +++--
 kernel/sched/fair.c |  5 +++--
 kernel/sys.c| 22 +-
 kernel/trace/trace_output.c |  5 +++--
 10 files changed, 60 insertions(+), 39 deletions(-)

diff --git a/kernel/acct.c b/kernel/acct.c
index 81f9831a7859..2bbcecbd78ef 100644
--- a/kernel/acct.c
+++ b/kernel/acct.c
@@ -538,14 +538,15 @@ void acct_collect(long exitcode, int group_dead)
 
if (group_dead && current->mm) {
struct vm_area_struct *vma;
+   DEFINE_RANGE_LOCK_FULL(mmrange);
 
-   down_read(>mm->mmap_sem);
+   mm_read_lock(current->mm, );
vma = current->mm->mmap;
while (vma) {
vsize += vma->vm_end - vma->vm_start;
vma = vma->vm_next;
}
-   up_read(>mm->mmap_sem);
+   mm_read_unlock(current->mm, );
}
 
spin_lock_irq(>sighand->siglock);
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 950ab2f28922..fdb352bea7e8 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -37,6 +37,7 @@ struct bpf_stack_map {
 struct stack_map_irq_work {
struct irq_work irq_work;
struct rw_semaphore *sem;
+   struct range_lock *mmrange;
 };
 
 static void do_up_read(struct irq_work *entry)
@@ -291,6 +292,7 @@ static void stack_map_get_build_id_offset(struct 
bpf_stack_build_id *id_offs,
struct vm_area_struct *vma;
bool irq_work_busy = false;
struct stack_map_irq_work *work = NULL;
+   DEFINE_RANGE_LOCK_FULL(mmrange);
 
if (in_nmi()) {
work = this_cpu_ptr(_read_work);
@@ -309,7 +311,7 @@ static void stack_map_get_build_id_offset(struct 
bpf_stack_build_id *id_offs,
 * with build_id.
 */
if (!user || !current || !current->mm || irq_work_busy ||
-   down_read_trylock(>mm->mmap_sem) == 0) {
+   mm_read_trylock(current->mm, ) == 0) {
/* cannot access current->mm, fall back to ips */
for (i = 0; i < trace_nr; i++) {
id_offs[i].status = BPF_STACK_BUILD_ID_IP;
@@ -334,9 +336,10 @@ static void stack_map_get_build_id_offset(struct 
bpf_stack_build_id *id_offs,
}
 
if (!work) {
-   up_read(>mm->mmap_sem);
+   mm_read_unlock(current->mm, );
} else {
work->sem = >mm->mmap_sem;
+   work->mmrange = 
irq_work_queue(>irq_work);
/*
 * The irq_work will release the mmap_sem with
diff --git a/kernel/events/core.c b/kernel/events/core.c
index abbd4b3b96c2..3b43cfe63b54 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9079,6 +9079,7 @@ static void perf_event_addr_filters_apply(struct 
perf_event *event)
struct mm_struct *mm = NULL;
unsigned int count = 0;
unsigned long flags;
+   DEFINE_RANGE_LOCK_FULL(mmrange);
 
/*
 * We may observe TASK_TOMBSTONE, which means that the event tear-down
@@ -9092,7 +9093,7 @@ static void perf_event_addr_filters_apply(struct 
perf_event *event)
if (!mm)
goto restart;
 
-   down_read(>mmap_sem);
+   mm_read_lock(mm, );
}
 
raw_spin_lock_irqsave(>lock, flags);
@@ -9118,7 +9119,7 @@ static void perf_event_addr_filters_apply(struct 
perf_event *event)
raw_spin_unlock_irqrestore(>lock, flags);
 
if (ifh->nr_file_filters) {
-   up_read(>mmap_sem);
+   mm_read_unlock(mm, );
 
mmput(mm);
}
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 3689eceb8d0c..6779c237799a 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -997,6 +997,7 @@ register_for_each_vma(struct uprobe *uprobe, struct 
uprobe_consumer *new)
bool is_register = !!new;
struct map_info *info;
int err = 0;
+   DEFINE_RANGE_LOCK_FULL(mmrange);
 
percpu_down_write(_mmap_sem);
info = build_map_info(uprobe->inode->i_mapping,
@@ -1013,7 +1014,7 @@ register_for_each_vma(struct uprobe *uprobe, struct 
uprobe_consumer *new)
if (err && is_register)
goto free;
 
-   down_write(>mmap_sem);
+   mm_write_lock(mm, );
vma = find_vma(mm, info->vaddr);
if (!vma || !valid_vma(vma, is_register) ||

[PATCH 11/14] ipc: teach the mm about range locking

2019-05-20 Thread Davidlohr Bueso
Conversion is straightforward: mmap_sem is used within the same function
context most of the time. No change in semantics.

Signed-off-by: Davidlohr Bueso 
---
 ipc/shm.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/ipc/shm.c b/ipc/shm.c
index ce1ca9f7c6e9..3666fa71bfc2 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -1418,6 +1418,7 @@ COMPAT_SYSCALL_DEFINE3(old_shmctl, int, shmid, int, cmd, 
void __user *, uptr)
 long do_shmat(int shmid, char __user *shmaddr, int shmflg,
  ulong *raddr, unsigned long shmlba)
 {
+   DEFINE_RANGE_LOCK_FULL(mmrange);
struct shmid_kernel *shp;
unsigned long addr = (unsigned long)shmaddr;
unsigned long size;
@@ -1544,7 +1545,7 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg,
if (err)
goto out_fput;
 
-   if (down_write_killable(>mm->mmap_sem)) {
+   if (mm_write_lock_killable(current->mm, )) {
err = -EINTR;
goto out_fput;
}
@@ -1564,7 +1565,7 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg,
if (IS_ERR_VALUE(addr))
err = (long)addr;
 invalid:
-   up_write(>mm->mmap_sem);
+   mm_write_unlock(current->mm, );
if (populate)
mm_populate(addr, populate);
 
@@ -1625,6 +1626,7 @@ COMPAT_SYSCALL_DEFINE3(shmat, int, shmid, compat_uptr_t, 
shmaddr, int, shmflg)
  */
 long ksys_shmdt(char __user *shmaddr)
 {
+   DEFINE_RANGE_LOCK_FULL(mmrange);
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
unsigned long addr = (unsigned long)shmaddr;
@@ -1638,7 +1640,7 @@ long ksys_shmdt(char __user *shmaddr)
if (addr & ~PAGE_MASK)
return retval;
 
-   if (down_write_killable(>mmap_sem))
+   if (mm_write_lock_killable(mm, ))
return -EINTR;
 
/*
@@ -1726,7 +1728,7 @@ long ksys_shmdt(char __user *shmaddr)
 
 #endif
 
-   up_write(>mmap_sem);
+   mm_write_unlock(mm, );
return retval;
 }
 
-- 
2.16.4



[PATCH 08/14] arch/x86: teach the mm about range locking

2019-05-20 Thread Davidlohr Bueso
Conversion is straightforward: mmap_sem is used within the same function
context most of the time. No change in semantics.

Signed-off-by: Davidlohr Bueso 
---
 arch/x86/entry/vdso/vma.c  | 12 +++-
 arch/x86/kernel/vm86_32.c  |  5 +++--
 arch/x86/kvm/paging_tmpl.h |  9 +
 arch/x86/mm/debug_pagetables.c |  8 
 arch/x86/mm/fault.c|  8 
 arch/x86/mm/mpx.c  | 15 +--
 arch/x86/um/vdso/vma.c |  5 +++--
 7 files changed, 35 insertions(+), 27 deletions(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index babc4e7a519c..f6d8950f37b8 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -145,12 +145,13 @@ static const struct vm_special_mapping vvar_mapping = {
  */
 static int map_vdso(const struct vdso_image *image, unsigned long addr)
 {
+   DEFINE_RANGE_LOCK_FULL(mmrange);
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
unsigned long text_start;
int ret = 0;
 
-   if (down_write_killable(>mmap_sem))
+   if (mm_write_lock_killable(mm, ))
return -EINTR;
 
addr = get_unmapped_area(NULL, addr,
@@ -193,7 +194,7 @@ static int map_vdso(const struct vdso_image *image, 
unsigned long addr)
}
 
 up_fail:
-   up_write(>mmap_sem);
+   mm_write_unlock(mm, );
return ret;
 }
 
@@ -254,8 +255,9 @@ int map_vdso_once(const struct vdso_image *image, unsigned 
long addr)
 {
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
+   DEFINE_RANGE_LOCK_FULL(mmrange);
 
-   down_write(>mmap_sem);
+   mm_write_lock(mm, );
/*
 * Check if we have already mapped vdso blob - fail to prevent
 * abusing from userspace install_speciall_mapping, which may
@@ -266,11 +268,11 @@ int map_vdso_once(const struct vdso_image *image, 
unsigned long addr)
for (vma = mm->mmap; vma; vma = vma->vm_next) {
if (vma_is_special_mapping(vma, _mapping) ||
vma_is_special_mapping(vma, _mapping)) {
-   up_write(>mmap_sem);
+   mm_write_unlock(mm, );
return -EEXIST;
}
}
-   up_write(>mmap_sem);
+   mm_write_unlock(mm, );
 
return map_vdso(image, addr);
 }
diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index 6a38717d179c..39eecee07dcd 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -171,8 +171,9 @@ static void mark_screen_rdonly(struct mm_struct *mm)
pmd_t *pmd;
pte_t *pte;
int i;
+   DEFINE_RANGE_LOCK_FULL(mmrange);
 
-   down_write(>mmap_sem);
+   mm_write_lock(mm, );
pgd = pgd_offset(mm, 0xA);
if (pgd_none_or_clear_bad(pgd))
goto out;
@@ -198,7 +199,7 @@ static void mark_screen_rdonly(struct mm_struct *mm)
}
pte_unmap_unlock(pte, ptl);
 out:
-   up_write(>mmap_sem);
+   mm_write_unlock(mm, );
flush_tlb_mm_range(mm, 0xA, 0xA + 32*PAGE_SIZE, PAGE_SHIFT, 
false);
 }
 
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 367a47df4ba0..347d3ba41974 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -152,23 +152,24 @@ static int FNAME(cmpxchg_gpte)(struct kvm_vcpu *vcpu, 
struct kvm_mmu *mmu,
unsigned long vaddr = (unsigned long)ptep_user & PAGE_MASK;
unsigned long pfn;
unsigned long paddr;
+   DEFINE_RANGE_LOCK_FULL(mmrange);
 
-   down_read(>mm->mmap_sem);
+   mm_read_lock(current->mm, );
vma = find_vma_intersection(current->mm, vaddr, vaddr + 
PAGE_SIZE);
if (!vma || !(vma->vm_flags & VM_PFNMAP)) {
-   up_read(>mm->mmap_sem);
+   mm_read_unlock(current->mm, );
return -EFAULT;
}
pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
paddr = pfn << PAGE_SHIFT;
table = memremap(paddr, PAGE_SIZE, MEMREMAP_WB);
if (!table) {
-   up_read(>mm->mmap_sem);
+   mm_read_unlock(current->mm, );
return -EFAULT;
}
ret = CMPXCHG([index], orig_pte, new_pte);
memunmap(table);
-   up_read(>mm->mmap_sem);
+   mm_read_unlock(current->mm, );
}
 
return (ret != orig_pte);
diff --git a/arch/x86/mm/debug_pagetables.c b/arch/x86/mm/debug_pagetables.c
index cd84f067e41d..0d131edc6a75 100644
--- a/arch/x86/mm/debug_pagetables.c
+++ b/arch/x86/mm/debug_pagetables.c
@@ -15,9 +15,9 @@ DEFINE_SHOW_ATTRIBUTE(ptdump);
 static int ptdump_curknl_show(struct seq_file *m, void *v)
 {
if (current->mm->pgd) {
-   

[PATCH 09/14] virt: teach the mm about range locking

2019-05-20 Thread Davidlohr Bueso
Conversion is straightforward: mmap_sem is used within the same function
context most of the time. No change in semantics.

Signed-off-by: Davidlohr Bueso 
---
 virt/kvm/arm/mmu.c  | 17 ++---
 virt/kvm/async_pf.c |  4 ++--
 virt/kvm/kvm_main.c | 11 ++-
 3 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 74b6582eaa3c..85f8b9ccfabe 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -980,9 +980,10 @@ void stage2_unmap_vm(struct kvm *kvm)
struct kvm_memslots *slots;
struct kvm_memory_slot *memslot;
int idx;
+   DEFINE_RANGE_LOCK_FULL(mmrange);
 
idx = srcu_read_lock(>srcu);
-   down_read(>mm->mmap_sem);
+   mm_read_lock(current->mm, );
spin_lock(>mmu_lock);
 
slots = kvm_memslots(kvm);
@@ -990,7 +991,7 @@ void stage2_unmap_vm(struct kvm *kvm)
stage2_unmap_memslot(kvm, memslot);
 
spin_unlock(>mmu_lock);
-   up_read(>mm->mmap_sem);
+   mm_read_unlock(current->mm, );
srcu_read_unlock(>srcu, idx);
 }
 
@@ -1688,6 +1689,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
kvm_pfn_t pfn;
pgprot_t mem_type = PAGE_S2;
bool logging_active = memslot_is_logging(memslot);
+   DEFINE_RANGE_LOCK_FULL(mmrange);
unsigned long vma_pagesize, flags = 0;
 
write_fault = kvm_is_write_fault(vcpu);
@@ -1700,11 +1702,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
}
 
/* Let's check if we will get back a huge page backed by hugetlbfs */
-   down_read(&current->mm->mmap_sem);
+   mm_read_lock(current->mm, &mmrange);
vma = find_vma_intersection(current->mm, hva, hva + 1);
if (unlikely(!vma)) {
kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
-   up_read(&current->mm->mmap_sem);
+   mm_read_unlock(current->mm, &mmrange);
return -EFAULT;
}
 
@@ -1725,7 +1727,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
if (vma_pagesize == PMD_SIZE ||
(vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm)))
gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> 
PAGE_SHIFT;
-   up_read(&current->mm->mmap_sem);
+   mm_read_unlock(current->mm, &mmrange);
 
/* We need minimum second+third level pages */
ret = mmu_topup_memory_cache(memcache, kvm_mmu_cache_min_pages(kvm),
@@ -2280,6 +2282,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
hva_t reg_end = hva + mem->memory_size;
bool writable = !(mem->flags & KVM_MEM_READONLY);
int ret = 0;
+   DEFINE_RANGE_LOCK_FULL(mmrange);
 
if (change != KVM_MR_CREATE && change != KVM_MR_MOVE &&
change != KVM_MR_FLAGS_ONLY)
@@ -2293,7 +2296,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
(kvm_phys_size(kvm) >> PAGE_SHIFT))
return -EFAULT;
 
-   down_read(&current->mm->mmap_sem);
+   mm_read_lock(current->mm, &mmrange);
/*
 * A memory region could potentially cover multiple VMAs, and any holes
 * between them, so iterate over all of them to find out if we can map
@@ -2361,7 +2364,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
stage2_flush_memslot(kvm, memslot);
spin_unlock(&kvm->mmu_lock);
 out:
-   up_read(&current->mm->mmap_sem);
+   mm_read_unlock(current->mm, &mmrange);
return ret;
 }
 
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index e93cd8515134..03d9f9bc5270 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -87,11 +87,11 @@ static void async_pf_execute(struct work_struct *work)
 * mm and might be done in another context, so we must
 * access remotely.
 */
-   down_read(&mm->mmap_sem);
+   mm_read_lock(mm, &mmrange);
get_user_pages_remote(NULL, mm, addr, 1, FOLL_WRITE, NULL, NULL,
  &locked, &mmrange);
if (locked)
-   up_read(&mm->mmap_sem);
+   mm_read_unlock(mm, &mmrange);
 
kvm_async_page_present_sync(vcpu, apf);
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e1484150a3dd..421652e66a03 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1331,6 +1331,7 @@ EXPORT_SYMBOL_GPL(kvm_is_visible_gfn);
 unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t gfn)
 {
struct vm_area_struct *vma;
+   DEFINE_RANGE_LOCK_FULL(mmrange);
unsigned long addr, size;
 
size = PAGE_SIZE;
@@ -1339,7 +1340,7 @@ unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t 
gfn)
if (kvm_is_error_hva(addr))
return PAGE_SIZE;
 
-   down_read(&current->mm->mmap_sem);
+   mm_read_lock(current->mm, &mmrange);
vma = find_vma(current->mm, addr);
if (!vma)
goto out;
@@ -1347,7 +1348,7 @@ unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t 
gfn)
size = 

[PATCH 10/14] net: teach the mm about range locking

2019-05-20 Thread Davidlohr Bueso
Conversion is straightforward: mmap_sem is used within the
same function context most of the time. No change in
semantics.

Signed-off-by: Davidlohr Bueso 
---
 net/ipv4/tcp.c | 5 +++--
 net/xdp/xdp_umem.c | 5 +++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 53d61ca3ac4b..2be929dcafa8 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1731,6 +1731,7 @@ static int tcp_zerocopy_receive(struct sock *sk,
struct tcp_sock *tp;
int inq;
int ret;
+   DEFINE_RANGE_LOCK_FULL(mmrange);
 
if (address & (PAGE_SIZE - 1) || address != zc->address)
return -EINVAL;
@@ -1740,7 +1741,7 @@ static int tcp_zerocopy_receive(struct sock *sk,
 
sock_rps_record_flow(sk);
 
-   down_read(&current->mm->mmap_sem);
+   mm_read_lock(current->mm, &mmrange);
 
ret = -EINVAL;
vma = find_vma(current->mm, address);
@@ -1802,7 +1803,7 @@ static int tcp_zerocopy_receive(struct sock *sk,
frags++;
}
 out:
-   up_read(&current->mm->mmap_sem);
+   mm_read_unlock(current->mm, &mmrange);
if (length) {
tp->copied_seq = seq;
tcp_rcv_space_adjust(sk);
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 2b18223e7eb8..2bf444fb998d 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -246,16 +246,17 @@ static int xdp_umem_pin_pages(struct xdp_umem *umem)
unsigned int gup_flags = FOLL_WRITE;
long npgs;
int err;
+   DEFINE_RANGE_LOCK_FULL(mmrange);
 
umem->pgs = kcalloc(umem->npgs, sizeof(*umem->pgs),
GFP_KERNEL | __GFP_NOWARN);
if (!umem->pgs)
return -ENOMEM;
 
-   down_read(&current->mm->mmap_sem);
+   mm_read_lock(current->mm, &mmrange);
npgs = get_user_pages(umem->address, umem->npgs,
  gup_flags | FOLL_LONGTERM, &umem->pgs[0], NULL);
-   up_read(&current->mm->mmap_sem);
+   mm_read_unlock(current->mm, &mmrange);
 
if (npgs != umem->npgs) {
if (npgs >= 0) {
-- 
2.16.4



[PATCH 10/12] powerpc/pseries/iommu: Don't use dma_iommu_ops on secure guests

2019-05-20 Thread Thiago Jung Bauermann
Secure guest memory is inaccessible to devices so regular DMA isn't
possible.

In that case set devices' dma_map_ops to NULL so that the generic
DMA code path will use SWIOTLB and DMA to bounce buffers.

Signed-off-by: Thiago Jung Bauermann 
---
 arch/powerpc/platforms/pseries/iommu.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c 
b/arch/powerpc/platforms/pseries/iommu.c
index 03bbb299320e..7d9550edb700 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "pseries.h"
 
@@ -1332,7 +1333,10 @@ void iommu_init_early_pSeries(void)
of_reconfig_notifier_register(&iommu_reconfig_nb);
register_memory_notifier(&iommu_mem_nb);

-   set_pci_dma_ops(&dma_iommu_ops);
+   if (is_secure_guest())
+   set_pci_dma_ops(NULL);
+   else
+   set_pci_dma_ops(&dma_iommu_ops);
 }
 
 static int __init disable_multitce(char *str)



[PATCH 05/14] mm: remove some BUG checks wrt mmap_sem

2019-05-20 Thread Davidlohr Bueso
This patch is a collection of hacks that shamelessly remove
mmap_sem state checks in order to not have to teach file_operations
about range locking; for thp and huge pagecache: By dropping the
rwsem_is_locked checks in zap_pmd_range() and zap_pud_range() we can
avoid having to teach file_operations about mmrange. For example in
xfs: iomap_dio_rw() is called by .read_iter file callbacks.

We also avoid mmap_sem trylock in vm_insert_page(): The rules to
this function state that mmap_sem must be acquired by the caller:

- for write if used in f_op->mmap() (by far the most common case)
- for read if used from vma_op->fault() (with VM_MIXEDMAP)

The only exception is:
  mmap_vmcore()
   remap_vmalloc_range_partial()
  mmap_vmcore()

But there is no concurrency here, thus mmap_sem is not held.
After auditing the kernel, the following drivers use the fault
path and correctly set VM_MIXEDMAP):

.fault = etnaviv_gem_fault
.fault = udl_gem_fault
tegra_bo_fault()

As such, drop the reader trylock BUG_ON() for the common case.
This avoids having file_operations know about mmranges, as
mmap_sem is held during mmap(), for example.

Signed-off-by: Davidlohr Bueso 
---
 include/linux/huge_mm.h | 2 --
 mm/memory.c | 2 --
 mm/mmap.c   | 4 ++--
 mm/pagewalk.c   | 3 ---
 4 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 7cd5c150c21d..a4a9cfa78d8f 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -194,7 +194,6 @@ static inline int is_swap_pmd(pmd_t pmd)
 static inline spinlock_t *pmd_trans_huge_lock(pmd_t *pmd,
struct vm_area_struct *vma)
 {
-   VM_BUG_ON_VMA(!rwsem_is_locked(&vma->vm_mm->mmap_sem), vma);
if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd))
return __pmd_trans_huge_lock(pmd, vma);
else
@@ -203,7 +202,6 @@ static inline spinlock_t *pmd_trans_huge_lock(pmd_t *pmd,
 static inline spinlock_t *pud_trans_huge_lock(pud_t *pud,
struct vm_area_struct *vma)
 {
-   VM_BUG_ON_VMA(!rwsem_is_locked(&vma->vm_mm->mmap_sem), vma);
if (pud_trans_huge(*pud) || pud_devmap(*pud))
return __pud_trans_huge_lock(pud, vma);
else
diff --git a/mm/memory.c b/mm/memory.c
index 9516c95108a1..73971f859035 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1212,7 +1212,6 @@ static inline unsigned long zap_pud_range(struct 
mmu_gather *tlb,
next = pud_addr_end(addr, end);
if (pud_trans_huge(*pud) || pud_devmap(*pud)) {
if (next - addr != HPAGE_PUD_SIZE) {
-   VM_BUG_ON_VMA(!rwsem_is_locked(&tlb->mm->mmap_sem), vma);
split_huge_pud(vma, pud, addr);
} else if (zap_huge_pud(tlb, vma, pud, addr))
goto next;
@@ -1519,7 +1518,6 @@ int vm_insert_page(struct vm_area_struct *vma, unsigned 
long addr,
if (!page_count(page))
return -EINVAL;
if (!(vma->vm_flags & VM_MIXEDMAP)) {
-   BUG_ON(down_read_trylock(&vma->vm_mm->mmap_sem));
BUG_ON(vma->vm_flags & VM_PFNMAP);
vma->vm_flags |= VM_MIXEDMAP;
}
diff --git a/mm/mmap.c b/mm/mmap.c
index af228ae3508d..a03ded49f9eb 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3466,7 +3466,7 @@ static void vm_lock_anon_vma(struct mm_struct *mm, struct 
anon_vma *anon_vma)
 * The LSB of head.next can't change from under us
 * because we hold the mm_all_locks_mutex.
 */
-   down_write_nest_lock(&anon_vma->root->rwsem, &mm->mmap_sem);
+   down_write(>mmap_sem);
/*
 * We can safely modify head.next after taking the
 * anon_vma->root->rwsem. If some other vma in this mm shares
@@ -3496,7 +3496,7 @@ static void vm_lock_mapping(struct mm_struct *mm, struct 
address_space *mapping)
 */
if (test_and_set_bit(AS_MM_ALL_LOCKS, &mapping->flags))
BUG();
-   down_write_nest_lock(&mapping->i_mmap_rwsem, &mm->mmap_sem);
+   down_write(&mapping->i_mmap_rwsem);
}
 }
 
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index c3084ff2569d..6246acf17054 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -303,8 +303,6 @@ int walk_page_range(unsigned long start, unsigned long end,
if (!walk->mm)
return -EINVAL;
 
-   VM_BUG_ON_MM(!rwsem_is_locked(&walk->mm->mmap_sem), walk->mm);
-
vma = find_vma(walk->mm, start);
do {
if (!vma) { /* after the last vma */
@@ -346,7 +344,6 @@ int walk_page_vma(struct vm_area_struct *vma, struct 
mm_walk *walk)
if (!walk->mm)
return -EINVAL;
 
-   VM_BUG_ON(!rwsem_is_locked(&walk->mm->mmap_sem));
VM_BUG_ON(!vma);
walk->vma = vma;
err = walk_page_test(vma->vm_start, vma->vm_end, walk);
-- 
2.16.4



[PATCH 03/14] mm: introduce mm locking wrappers

2019-05-20 Thread Davidlohr Bueso
This patch adds the necessary wrappers to encapsulate mmap_sem
locking and will enable any future changes to be a lot more
confined to here. In addition, future users will incrementally
be added in the next patches. mm_[read/write]_[un]lock() naming
is used.

Signed-off-by: Davidlohr Bueso 
---
 include/linux/mm.h | 76 ++
 1 file changed, 76 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0e8834ac32b7..780b6097ee47 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -2880,5 +2881,80 @@ void __init setup_nr_node_ids(void);
 static inline void setup_nr_node_ids(void) {}
 #endif
 
+/*
+ * Address space locking wrappers.
+ */
+static inline bool mm_is_locked(struct mm_struct *mm,
+   struct range_lock *mmrange)
+{
+   return rwsem_is_locked(&mm->mmap_sem);
+}
+
+/* Reader wrappers */
+static inline int mm_read_trylock(struct mm_struct *mm,
+ struct range_lock *mmrange)
+{
+   return down_read_trylock(&mm->mmap_sem);
+}
+
+static inline void mm_read_lock(struct mm_struct *mm,
+   struct range_lock *mmrange)
+{
+   down_read(&mm->mmap_sem);
+}
+
+static inline void mm_read_lock_nested(struct mm_struct *mm,
+  struct range_lock *mmrange, int subclass)
+{
+   down_read_nested(&mm->mmap_sem, subclass);
+}
+
+static inline void mm_read_unlock(struct mm_struct *mm,
+ struct range_lock *mmrange)
+{
+   up_read(&mm->mmap_sem);
+}
+
+/* Writer wrappers */
+static inline int mm_write_trylock(struct mm_struct *mm,
+  struct range_lock *mmrange)
+{
+   return down_write_trylock(&mm->mmap_sem);
+}
+
+static inline void mm_write_lock(struct mm_struct *mm,
+struct range_lock *mmrange)
+{
+   down_write(&mm->mmap_sem);
+}
+
+static inline int mm_write_lock_killable(struct mm_struct *mm,
+struct range_lock *mmrange)
+{
+   return down_write_killable(&mm->mmap_sem);
+}
+
+static inline void mm_downgrade_write(struct mm_struct *mm,
+ struct range_lock *mmrange)
+{
+   downgrade_write(&mm->mmap_sem);
+}
+
+static inline void mm_write_unlock(struct mm_struct *mm,
+  struct range_lock *mmrange)
+{
+   up_write(&mm->mmap_sem);
+}
+
+static inline void mm_write_lock_nested(struct mm_struct *mm,
+   struct range_lock *mmrange,
+   int subclass)
+{
+   down_write_nested(&mm->mmap_sem, subclass);
+}
+
+#define mm_write_nest_lock(mm, range, nest_lock)   \
+   down_write_nest_lock(&(mm)->mmap_sem, nest_lock)
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
-- 
2.16.4
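
(A minimal usage sketch, not part of the patch: this is roughly how a converted
call site is expected to look, assuming the range lock machinery from patches
1-2. The function name below is made up, and with this patch the wrappers still
map 1:1 onto the full mmap_sem, so the range is purely informational for now.)

	/* Sketch only: iterate the vmas of an mm under the new wrappers. */
	static int example_walk_vmas(struct mm_struct *mm)
	{
		struct vm_area_struct *vma;
		DEFINE_RANGE_LOCK_FULL(mmrange);	/* range covering the whole address space */

		mm_read_lock(mm, &mmrange);
		for (vma = mm->mmap; vma; vma = vma->vm_next) {
			/* inspect the vma; today this is identical to down_read(&mm->mmap_sem) */
		}
		mm_read_unlock(mm, &mmrange);
		return 0;
	}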



[PATCH 04/14] mm: teach pagefault paths about range locking

2019-05-20 Thread Davidlohr Bueso
When handling a page fault, it happens that the mmap_sem is released
during the processing. As moving to range lock requires remembering
the range parameter to do the lock/unlock, this patch adds a pointer
to struct vm_fault. As such, we work outwards from arming the vmf from:

  handle_mm_fault(), __collapse_huge_page_swapin() and hugetlb_no_page()

The idea is to use a local, stack allocated variable (no concurrency)
whenever the mmap_sem is originally taken and we end up in pf paths that
end up retaking the lock. Ie:

  DEFINE_RANGE_LOCK_FULL(mmrange);

  down_write(>mmap_sem);
  some_fn(a, b, c, );
  
   
...
 handle_mm_fault(vma, addr, flags, mmrange);
...
  up_write(>mmap_sem);

Consequently, we also end up updating lock_page_or_retry(), which can
drop the mmap_sem.

For the gup family, we pass nil for scenarios when the semaphore will
remain untouched.

Semantically nothing changes at all, and the 'mmrange' ends up
being unused for now. Later patches will use the variable when
the mmap_sem wrappers replace straightforward down/up.

*** For simplicity, this patch breaks when used in ksm and hmm. ***

Signed-off-by: Davidlohr Bueso 
---
 arch/x86/mm/fault.c | 27 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |  2 +-
 drivers/gpu/drm/i915/i915_gem_userptr.c |  2 +-
 drivers/infiniband/core/umem_odp.c  |  2 +-
 drivers/iommu/amd_iommu_v2.c|  3 +-
 drivers/iommu/intel-svm.c   |  3 +-
 drivers/vfio/vfio_iommu_type1.c |  2 +-
 fs/exec.c   |  2 +-
 include/linux/hugetlb.h |  9 +++--
 include/linux/mm.h  | 24 
 include/linux/pagemap.h |  6 +--
 kernel/events/uprobes.c |  7 ++--
 kernel/futex.c  |  2 +-
 mm/filemap.c|  2 +-
 mm/frame_vector.c   |  6 ++-
 mm/gup.c| 65 -
 mm/hmm.c|  4 +-
 mm/hugetlb.c| 14 ---
 mm/internal.h   |  3 +-
 mm/khugepaged.c | 24 +++-
 mm/ksm.c|  3 +-
 mm/memory.c | 14 ---
 mm/mempolicy.c  |  9 +++--
 mm/mmap.c   |  4 +-
 mm/mprotect.c   |  2 +-
 mm/process_vm_access.c  |  4 +-
 security/tomoyo/domain.c|  2 +-
 virt/kvm/async_pf.c |  3 +-
 virt/kvm/kvm_main.c |  9 +++--
 29 files changed, 159 insertions(+), 100 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 46df4c6aae46..fb869c292b91 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -938,7 +938,8 @@ bad_area_nosemaphore(struct pt_regs *regs, unsigned long 
error_code,
 
 static void
 __bad_area(struct pt_regs *regs, unsigned long error_code,
-  unsigned long address, u32 pkey, int si_code)
+  unsigned long address, u32 pkey, int si_code,
+  struct range_lock *mmrange)
 {
struct mm_struct *mm = current->mm;
/*
@@ -951,9 +952,10 @@ __bad_area(struct pt_regs *regs, unsigned long error_code,
 }
 
 static noinline void
-bad_area(struct pt_regs *regs, unsigned long error_code, unsigned long address)
+bad_area(struct pt_regs *regs, unsigned long error_code, unsigned long address,
+struct range_lock *mmrange)
 {
-   __bad_area(regs, error_code, address, 0, SEGV_MAPERR);
+   __bad_area(regs, error_code, address, 0, SEGV_MAPERR, mmrange);
 }
 
 static inline bool bad_area_access_from_pkeys(unsigned long error_code,
@@ -975,7 +977,8 @@ static inline bool bad_area_access_from_pkeys(unsigned long 
error_code,
 
 static noinline void
 bad_area_access_error(struct pt_regs *regs, unsigned long error_code,
- unsigned long address, struct vm_area_struct *vma)
+ unsigned long address, struct vm_area_struct *vma,
+ struct range_lock *mmrange)
 {
/*
 * This OSPKE check is not strictly necessary at runtime.
@@ -1005,9 +1008,9 @@ bad_area_access_error(struct pt_regs *regs, unsigned long 
error_code,
 */
u32 pkey = vma_pkey(vma);
 
-   __bad_area(regs, error_code, address, pkey, SEGV_PKUERR);
+   __bad_area(regs, error_code, address, pkey, SEGV_PKUERR, 
mmrange);
} else {
-   __bad_area(regs, error_code, address, 0, SEGV_ACCERR);
+   __bad_area(regs, error_code, address, 0, SEGV_ACCERR, mmrange);
}
 }
 
@@ -1306,6 +1309,7 @@ void do_user_addr_fault(struct pt_regs *regs,
struct mm_struct *mm;
vm_fault_t fault, major = 0;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
+   

[PATCH 08/12] powerpc/pseries/svm: Export guest SVM status to user space via sysfs

2019-05-20 Thread Thiago Jung Bauermann
From: Ryan Grimm 

User space might want to know it's running in a secure VM.  It can't do
a mfmsr because mfmsr is a privileged instruction.

The solution here is to create a cpu attribute:

/sys/devices/system/cpu/svm

which will read 0 or 1 based on the S bit of the guest's CPU 0.

Signed-off-by: Ryan Grimm 
Reviewed-by: Ram Pai 
Signed-off-by: Thiago Jung Bauermann 
---
 arch/powerpc/kernel/sysfs.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index e8e93c2c7d03..8fdab134e9ae 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "cacheinfo.h"
 #include "setup.h"
@@ -714,6 +715,32 @@ static struct device_attribute pa6t_attrs[] = {
 #endif /* HAS_PPC_PMC_PA6T */
 #endif /* HAS_PPC_PMC_CLASSIC */
 
+#ifdef CONFIG_PPC_SVM
+static void get_svm(void *val)
+{
+   u32 *value = val;
+
+   *value = is_secure_guest();
+}
+
+static ssize_t show_svm(struct device *dev, struct device_attribute *attr, 
char *buf)
+{
+   u32 val;
+   smp_call_function_single(0, get_svm, &val, 1);
+   return sprintf(buf, "%u\n", val);
+}
+static DEVICE_ATTR(svm, 0444, show_svm, NULL);
+
+static void create_svm_file(void)
+{
+   device_create_file(cpu_subsys.dev_root, &dev_attr_svm);
+}
+#else
+static void create_svm_file(void)
+{
+}
+#endif /* CONFIG_PPC_SVM */
+
 static int register_cpu_online(unsigned int cpu)
 {
struct cpu *c = &per_cpu(cpu_devices, cpu);
@@ -1057,6 +1084,8 @@ static int __init topology_init(void)
sysfs_create_dscr_default();
 #endif /* CONFIG_PPC64 */
 
+   create_svm_file();
+
return 0;
 }
 subsys_initcall(topology_init);
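
(Illustration only, not part of the patch: from user space the new attribute can
be consumed with something as small as the sketch below. The helper name is made
up; only the sysfs path comes from the patch above.)

	#include <stdio.h>

	/* Hypothetical helper: returns 1 when running as a secure VM, 0 otherwise. */
	static int running_in_secure_vm(void)
	{
		FILE *f = fopen("/sys/devices/system/cpu/svm", "r");
		int val = 0;

		if (!f)
			return 0;	/* attribute absent: not a secure guest */
		if (fscanf(f, "%d", &val) != 1)
			val = 0;
		fclose(f);
		return val;
	}

	int main(void)
	{
		printf("secure VM: %s\n", running_in_secure_vm() ? "yes" : "no");
		return 0;
	}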



[RFC PATCH 00/14] mmap_sem range locking

2019-05-20 Thread Davidlohr Bueso
Hi,

The following is a summarized repost of the range locking mmap_sem idea[1]
and is _not_ intended for being considered upstream as there are quite a few
issues that arise with this approach of tackling mmap_sem contention (keep 
reading).

In fact this patch is quite incomplete and will break compiling on anything
non-x86, and is also _completely broken_ for ksm and hmm.  That being said, this
does build an enterprise kernel and survives a number of workloads as well as
'runltp -f syscalls'. The previous series is a complete range locking 
conversion,
which ensured we had all the range locking apis we needed. The changelog also
included a number of performance numbers and overall design.

While finding issues with the code itself is always welcome, the idea of this 
series
is to discuss what can be done on top of it, if anything.

From a locking pov, most recently there has been a revival in the interest of the
range lock code for dchinner's plans of range locking the i_rwsem. However, it
showed that xfs's extent tree significantly outperformed[2] the (full) range 
lock.
The performance differences when doing 1:1 rwsem comparisons, have already been 
shown
in [1].

Considering both the range lock and the extent tree lock the whole tree, most 
of this
performance penalties are due to the fact that rbtrees' depth is a lot larger 
than
btree's, so the latter avoids most of the pointer chasing which is a common 
performance
issue. This was a trade-off for not having to allocate memory for the range 
nodes.

However, on the _positive side_, and which is what we care most about for 
mmap_sem,
when actually using the lock as intended, the range locking did show its 
purpose:

IOPS read/write (buffered IO)
fio processes   rwsem   rangelock
 1  57k / 57k   64k / 64k
 2  61k / 61k   111k / 111k
 4  61k / 61k   228k / 228k
 8  55k / 55k   195k / 195k
 16 15k / 15k40k /  40k

So it would be nice to apply this concept to our address space and allow mmaps, 
munmaps
and pagefaults to all work concurrently in non-overlapping scenarios -- which 
is what
is provided by userspace mm related syscalls. However, when using the range 
lock without
a full range, a number of issues around the vma immediately popup as a 
consequence of
this *top-down* approach to solving scalability:

Races within a vma: non-overlapping regions can still belong to the same vma, 
hence
wrecking merges and splits. One popular idea is to have a vma->rwsem (taken, 
for example,
after a find_vma()), however, this throws out the window any potential 
scalability gains
for large vmas as we just end up just moving down the point of contention. The 
same
problem occurs when refcouting the vma (such as with speculative pfs). There's 
also
the fact that we can end up taking numerous vma locks as the vma list is later 
traversed
once the first vma is found.

Alternatively, we could just expand the passed range such that it covers the 
whole first
and last vma(s) endpoints; of course we don't have that information a priori
(protected by
mmap_sem :), and enlarging the range _after_ acquiring the lock opens a can of 
worms
because now we have to inform userspace and/or deadlock, among others.

Similarly, there's the issue of keeping the vma tree correct during 
modifications as well
as regular find_vma()s. Laurent has already pointed out that we have too many 
ways of
getting a vma: the tree, the list and the vmacache, all currently protected by 
mmap_sem
and breaks because of the above when not using full ranges. This also touches a 
bit in
a more *bottom-up* approach to mmap_sem performance, which scales from within, 
instead
of putting a big rangelock tree on top of the address space.

Matthew has pointed out a the xarray as well as an rcu based maple tree[3] 
replacement
of the rbtree, however we already have the vmacache so most of the benefits of 
a shallower
data structure are unnecessary, in cache-hot situations, naturally. The 
vma-list is easily
removable once we have O(1) next/prev pointers, which for rbtrees can be done 
via threading
the data structure (at the cost of extra branch for every level down the tree 
when
inserting). Maple trees already give us this. So all in all, if we were going 
to go down
this path of a cache friendlier tree, we'd end up needing comparisons of the 
maple tree vs
the current vmacache+rbtree combo. Regarding rcu-ifying the vma tree and
replacing read locking (and therefore playing nicer with cachelines): while it
sounds nice, it does not seem practical considering that the page tables
cannot be rcu-ified.

I'm sure I'm missing a lot more, but I'm hoping to kickstart the conversation 
again.

Patches 1-2: adds the range locking machinery. This is rebased on the rbtree 
optimizations
for interval trees such that 

[PATCH 01/14] interval-tree: build unconditionally

2019-05-20 Thread Davidlohr Bueso
In preparation for range locking, this patch gets rid of
CONFIG_INTERVAL_TREE option as we will unconditionally
build it.

Signed-off-by: Davidlohr Bueso 
---
 drivers/gpu/drm/Kconfig  |  2 --
 drivers/gpu/drm/i915/Kconfig |  1 -
 drivers/iommu/Kconfig|  1 -
 lib/Kconfig  | 14 --
 lib/Kconfig.debug|  1 -
 lib/Makefile |  3 +--
 6 files changed, 1 insertion(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index e360a4a131e1..3405336175ed 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -200,7 +200,6 @@ config DRM_RADEON
select POWER_SUPPLY
select HWMON
select BACKLIGHT_CLASS_DEVICE
-   select INTERVAL_TREE
help
  Choose this option if you have an ATI Radeon graphics card.  There
  are both PCI and AGP versions.  You don't need to choose this to
@@ -220,7 +219,6 @@ config DRM_AMDGPU
select POWER_SUPPLY
select HWMON
select BACKLIGHT_CLASS_DEVICE
-   select INTERVAL_TREE
select CHASH
help
  Choose this option if you have a recent AMD Radeon graphics card.
diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
index 3d5f1cb6a76c..54d4bc8d141f 100644
--- a/drivers/gpu/drm/i915/Kconfig
+++ b/drivers/gpu/drm/i915/Kconfig
@@ -3,7 +3,6 @@ config DRM_I915
depends on DRM
depends on X86 && PCI
select INTEL_GTT
-   select INTERVAL_TREE
# we need shmfs for the swappable backing store, and in particular
# the shmem_readpage() which depends upon tmpfs
select SHMEM
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index a2ed2b51a0f7..d21e6dc2adae 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -477,7 +477,6 @@ config VIRTIO_IOMMU
depends on VIRTIO=y
depends on ARM64
select IOMMU_API
-   select INTERVAL_TREE
help
  Para-virtualised IOMMU driver with virtio.
 
diff --git a/lib/Kconfig b/lib/Kconfig
index 8d9239a4156c..e089ac40c062 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -409,20 +409,6 @@ config TEXTSEARCH_FSM
 config BTREE
bool
 
-config INTERVAL_TREE
-   bool
-   help
- Simple, embeddable, interval-tree. Can find the start of an
- overlapping range in log(n) time and then iterate over all
- overlapping nodes. The algorithm is implemented as an
- augmented rbtree.
-
- See:
-
-   Documentation/rbtree.txt
-
- for more information.
-
 config XARRAY_MULTI
bool
help
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 4c35e52c5a2e..54bafed8ba70 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1759,7 +1759,6 @@ config RBTREE_TEST
 config INTERVAL_TREE_TEST
tristate "Interval tree test"
depends on DEBUG_KERNEL
-   select INTERVAL_TREE
help
  A benchmark measuring the performance of the interval tree library
 
diff --git a/lib/Makefile b/lib/Makefile
index fb7697031a79..39fd34156692 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -50,7 +50,7 @@ obj-y += bcd.o sort.o parser.o debug_locks.o random32.o \
 bsearch.o find_bit.o llist.o memweight.o kfifo.o \
 percpu-refcount.o rhashtable.o \
 once.o refcount.o usercopy.o errseq.o bucket_locks.o \
-generic-radix-tree.o
+generic-radix-tree.o interval_tree.o
 obj-$(CONFIG_STRING_SELFTEST) += test_string.o
 obj-y += string_helpers.o
 obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
@@ -115,7 +115,6 @@ obj-y += logic_pio.o
 obj-$(CONFIG_GENERIC_HWEIGHT) += hweight.o
 
 obj-$(CONFIG_BTREE) += btree.o
-obj-$(CONFIG_INTERVAL_TREE) += interval_tree.o
 obj-$(CONFIG_ASSOCIATIVE_ARRAY) += assoc_array.o
 obj-$(CONFIG_DEBUG_PREEMPT) += smp_processor_id.o
 obj-$(CONFIG_DEBUG_LIST) += list_debug.o
-- 
2.16.4



[PATCH 05/12] powerpc/pseries: Add and use LPPACA_SIZE constant

2019-05-20 Thread Thiago Jung Bauermann
Helps document what the hard-coded number means.

Also take the opportunity to fix an #endif comment.

Suggested-by: Alexey Kardashevskiy 
Signed-off-by: Thiago Jung Bauermann 
---
 arch/powerpc/kernel/paca.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 9cc91d03ab62..854105db5cff 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -56,6 +56,8 @@ static void *__init alloc_paca_data(unsigned long size, 
unsigned long align,
 
 #ifdef CONFIG_PPC_PSERIES
 
+#define LPPACA_SIZE 0x400
+
 /*
  * See asm/lppaca.h for more detail.
  *
@@ -69,7 +71,7 @@ static inline void init_lppaca(struct lppaca *lppaca)
 
*lppaca = (struct lppaca) {
.desc = cpu_to_be32(0xd397d781),/* "LpPa" */
-   .size = cpu_to_be16(0x400),
+   .size = cpu_to_be16(LPPACA_SIZE),
.fpregs_in_use = 1,
.slb_count = cpu_to_be16(64),
.vmxregs_in_use = 0,
@@ -79,19 +81,18 @@ static inline void init_lppaca(struct lppaca *lppaca)
 static struct lppaca * __init new_lppaca(int cpu, unsigned long limit)
 {
struct lppaca *lp;
-   size_t size = 0x400;
 
-   BUILD_BUG_ON(size < sizeof(struct lppaca));
+   BUILD_BUG_ON(sizeof(struct lppaca) > LPPACA_SIZE);
 
if (early_cpu_has_feature(CPU_FTR_HVMODE))
return NULL;
 
-   lp = alloc_paca_data(size, 0x400, limit, cpu);
+   lp = alloc_paca_data(LPPACA_SIZE, 0x400, limit, cpu);
init_lppaca(lp);
 
return lp;
 }
-#endif /* CONFIG_PPC_BOOK3S */
+#endif /* CONFIG_PPC_PSERIES */
 
 #ifdef CONFIG_PPC_BOOK3S_64
 



[PATCH 04/12] powerpc/pseries/svm: Add helpers for UV_SHARE_PAGE and UV_UNSHARE_PAGE

2019-05-20 Thread Thiago Jung Bauermann
From: Ram Pai 

These functions are used when the guest wants to grant the hypervisor
access to certain pages.

Signed-off-by: Ram Pai 
Signed-off-by: Thiago Jung Bauermann 
---
 arch/powerpc/include/asm/ultravisor-api.h |  2 ++
 arch/powerpc/include/asm/ultravisor.h | 14 ++
 2 files changed, 16 insertions(+)

diff --git a/arch/powerpc/include/asm/ultravisor-api.h 
b/arch/powerpc/include/asm/ultravisor-api.h
index 0e8b72081718..ed68b02869fd 100644
--- a/arch/powerpc/include/asm/ultravisor-api.h
+++ b/arch/powerpc/include/asm/ultravisor-api.h
@@ -20,6 +20,8 @@
 /* opcodes */
 #define UV_WRITE_PATE  0xF104
 #define UV_ESM 0xF110
+#define UV_SHARE_PAGE  0xF130
+#define UV_UNSHARE_PAGE0xF134
 #define UV_RETURN  0xF11C
 
 #endif /* _ASM_POWERPC_ULTRAVISOR_API_H */
diff --git a/arch/powerpc/include/asm/ultravisor.h 
b/arch/powerpc/include/asm/ultravisor.h
index 09e0a615d96f..537f7717d21a 100644
--- a/arch/powerpc/include/asm/ultravisor.h
+++ b/arch/powerpc/include/asm/ultravisor.h
@@ -44,6 +44,20 @@ static inline int uv_register_pate(u64 lpid, u64 dw0, u64 
dw1)
return ucall(UV_WRITE_PATE, retbuf, lpid, dw0, dw1);
 }
 
+static inline int uv_share_page(u64 pfn, u64 npages)
+{
+   unsigned long retbuf[UCALL_BUFSIZE];
+
+   return ucall(UV_SHARE_PAGE, retbuf, pfn, npages);
+}
+
+static inline int uv_unshare_page(u64 pfn, u64 npages)
+{
+   unsigned long retbuf[UCALL_BUFSIZE];
+
+   return ucall(UV_UNSHARE_PAGE, retbuf, pfn, npages);
+}
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_ULTRAVISOR_H */



[PATCH 06/12] powerpc/pseries/svm: Use shared memory for LPPACA structures

2019-05-20 Thread Thiago Jung Bauermann
From: Anshuman Khandual 

LPPACA structures need to be shared with the host. Hence they need to be in
shared memory. Instead of allocating individual chunks of memory for a
given structure from memblock, a contiguous chunk of memory is allocated
and then converted into shared memory. Subsequent allocation requests will
come from the contiguous chunk which will be always shared memory for all
structures.

While we are able to use a kmem_cache constructor for the Debug Trace Log,
LPPACAs are allocated very early in the boot process (before SLUB is
available) so we need to use a simpler scheme here.

Introduce helper is_svm_platform() which uses the S bit of the MSR to tell
whether we're running as a secure guest.

Signed-off-by: Anshuman Khandual 
Signed-off-by: Thiago Jung Bauermann 
---
 arch/powerpc/include/asm/svm.h | 26 
 arch/powerpc/kernel/paca.c | 43 +-
 2 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/svm.h b/arch/powerpc/include/asm/svm.h
new file mode 100644
index ..fef3740f46a6
--- /dev/null
+++ b/arch/powerpc/include/asm/svm.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+/*
+ * SVM helper functions
+ *
+ * Copyright 2019 Anshuman Khandual, IBM Corporation.
+ */
+
+#ifndef _ASM_POWERPC_SVM_H
+#define _ASM_POWERPC_SVM_H
+
+#ifdef CONFIG_PPC_SVM
+
+static inline bool is_secure_guest(void)
+{
+   return mfmsr() & MSR_S;
+}
+
+#else /* CONFIG_PPC_SVM */
+
+static inline bool is_secure_guest(void)
+{
+   return false;
+}
+
+#endif /* CONFIG_PPC_SVM */
+#endif /* _ASM_POWERPC_SVM_H */
diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 854105db5cff..a9622f4b45bb 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -18,6 +18,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "setup.h"
 
@@ -58,6 +60,41 @@ static void *__init alloc_paca_data(unsigned long size, 
unsigned long align,
 
 #define LPPACA_SIZE 0x400
 
+static void *__init alloc_shared_lppaca(unsigned long size, unsigned long 
align,
+   unsigned long limit, int cpu)
+{
+   size_t shared_lppaca_total_size = PAGE_ALIGN(nr_cpu_ids * LPPACA_SIZE);
+   static unsigned long shared_lppaca_size;
+   static void *shared_lppaca;
+   void *ptr;
+
+   if (!shared_lppaca) {
+   memblock_set_bottom_up(true);
+
+   shared_lppaca =
+   memblock_alloc_try_nid(shared_lppaca_total_size,
+  PAGE_SIZE, MEMBLOCK_LOW_LIMIT,
+  limit, NUMA_NO_NODE);
+   if (!shared_lppaca)
+   panic("cannot allocate shared data");
+
+   memblock_set_bottom_up(false);
+   uv_share_page(PHYS_PFN(__pa(shared_lppaca)),
+ shared_lppaca_total_size >> PAGE_SHIFT);
+   }
+
+   ptr = shared_lppaca + shared_lppaca_size;
+   shared_lppaca_size += size;
+
+   /*
+* This is very early in boot, so no harm done if the kernel crashes at
+* this point.
+*/
+   BUG_ON(shared_lppaca_size >= shared_lppaca_total_size);
+
+   return ptr;
+}
+
 /*
  * See asm/lppaca.h for more detail.
  *
@@ -87,7 +124,11 @@ static struct lppaca * __init new_lppaca(int cpu, unsigned 
long limit)
if (early_cpu_has_feature(CPU_FTR_HVMODE))
return NULL;
 
-   lp = alloc_paca_data(LPPACA_SIZE, 0x400, limit, cpu);
+   if (is_secure_guest())
+   lp = alloc_shared_lppaca(LPPACA_SIZE, 0x400, limit, cpu);
+   else
+   lp = alloc_paca_data(LPPACA_SIZE, 0x400, limit, cpu);
+
init_lppaca(lp);
 
return lp;



[PATCH 00/12] Secure Virtual Machine Enablement

2019-05-20 Thread Thiago Jung Bauermann
This series enables Secure Virtual Machines (SVMs) on powerpc. SVMs use the
Protected Execution Facility (PEF) and request to be migrated to secure
memory during prom_init() so by default all of their memory is inaccessible
to the hypervisor. There is an Ultravisor call that the VM can use to
request certain pages to be made accessible to (or shared with) the
hypervisor.

The objective of these patches is to have the guest perform this request
for buffers that need to be accessed by the hypervisor such as the LPPACAs,
the SWIOTLB memory and the Debug Trace Log.

The patch set applies on top of Claudio Carvalho's "kvmppc: Paravirtualize KVM
to support ultravisor" series:

https://lore.kernel.org/linuxppc-dev/20190518142524.28528-1-cclau...@linux.ibm.com/

I only need the following two patches from his series:

[RFC PATCH v2 02/10] KVM: PPC: Ultravisor: Introduce the MSR_S bit
[RFC PATCH v2 04/10] KVM: PPC: Ultravisor: Add generic ultravisor call handler

Patches 2 and 3 are posted as RFC because we are still finalizing the
details on how the ESM blob will be passed to the kernel. All other patches
are (hopefully) in upstreamable shape.

Unfortunately this series still doesn't enable the use of virtio devices in
the secure guest. This support depends on a discussion that is currently
ongoing with the virtio community:

https://lore.kernel.org/linuxppc-dev/87womn8inf.fsf@morokweng.localdomain/

This was the last time I posted this patch set:

https://lore.kernel.org/linuxppc-dev/20180824162535.22798-1-bauer...@linux.ibm.com/

At that time, it wasn't possible to launch a real secure guest because the
Ultravisor was still in very early development. Now there is a relatively
mature Ultravisor and I was able to test it using Claudio's patches in the
host kernel, booting normally using an initramfs for the root filesystem.

This is the command used to start up the guest with QEMU 4.0:

qemu-system-ppc64   \
-nodefaults \
-cpu host   \
-machine 
pseries,accel=kvm,kvm-type=HV,cap-htm=off,cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken
 \
-display none   \
-serial mon:stdio   \
-smp 1  \
-m 4G   \
-kernel /home/bauermann/vmlinux \
-initrd /home/bauermann/fs_small.cpio   \
-append 'debug'

Changelog since the RFC from August:

- Patch "powerpc/pseries: Introduce option to build secure virtual machines"
  - New patch.

- Patch "powerpc: Add support for adding an ESM blob to the zImage wrapper"
  - Patch from Benjamin Herrenschmidt, first posted here:

https://lore.kernel.org/linuxppc-dev/20180531043417.25073-1-b...@kernel.crashing.org/
  - Made minor adjustments to some comments. Code is unchanged.

- Patch "powerpc/prom_init: Add the ESM call to prom_init"
  - New patch from Ram Pai and Michael Anderson.

- Patch "powerpc/pseries/svm: Add helpers for UV_SHARE_PAGE and UV_UNSHARE_PAGE"
  - New patch from Ram Pai.

- Patch "powerpc/pseries: Add and use LPPACA_SIZE constant"
  - Moved LPPACA_SIZE macro inside the CONFIG_PPC_PSERIES #ifdef.
  - Put sizeof() operand left of comparison operator in BUILD_BUG_ON()
macro to appease a checkpatch warning.

- Patch "powerpc/pseries/svm: Use shared memory for LPPACA structures"
  - Moved definition of is_secure_guest() helper to this patch.
  - Changed shared_lppaca and shared_lppaca_size from globals to static
variables inside alloc_shared_lppaca().
  - Changed shared_lppaca to hold virtual address instead of physical
address.

- Patch "powerpc/pseries/svm: Use shared memory for Debug Trace Log (DTL)"
  - Add get_dtl_cache_ctor() macro. Suggested by Ram Pai.

- Patch "powerpc/pseries/svm: Export guest SVM status to user space via sysfs"
  - New patch from Ryan Grimm.

- Patch "powerpc/pseries/svm: Disable doorbells in SVM guests"
  - New patch from Sukadev Bhattiprolu.

- Patch "powerpc/pseries/iommu: Don't use dma_iommu_ops on secure guests"
  - New patch.

- Patch "powerpc/pseries/svm: Force SWIOTLB for secure guests"
  - New patch with code that was previously in other patches.

- Patch "powerpc/configs: Enable secure guest support in pseries and ppc64 
defconfigs"
  - New patch from Ryan Grimm.

- Patch "powerpc/pseries/svm: Detect Secure Virtual Machine (SVM) platform"
  - Dropped this patch by moving its code to other patches.

- Patch "powerpc/svm: Select CONFIG_DMA_DIRECT_OPS and CONFIG_SWIOTLB"
  - No need to select CONFIG_DMA_DIRECT_OPS anymore. The CONFIG_SWIOTLB
change was moved to another patch and this patch was dropped.

- Patch "powerpc/pseries/svm: Add memory conversion (shared/secure) helper 
functions"
  - Dropped patch since the helper functions were unnecessary wrappers
around uv_share_page() and uv_unshare_page().

- Patch "powerpc/svm: Convert SWIOTLB buffers 

[RFC PATCH 03/12] powerpc/prom_init: Add the ESM call to prom_init

2019-05-20 Thread Thiago Jung Bauermann
From: Ram Pai 

Make the Enter-Secure-Mode (ESM) ultravisor call to switch the VM to secure
mode. Add "svm=" command line option to turn off switching to secure mode.
Introduce CONFIG_PPC_SVM to control support for secure guests.

Signed-off-by: Ram Pai 
[ Generate an RTAS os-term hcall when the ESM ucall fails. ]
Signed-off-by: Michael Anderson 
[ Cleaned up the code a bit. ]
Signed-off-by: Thiago Jung Bauermann 
---
 .../admin-guide/kernel-parameters.txt |   5 +
 arch/powerpc/include/asm/ultravisor-api.h |   1 +
 arch/powerpc/kernel/prom_init.c   | 124 ++
 3 files changed, 130 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index c45a19d654f3..7237d86b25c6 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4501,6 +4501,11 @@
/sys/power/pm_test). Only available when CONFIG_PM_DEBUG
is set. Default value is 5.
 
+   svm=[PPC]
+   Format: { on | off | y | n | 1 | 0 }
+   This parameter controls use of the Protected
+   Execution Facility on pSeries.
+
swapaccount=[0|1]
[KNL] Enable accounting of swap in memory resource
controller if no parameter or 1 is given or disable
diff --git a/arch/powerpc/include/asm/ultravisor-api.h 
b/arch/powerpc/include/asm/ultravisor-api.h
index 15e6ce77a131..0e8b72081718 100644
--- a/arch/powerpc/include/asm/ultravisor-api.h
+++ b/arch/powerpc/include/asm/ultravisor-api.h
@@ -19,6 +19,7 @@
 
 /* opcodes */
 #define UV_WRITE_PATE  0xF104
+#define UV_ESM 0xF110
 #define UV_RETURN  0xF11C
 
 #endif /* _ASM_POWERPC_ULTRAVISOR_API_H */
diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index 523bb99d7676..5d8a3efb54f2 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -44,6 +44,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -174,6 +175,10 @@ static unsigned long __prombss prom_tce_alloc_end;
 static bool __prombss prom_radix_disable;
 #endif
 
+#ifdef CONFIG_PPC_SVM
+static bool __prombss prom_svm_disable;
+#endif
+
 struct platform_support {
bool hash_mmu;
bool radix_mmu;
@@ -809,6 +814,17 @@ static void __init early_cmdline_parse(void)
if (prom_radix_disable)
prom_debug("Radix disabled from cmdline\n");
 #endif /* CONFIG_PPC_PSERIES */
+
+#ifdef CONFIG_PPC_SVM
+   opt = prom_strstr(prom_cmd_line, "svm=");
+   if (opt) {
+   bool val;
+
+   opt += sizeof("svm=") - 1;
+   if (!prom_strtobool(opt, &val))
+   prom_svm_disable = !val;
+   }
+#endif /* CONFIG_PPC_SVM */
 }
 
 #ifdef CONFIG_PPC_PSERIES
@@ -1707,6 +1723,43 @@ static void __init prom_close_stdin(void)
}
 }
 
+#ifdef CONFIG_PPC_SVM
+static int prom_rtas_os_term_hcall(uint64_t args)
+{
+   register uint64_t arg1 asm("r3") = 0xf000;
+   register uint64_t arg2 asm("r4") = args;
+
+   asm volatile("sc 1\n" : "=r" (arg1) :
+   "r" (arg1),
+   "r" (arg2) :);
+   return arg1;
+}
+
+static struct rtas_args __prombss os_term_args;
+
+static void __init prom_rtas_os_term(char *str)
+{
+   phandle rtas_node;
+   __be32 val;
+   u32 token;
+
+   prom_printf("%s: start...\n", __func__);
+   rtas_node = call_prom("finddevice", 1, 1, ADDR("/rtas"));
+   prom_printf("rtas_node: %x\n", rtas_node);
+   if (!PHANDLE_VALID(rtas_node))
+   return;
+
+   val = 0;
+   prom_getprop(rtas_node, "ibm,os-term", &val, sizeof(val));
+   token = be32_to_cpu(val);
+   prom_printf("ibm,os-term: %x\n", token);
+   if (token == 0)
+   prom_panic("Could not get token for ibm,os-term\n");
+   os_term_args.token = cpu_to_be32(token);
+   prom_rtas_os_term_hcall((uint64_t)&os_term_args);
+}
+#endif /* CONFIG_PPC_SVM */
+
 /*
  * Allocate room for and instantiate RTAS
  */
@@ -3162,6 +3215,74 @@ static void unreloc_toc(void)
 #endif
 #endif
 
+#ifdef CONFIG_PPC_SVM
+/*
+ * The ESM blob is a data structure with information needed by the Ultravisor 
to
+ * validate the integrity of the secure guest.
+ */
+static void *get_esm_blob(void)
+{
+   /*
+* FIXME: We are still finalizing the details on how prom_init will grab
+* the ESM blob. When that is done, this function will be updated.
+*/
+   return (void *)0xdeadbeef;
+}
+
+/*
+ * Perform the Enter Secure Mode ultracall.
+ */
+static int enter_secure_mode(void *esm_blob, void *retaddr, void *fdt)
+{
+   register uint64_t func asm("r0") = UV_ESM;
+   register uint64_t arg1 asm("r3") = (uint64_t)esm_blob;
+   register uint64_t arg2 

Re: linux-next: build failure after merge of the imx-mxs tree

2019-05-20 Thread Shawn Guo
On Tue, May 21, 2019 at 02:16:47AM +, Anson Huang wrote:
> Hi, Stephen/Shawn
>   I realized this issue last week when I updated my Linux-next tree (NOT 
> sure why I did NOT meet such issue when I did the patch), so I resent the 
> patch series of adding head file "io.h" to fix this issue, please apply below 
> V2 patch series instead, sorry for the inconvenience.
> 
> https://patchwork.kernel.org/patch/10944681/

Okay, fixed.  Sorry for the breakage, Stephen.

Shawn

> > -Original Message-
> > From: Stephen Rothwell [mailto:s...@canb.auug.org.au]
> > Sent: Tuesday, May 21, 2019 6:38 AM
> > To: Shawn Guo 
> > Cc: Linux Next Mailing List ; Linux Kernel 
> > Mailing
> > List ; Anson Huang ;
> > Aisheng Dong 
> > Subject: linux-next: build failure after merge of the imx-mxs tree
> > 
> > Hi Shawn,
> > 
> > After merging the imx-mxs tree, today's linux-next build (arm
> > multi_v7_defconfig) failed like this:
> > 
> > drivers/clk/imx/clk.c: In function 'imx_mmdc_mask_handshake':
> > drivers/clk/imx/clk.c:20:8: error: implicit declaration of function
> > 'readl_relaxed'; did you mean 'xchg_relaxed'? [-Werror=implicit-function-
> > declaration]
> >   reg = readl_relaxed(ccm_base + CCM_CCDR);
> > ^
> > xchg_relaxed
> > drivers/clk/imx/clk.c:22:2: error: implicit declaration of function
> > 'writel_relaxed'; did you mean 'xchg_relaxed'? [-Werror=implicit-function-
> > declaration]
> >   writel_relaxed(reg, ccm_base + CCM_CCDR);
> >   ^~~~~~
> >   xchg_relaxed
> > 
> > Caused by commit
> > 
> >   0dc6b492b6e0 ("clk: imx: Add common API for masking MMDC handshake")
> > 
> > I have used the imx-mxs tree from next-20190520 for today.
> > 
> > --
> > Cheers,
> > Stephen Rothwell


Re: [RFC 0/7] introduce memory hinting API for external process

2019-05-20 Thread Minchan Kim
On Mon, May 20, 2019 at 12:46:05PM -0400, Johannes Weiner wrote:
> On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote:
> > - Approach
> > 
> > The approach we chose was to use a new interface to allow userspace to
> > proactively reclaim entire processes by leveraging platform information.
> > This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages
> > that are known to be cold from userspace and to avoid races with lmkd
> > by reclaiming apps as soon as they entered the cached state. Additionally,
> > it could provide many chances for platform to use much information to
> > optimize memory efficiency.
> > 
> > IMHO we should spell it out that this patchset complements MADV_WONTNEED
> > and MADV_FREE by adding non-destructive ways to gain some free memory
> > space. MADV_COLD is similar to MADV_WONTNEED in a way that it hints the
> > kernel that memory region is not currently needed and should be reclaimed
> > immediately; MADV_COOL is similar to MADV_FREE in a way that it hints the
> > kernel that memory region is not currently needed and should be reclaimed
> > when memory pressure rises.
> 
> I agree with this approach and the semantics. But these names are very
> vague and extremely easy to confuse since they're so similar.
> 
> MADV_COLD could be a good name, but for deactivating pages, not
> reclaiming them - marking memory "cold" on the LRU for later reclaim.
> 
> For the immediate reclaim one, I think there is a better option too:
> In virtual memory speak, putting a page into secondary storage (or
> ensuring it's already there), and then freeing its in-memory copy, is
> called "paging out". And that's what this flag is supposed to do. So
> how about MADV_PAGEOUT?
> 
> With that, we'd have:
> 
> MADV_FREE: Mark data invalid, free memory when needed
> MADV_DONTNEED: Mark data invalid, free memory immediately
> 
> MADV_COLD: Data is not used for a while, free memory when needed
> MADV_PAGEOUT: Data is not used for a while, free memory immediately
> 
> What do you think?

There are several suggestions until now. Thanks, Folks!

For deactivating:

- MADV_COOL
- MADV_RECLAIM_LAZY
- MADV_DEACTIVATE
- MADV_COLD
- MADV_FREE_PRESERVE


For reclaiming:

- MADV_COLD
- MADV_RECLAIM_NOW
- MADV_RECLAIMING
- MADV_PAGEOUT
- MADV_DONTNEED_PRESERVE

It seems not everybody likes MADV_COLD, so we may want to go with something
else. For consistency with the other existing madvise hints, a -PRESERVE
postfix would suit well. However, I never liked the FREE vs DONTNEED naming
in the first place; they are easily confused.
I prefer PAGEOUT to RECLAIM since it better conveys reclaiming under memory
pressure, with the data paged back in if someone needs it later, so it
already implies PRESERVE.
If there are no strong objections, I want to go with MADV_COLD and
MADV_PAGEOUT.

Other opinion?
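
(A rough userspace sketch of the two hints as proposed in this thread, for
illustration only: the constants are not in any released uapi header yet, so the
numeric values below are placeholders, and the helper names are made up.)

	#include <sys/mman.h>
	#include <stddef.h>

	#ifndef MADV_COLD
	#define MADV_COLD	20	/* placeholder value: deactivate, reclaim later under pressure */
	#endif
	#ifndef MADV_PAGEOUT
	#define MADV_PAGEOUT	21	/* placeholder value: reclaim immediately */
	#endif

	/* Region is idle but may be reused: just age it on the LRU. */
	static inline int hint_cold(void *addr, size_t len)
	{
		return madvise(addr, len, MADV_COLD);
	}

	/* Region is very unlikely to be reused soon: push it to swap/backing now. */
	static inline int hint_pageout(void *addr, size_t len)
	{
		return madvise(addr, len, MADV_PAGEOUT);
	}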



Re: [PATCH] dmaengine: stm32-dma: Fix redundant call to platform_get_irq

2019-05-20 Thread Vinod Koul
On 07-05-19, 09:54, Amelie Delaunay wrote:
> Commit c6504be53972 ("dmaengine: stm32-dma: Fix unsigned variable compared
> with zero") duplicated the call to platform_get_irq.
> So remove the first call to platform_get_irq.

Applied, thanks

-- 
~Vinod


Re: [PATCH 3/4] dmaengine: fsl-edma: support little endian for edma driver

2019-05-20 Thread Vinod Koul
On 06-05-19, 09:03, Peng Ma wrote:
> improve edma driver to support little endian.

Can you explain a bit more how adding the below lines adds little endian
support...

> 
> Signed-off-by: Peng Ma 
> ---
>  drivers/dma/fsl-edma-common.c |5 +
>  1 files changed, 5 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/dma/fsl-edma-common.c b/drivers/dma/fsl-edma-common.c
> index 680b2a0..6bf238e 100644
> --- a/drivers/dma/fsl-edma-common.c
> +++ b/drivers/dma/fsl-edma-common.c
> @@ -83,9 +83,14 @@ void fsl_edma_chan_mux(struct fsl_edma_chan *fsl_chan,
>   u32 ch = fsl_chan->vchan.chan.chan_id;
>   void __iomem *muxaddr;
>   unsigned int chans_per_mux, ch_off;
> + int endian_diff[4] = {3, 1, -1, -3};
>  
>   chans_per_mux = fsl_chan->edma->n_chans / DMAMUX_NR;
>   ch_off = fsl_chan->vchan.chan.chan_id % chans_per_mux;
> +
> + if (!fsl_chan->edma->big_endian)
> + ch_off += endian_diff[ch_off % 4];
> +
>   muxaddr = fsl_chan->edma->muxbase[ch / chans_per_mux];
>   slot = EDMAMUX_CHCFG_SOURCE(slot);
>  
> -- 
> 1.7.1

-- 
~Vinod


Re: [V2 2/2] dmaengine: fsl-qdma: Add improvement

2019-05-20 Thread Vinod Koul
On 06-05-19, 10:21, Peng Ma wrote:
> When an error occurs we should clean the error register then to return

Applied, thanks

-- 
~Vinod


Re: [V2 1/2] dmaengine: fsl-qdma: fixed the source/destination descriptor format

2019-05-20 Thread Vinod Koul
On 06-05-19, 10:21, Peng Ma wrote:
> CMD of Source/Destination descriptor format should be lower of
> struct fsl_qdma_engine number data address.
> 
> Signed-off-by: Peng Ma 
> ---
> changed for V2:
>   - Fix descriptor spelling
> 
>  drivers/dma/fsl-qdma.c |   25 +
>  1 files changed, 17 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/dma/fsl-qdma.c b/drivers/dma/fsl-qdma.c
> index aa1d0ae..2e8b46b 100644
> --- a/drivers/dma/fsl-qdma.c
> +++ b/drivers/dma/fsl-qdma.c
> @@ -113,6 +113,7 @@
>  /* Field definition for Descriptor offset */
>  #define QDMA_CCDF_STATUS 20
>  #define QDMA_CCDF_OFFSET 20
> +#define QDMA_SDDF_CMD(x) (((u64)(x)) << 32)
>  
>  /* Field definition for safe loop count*/
>  #define FSL_QDMA_HALT_COUNT  1500
> @@ -214,6 +215,12 @@ struct fsl_qdma_engine {
>  
>  };
>  
> +static inline void
> +qdma_sddf_set_cmd(struct fsl_qdma_format *sddf, u32 val)
> +{
> + sddf->data = QDMA_SDDF_CMD(val);
> +}

Do you really need this helper which calls another macro!

> +
>  static inline u64
>  qdma_ccdf_addr_get64(const struct fsl_qdma_format *ccdf)
>  {
> @@ -341,6 +348,7 @@ static void fsl_qdma_free_chan_resources(struct dma_chan 
> *chan)
>  static void fsl_qdma_comp_fill_memcpy(struct fsl_qdma_comp *fsl_comp,
> dma_addr_t dst, dma_addr_t src, u32 len)
>  {
> + u32 cmd;
>   struct fsl_qdma_format *sdf, *ddf;
>   struct fsl_qdma_format *ccdf, *csgf_desc, *csgf_src, *csgf_dest;
>  
> @@ -353,6 +361,7 @@ static void fsl_qdma_comp_fill_memcpy(struct 
> fsl_qdma_comp *fsl_comp,
>  
>   memset(fsl_comp->virt_addr, 0, FSL_QDMA_COMMAND_BUFFER_SIZE);
>   memset(fsl_comp->desc_virt_addr, 0, FSL_QDMA_DESCRIPTOR_BUFFER_SIZE);
> +

why did you add a blank line in this 'fix', it does not belong here!

>   /* Head Command Descriptor(Frame Descriptor) */
>   qdma_desc_addr_set64(ccdf, fsl_comp->bus_addr + 16);
>   qdma_ccdf_set_format(ccdf, qdma_ccdf_get_offset(ccdf));
> @@ -369,14 +378,14 @@ static void fsl_qdma_comp_fill_memcpy(struct 
> fsl_qdma_comp *fsl_comp,
>   /* This entry is the last entry. */
>   qdma_csgf_set_f(csgf_dest, len);
>   /* Descriptor Buffer */
> - sdf->data =
> - cpu_to_le64(FSL_QDMA_CMD_RWTTYPE <<
> - FSL_QDMA_CMD_RWTTYPE_OFFSET);
> - ddf->data =
> - cpu_to_le64(FSL_QDMA_CMD_RWTTYPE <<
> - FSL_QDMA_CMD_RWTTYPE_OFFSET);
> - ddf->data |=
> - cpu_to_le64(FSL_QDMA_CMD_LWC << FSL_QDMA_CMD_LWC_OFFSET);
> + cmd = cpu_to_le32(FSL_QDMA_CMD_RWTTYPE <<
> +   FSL_QDMA_CMD_RWTTYPE_OFFSET);
> + qdma_sddf_set_cmd(sdf, cmd);

why not do sddf->data = QDMA_SDDF_CMD(cmd);

> +
> + cmd = cpu_to_le32(FSL_QDMA_CMD_RWTTYPE <<
> +   FSL_QDMA_CMD_RWTTYPE_OFFSET);
> + cmd |= cpu_to_le32(FSL_QDMA_CMD_LWC << FSL_QDMA_CMD_LWC_OFFSET);
> + qdma_sddf_set_cmd(ddf, cmd);
>  }
>  
>  /*
> -- 
> 1.7.1

-- 
~Vinod


Re: [RFC PATCH] powerpc/mm: Implement STRICT_MODULE_RWX

2019-05-20 Thread Russell Currey
On Wed, 2019-05-15 at 06:20 +, Christophe Leroy wrote:

Confirming this works on hash and radix book3s64.

> +
> + // only operate on VM areas for now
> + area = find_vm_area((void *)addr);
> + if (!area || end > (unsigned long)area->addr + area->size ||
> + !(area->flags & VM_ALLOC))
> + return -EINVAL;

https://lore.kernel.org/patchwork/project/lkml/list/?series=391470

With this patch, the above series causes crashes on (at least) Hash,
since it adds another user of change_page_rw() and change_page_nx()
that for reasons I don't understand yet, we can't handle.  I can work
around this with:

if (area->flags & VM_FLUSH_RESET_PERMS)
return 0;

so this is broken on at least one platform as of 5.2-rc1.  We're going
to look into this more to see if there's anything else we have to do as
a result of this series before the next merge window, or if just
working around it like this is good enough.

- Russell



Re: [PATCH v2] vt: Fix a missing-check bug in drivers/tty/vt/vt.c

2019-05-20 Thread Nicolas Pitre
On Tue, 21 May 2019, Gen Zhang wrote:

> On Mon, May 20, 2019 at 11:26:20PM -0400, Nicolas Pitre wrote:
> > On Tue, 21 May 2019, Gen Zhang wrote:
> > 
> > > On Mon, May 20, 2019 at 10:55:40PM -0400, Nicolas Pitre wrote:
> > > > On Tue, 21 May 2019, Gen Zhang wrote:
> > > > 
> > > > > In function con_init(), the pointer variable vc_cons[currcons].d, vc 
> > > > > and
> > > > > vc->vc_screenbuf is allocated a memory space via kzalloc(). And they 
> > > > > are
> > > > > used in the following codes.
> > > > > However, when there is a memory allocation error, kzalloc() can fail.
> > > > > Thus null pointer (vc_cons[currcons].d, vc and vc->vc_screenbuf)
> > > > > dereference may happen. And it will cause the kernel to crash. 
> > > > > Therefore,
> > > > > we should check return value and handle the error.
> > > > > Further,the loop condition MIN_NR_CONSOLES is defined as 1 in
> > > > > include/uapi/linux/vt.h. So there is no need to unwind the loop.
> > > > 
> > > > But what if someone changes that define? It won't be obvious that some 
> > > > code did rely on it to be defined to 1.
> > > I re-examine the source code. MIN_NR_CONSOLES is only defined once and
> > > no other changes to it.
> > 
> > Yes, that is true today.  But if someone changes that in the future, how 
> > will that person know that you relied on it to be 1 for not needing to 
> > unwind the loop?
> > 
> > 
> > Nicolas
> Hi Nicolas,
> Thanks for your explaination! And I got your point. And is this way 
> proper?

Not quite.

> err_vc_screenbuf:
> kfree(vc);
>   for (currcons = 0; currcons < MIN_NR_CONSOLES; currcons++)
>   vc_cons[currcons].d = NULL;
>   return -ENOMEM;
> err_vc:
>   console_unlock();
>   return -ENOMEM;

Now imagine that MIN_NR_CONSOLES is defined to 10 instead of 1.

What happens with allocated memory if the err_vc condition is met on the 
5th loop?

If err_vc_screenbuf condition is encountered on the 5th loop (curcons = 
4), what is the value of vc_cons[4].d? Isn't it the same as vc that you 
just freed?


Nicolas
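
(To make the above concrete: a correct unwind has to walk back over every
console initialised so far. A rough fragment, under the assumption that the loop
allocates vc and vc->vc_screenbuf per console as con_init() does; this is a
sketch, not the actual vt.c patch.)

	for (currcons = 0; currcons < MIN_NR_CONSOLES; currcons++) {
		vc_cons[currcons].d = vc = kzalloc(sizeof(*vc), GFP_NOWAIT);
		if (!vc)
			goto err_unwind;
		/* ... per-console init that sets vc->vc_screenbuf_size ... */
		vc->vc_screenbuf = kzalloc(vc->vc_screenbuf_size, GFP_NOWAIT);
		if (!vc->vc_screenbuf) {
			currcons++;	/* let the unwind below free this vc too */
			goto err_unwind;
		}
	}
	/* ... rest of con_init ... */
	console_unlock();
	return 0;

err_unwind:
	while (currcons-- > 0) {
		kfree(vc_cons[currcons].d->vc_screenbuf);	/* kfree(NULL) is fine */
		kfree(vc_cons[currcons].d);
		vc_cons[currcons].d = NULL;
	}
	console_unlock();
	return -ENOMEM;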


Re: [PATCH] dmaengine: jz4780: Fix transfers being ACKed too soon

2019-05-20 Thread Vinod Koul
On 04-05-19, 23:37, Paul Cercueil wrote:
> When a multi-descriptor DMA transfer is in progress, the "IRQ pending"
> flag will apparently be set for that channel as soon as the last
> descriptor loads, way before the IRQ actually happens. This behaviour
> has been observed on the JZ4725B, but maybe other SoCs are affected.
> 
> In the case where another DMA transfer is running into completion on a
> separate channel, the IRQ handler would then run the completion handler
> for our previous channel even if the transfer didn't actually finish.
> 
> Fix this by checking in the completion handler that we're indeed done;
> if not the interrupted DMA transfer will simply be resumed.

Applied, thanks

-- 
~Vinod


Re: [PATCH] dmaengine: jz4780: Use SPDX license notifier

2019-05-20 Thread Vinod Koul
On 04-05-19, 23:34, Paul Cercueil wrote:
> Use SPDX license notifier instead of plain text in the header.

Applied, thanks

-- 
~Vinod


Re: [RFC PATCH 07/11] bpf: implement writable buffers in contexts

2019-05-20 Thread Kris Van Hees
On Mon, May 20, 2019 at 09:21:34PM -0400, Steven Rostedt wrote:
> Hi Kris,
> 
> Note, it's best to thread patches. Otherwise they get spread out in
> mail boxes and hard to manage. That is, every patch should be a reply
> to the 00/11 header patch.

Thanks for that advice - I will make sure to do that for future postings.

> Also, Peter Ziljstra (Cc'd) is the maintainer of perf on the kernel
> side. Please include him on Ccing perf changes that are done inside the
> kernel.

Ah, my apologies for missing Peter in the list of Cc's.  Thank you for adding
him.  I will update my list.

Kris

> On Mon, 20 May 2019 23:52:24 + (UTC)
> Kris Van Hees  wrote:
> 
> > Currently, BPF supports writes to packet data in very specific cases.
> > The implementation can be of more general use and can be extended to any
> > number of writable buffers in a context.  The implementation adds two new
> > register types: PTR_TO_BUFFER and PTR_TO_BUFFER_END, similar to the types
> > PTR_TO_PACKET and PTR_TO_PACKET_END.  In addition, a field 'buf_id' is
> > added to the reg_state structure as a way to distinguish between different
> > buffers in a single context.
> > 
> > Buffers are specified in the context by a pair of members:
> > - a pointer to the start of the buffer (type PTR_TO_BUFFER)
> > - a pointer to the first byte beyond the buffer (type PTR_TO_BUFFER_END)
> > 
> > A context can contain multiple buffers.  Each buffer/buffer_end pair is
> > identified by a unique id (buf_id).  The start-of-buffer member offset is
> > usually a good unique identifier.
> > 
> > The semantics for using a writable buffer are the same as for packet data.
> > The BPF program must contain a range test (buf + num > buf_end) to ensure
> > that the verifier can verify that offsets are within the allowed range.
> > 
> > Whenever a helper is called that might update the content of the context
> > all range information for registers that hold pointers to a buffer is
> > cleared, just as it is done for packet pointers.
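
For illustration of the range test described above, a minimal hypothetical
BPF-C fragment over such a context buffer (the context layout and field names
are made up for the example):

	struct my_ctx {
		__u8 *buf;	/* start: PTR_TO_BUFFER     */
		__u8 *buf_end;	/* end:   PTR_TO_BUFFER_END */
	};

	int write_to_buf(struct my_ctx *ctx)
	{
		__u8 *p = ctx->buf;

		/* Range test the verifier relies on: prove p + 4 is in bounds. */
		if (p + 4 > ctx->buf_end)
			return 0;

		p[0] = 0xde;	/* writes are now provably within the buffer */
		p[3] = 0xad;
		return 1;
	}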
> > 
> > Signed-off-by: Kris Van Hees 
> > Reviewed-by: Nick Alcock 
> > ---
> >  include/linux/bpf.h  |   3 +
> >  include/linux/bpf_verifier.h |   4 +-
> >  kernel/bpf/verifier.c| 198 ---
> >  3 files changed, 145 insertions(+), 60 deletions(-)
> > 
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index e4bcb79656c4..fc3eda0192fb 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -275,6 +275,8 @@ enum bpf_reg_type {
> > PTR_TO_TCP_SOCK, /* reg points to struct tcp_sock */
> > PTR_TO_TCP_SOCK_OR_NULL, /* reg points to struct tcp_sock or NULL */
> > PTR_TO_TP_BUFFER,/* reg points to a writable raw tp's buffer */
> > +   PTR_TO_BUFFER,   /* reg points to ctx buffer */
> > +   PTR_TO_BUFFER_END,   /* reg points to ctx buffer end */
> >  };
> >  
> >  /* The information passed from prog-specific *_is_valid_access
> > @@ -283,6 +285,7 @@ enum bpf_reg_type {
> >  struct bpf_insn_access_aux {
> > enum bpf_reg_type reg_type;
> > int ctx_field_size;
> > +   u32 buf_id;
> >  };
> >  
> >  static inline void
> > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > index 1305ccbd8fe6..3538382184f3 100644
> > --- a/include/linux/bpf_verifier.h
> > +++ b/include/linux/bpf_verifier.h
> > @@ -45,7 +45,7 @@ struct bpf_reg_state {
> > /* Ordering of fields matters.  See states_equal() */
> > enum bpf_reg_type type;
> > union {
> > -   /* valid when type == PTR_TO_PACKET */
> > +   /* valid when type == PTR_TO_PACKET | PTR_TO_BUFFER */
> > u16 range;
> >  
> > /* valid when type == CONST_PTR_TO_MAP | PTR_TO_MAP_VALUE |
> > @@ -132,6 +132,8 @@ struct bpf_reg_state {
> >  */
> > u32 frameno;
> > enum bpf_reg_liveness live;
> > +   /* For PTR_TO_BUFFER, to identify distinct buffers in a context. */
> > +   u32 buf_id;
> >  };
> >  
> >  enum bpf_stack_slot_type {
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index f9e5536fd1af..5fba4e6f5424 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -406,6 +406,8 @@ static const char * const reg_type_str[] = {
> > [PTR_TO_TCP_SOCK]   = "tcp_sock",
> > [PTR_TO_TCP_SOCK_OR_NULL] = "tcp_sock_or_null",
> > [PTR_TO_TP_BUFFER]  = "tp_buffer",
> > +   [PTR_TO_BUFFER] = "buf",
> > +   [PTR_TO_BUFFER_END] = "buf_end",
> >  };
> >  
> >  static char slot_type_char[] = {
> > @@ -467,6 +469,9 @@ static void print_verifier_state(struct 
> > bpf_verifier_env *env,
> > verbose(env, ",off=%d", reg->off);
> > if (type_is_pkt_pointer(t))
> > verbose(env, ",r=%d", reg->range);
> > +   else if (t == PTR_TO_BUFFER)
> > +   verbose(env, ",r=%d,bid=%d", reg->range,
> > +   reg->buf_id);

Re: [PATCH v3 11/14] dmaengine: imx-sdma: fix ecspi1 rx dma not work on i.mx8mm

2019-05-20 Thread Vinod Koul
On 07-05-19, 09:16, Robin Gong wrote:
> Because the number of ecspi1 rx event on i.mx8mm is 0, the condition
> check ignore such special case without dma channel enabled, which caused
> ecspi1 rx works failed. Actually, no need to check event_id0, checking
> event_id1 is enough for DEV_2_DEV case because it's so lucky that event_id1
> never be 0.

Well is that by chance or design that event_id1 will be never 0?

> 
> Signed-off-by: Robin Gong 
> ---
>  drivers/dma/imx-sdma.c | 12 +---
>  1 file changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c
> index a495c7f..86594fc 100644
> --- a/drivers/dma/imx-sdma.c
> +++ b/drivers/dma/imx-sdma.c
> @@ -1370,8 +1370,8 @@ static void sdma_free_chan_resources(struct dma_chan 
> *chan)
>  
>   sdma_channel_synchronize(chan);
>  
> - if (sdmac->event_id0)
> - sdma_event_disable(sdmac, sdmac->event_id0);
> + sdma_event_disable(sdmac, sdmac->event_id0);
> +
>   if (sdmac->event_id1)
>   sdma_event_disable(sdmac, sdmac->event_id1);
>  
> @@ -1670,11 +1670,9 @@ static int sdma_config(struct dma_chan *chan,
>   memcpy(>slave_config, dmaengine_cfg, sizeof(*dmaengine_cfg));
>  
>   /* Set ENBLn earlier to make sure dma request triggered after that */
> - if (sdmac->event_id0) {
> - if (sdmac->event_id0 >= sdmac->sdma->drvdata->num_events)
> - return -EINVAL;
> - sdma_event_enable(sdmac, sdmac->event_id0);
> - }
> + if (sdmac->event_id0 >= sdmac->sdma->drvdata->num_events)
> + return -EINVAL;
> + sdma_event_enable(sdmac, sdmac->event_id0);
>  
>   if (sdmac->event_id1) {
>   if (sdmac->event_id1 >= sdmac->sdma->drvdata->num_events)
> -- 
> 2.7.4
> 

-- 
~Vinod


Re: [PATCH v3 04/14] dmaengine: imx-sdma: remove dupilicated sdma_load_context

2019-05-20 Thread Vinod Koul
On 07-05-19, 09:16, Robin Gong wrote:
> Since sdma_transfer_init() will do sdma_load_context before any
> sdma transfer, no need once more in sdma_config_channel().

Acked-by: Vinod Koul 

-- 
~Vinod


Re: [PATCH v3 09/14] dmaengine: imx-sdma: remove ERR009165 on i.mx6ul

2019-05-20 Thread Vinod Koul
On 07-05-19, 09:16, Robin Gong wrote:
> ECSPI issue fixed from i.mx6ul at hardware level, no need
> ERR009165 anymore on those chips such as i.mx8mq. Add i.mx6sx
> from where i.mx6ul source.

Acked-by: Vinod Koul 

-- 
~Vinod


Re: [PATCH v3 05/14] dmaengine: imx-sdma: add mcu_2_ecspi script

2019-05-20 Thread Vinod Koul
On 07-05-19, 09:16, Robin Gong wrote:
> Add mcu_2_ecspi script to fix ecspi errata ERR009165.

Acked-by: Vinod Koul 

-- 
~Vinod


Re: [PATCH v2] mm, memory-failure: clarify error message

2019-05-20 Thread Pankaj Gupta


> Some user who install SIGBUS handler that does longjmp out
> therefore keeping the process alive is confused by the error
> message
>   "[188988.765862] Memory failure: 0x1840200: Killing
>cellsrv:33395 due to hardware memory corruption"
> Slightly modify the error message to improve clarity.
> 
> Signed-off-by: Jane Chu 
> ---
>  mm/memory-failure.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index fc8b517..c4f4bcd 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -216,7 +216,7 @@ static int kill_proc(struct to_kill *tk, unsigned long
> pfn, int flags)
>  short addr_lsb = tk->size_shift;
>  int ret;
>  
> -pr_err("Memory failure: %#lx: Killing %s:%d due to hardware memory
> corruption\n",
> +pr_err("Memory failure: %#lx: Sending SIGBUS to %s:%d due to hardware
> memory corruption\n",
>  pfn, t->comm, t->pid);
>  
>  if ((flags & MF_ACTION_REQUIRED) && t->mm == current->mm) {
> --
> 1.8.3.1

This error message is helpful.

Acked-by: Pankaj Gupta 

> 
> ___
> Linux-nvdimm mailing list
> linux-nvd...@lists.01.org
> https://lists.01.org/mailman/listinfo/linux-nvdimm
> 


Re: [PATCH v4 2/3] fdt: add support for rng-seed

2019-05-20 Thread Hsin-Yi Wang
On Mon, May 20, 2019 at 7:54 AM Nicolas Boichat  wrote:

> Alphabetical order.
Original headers are not sorted, should I sort them here?
>

>
> I'm a little bit concerned about this, as we really want the rng-seed
> value to be wiped, and not kept in memory (even if it's hard to
> access).
>
> IIUC, fdt_delprop splices the device tree, so it'll override
> "rng-seed" property with whatever device tree entries follow it.
> However, if rng-seed is the last property (or if the entries that
> follow are smaller than rng-seed), the seed will stay in memory (or
> part of it).
>
> fdt_nop_property in v2 would erase it for sure. I don't know if there
> is a way to make sure that rng-seed is removed for good while still
> deleting the property (maybe modify fdt_splice_ to do a memset(.., 0)
> of the moved chunk?).
>
So maybe we can use fdt_nop_property() back?
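
For illustration, one way to combine the two concerns (feed the seed to the
entropy pool, wipe the value in place, then hide the property) might look
roughly like the sketch below; the helper name and surrounding plumbing are
assumptions, not the posted patch:

	static void fdt_consume_rng_seed(void *fdt, int node)
	{
		int len;
		const void *seed = fdt_getprop(fdt, node, "rng-seed", &len);

		if (!seed || len <= 0)
			return;

		add_device_randomness(seed, len);

		/* Overwrite the value in place before nopping the property, so
		 * no copy of the seed survives in the flattened tree. */
		memset((void *)seed, 0, len);
		fdt_nop_property(fdt, node, "rng-seed");
	}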


[PATCH] arm64/mm: Move PTE_VALID from SW defined to HW page table entry definitions

2019-05-20 Thread Anshuman Khandual
PTE_VALID signifies that the last-level page table entry is valid and is
recognized by the MMU while walking the page table. This is not a software
defined PTE bit and should not be listed like one. Just move it to the
appropriate header file.

Signed-off-by: Anshuman Khandual 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Steve Capper 
Cc: Suzuki Poulose 
Cc: James Morse 
---
 arch/arm64/include/asm/pgtable-hwdef.h | 1 +
 arch/arm64/include/asm/pgtable-prot.h  | 1 -
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/pgtable-hwdef.h 
b/arch/arm64/include/asm/pgtable-hwdef.h
index a69259c..974f011 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -153,6 +153,7 @@
 /*
  * Level 3 descriptor (PTE).
  */
+#define PTE_VALID  (_AT(pteval_t, 1) << 0)
 #define PTE_TYPE_MASK  (_AT(pteval_t, 3) << 0)
 #define PTE_TYPE_FAULT (_AT(pteval_t, 0) << 0)
 #define PTE_TYPE_PAGE  (_AT(pteval_t, 3) << 0)
diff --git a/arch/arm64/include/asm/pgtable-prot.h 
b/arch/arm64/include/asm/pgtable-prot.h
index 986e41c..38c7148 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -24,7 +24,6 @@
 /*
  * Software defined PTE bits definition.
  */
-#define PTE_VALID  (_AT(pteval_t, 1) << 0)
 #define PTE_WRITE  (PTE_DBM)/* same as DBM (51) */
 #define PTE_DIRTY  (_AT(pteval_t, 1) << 55)
 #define PTE_SPECIAL(_AT(pteval_t, 1) << 56)
-- 
2.7.4



Re: [PATCH v7 01/12] x86/crypto: Adapt assembly for PIE support

2019-05-20 Thread Eric Biggers
On Mon, May 20, 2019 at 04:19:26PM -0700, Thomas Garnier wrote:
> diff --git a/arch/x86/crypto/sha256-avx2-asm.S 
> b/arch/x86/crypto/sha256-avx2-asm.S
> index 1420db15dcdd..2ced4b2f6c76 100644
> --- a/arch/x86/crypto/sha256-avx2-asm.S
> +++ b/arch/x86/crypto/sha256-avx2-asm.S
> @@ -588,37 +588,42 @@ last_block_enter:
>   mov INP, _INP(%rsp)
>  
>   ## schedule 48 input dwords, by doing 3 rounds of 12 each
> - xor SRND, SRND
> + leaqK256(%rip), SRND
> + ## loop1 upper bound
> + leaqK256+3*4*32(%rip), INP
>  
>  .align 16
>  loop1:
> - vpaddd  K256+0*32(SRND), X0, XFER
> + vpaddd  0*32(SRND), X0, XFER
>   vmovdqa XFER, 0*32+_XFER(%rsp, SRND)
>   FOUR_ROUNDS_AND_SCHED   _XFER + 0*32
>  
> - vpaddd  K256+1*32(SRND), X0, XFER
> + vpaddd  1*32(SRND), X0, XFER
>   vmovdqa XFER, 1*32+_XFER(%rsp, SRND)
>   FOUR_ROUNDS_AND_SCHED   _XFER + 1*32
>  
> - vpaddd  K256+2*32(SRND), X0, XFER
> + vpaddd  2*32(SRND), X0, XFER
>   vmovdqa XFER, 2*32+_XFER(%rsp, SRND)
>   FOUR_ROUNDS_AND_SCHED   _XFER + 2*32
>  
> - vpaddd  K256+3*32(SRND), X0, XFER
> + vpaddd  3*32(SRND), X0, XFER
>   vmovdqa XFER, 3*32+_XFER(%rsp, SRND)
>   FOUR_ROUNDS_AND_SCHED   _XFER + 3*32
>  
>   add $4*32, SRND
> - cmp $3*4*32, SRND
> + cmp INP, SRND
>   jb  loop1
>  
> + ## loop2 upper bound
> + leaqK256+4*4*32(%rip), INP
> +
>  loop2:
>   ## Do last 16 rounds with no scheduling
> - vpaddd  K256+0*32(SRND), X0, XFER
> + vpaddd  0*32(SRND), X0, XFER
>   vmovdqa XFER, 0*32+_XFER(%rsp, SRND)
>   DO_4ROUNDS  _XFER + 0*32
>  
> - vpaddd  K256+1*32(SRND), X1, XFER
> + vpaddd  1*32(SRND), X1, XFER
>   vmovdqa XFER, 1*32+_XFER(%rsp, SRND)
>   DO_4ROUNDS  _XFER + 1*32
>   add $2*32, SRND
> @@ -626,7 +631,7 @@ loop2:
>   vmovdqa X2, X0
>   vmovdqa X3, X1
>  
> - cmp $4*4*32, SRND
> + cmp INP, SRND
>   jb  loop2
>  
>   mov _CTX(%rsp), CTX

There is a crash in sha256-avx2-asm.S with this patch applied.  Looks like the
%rsi register is being used for two different things at the same time: 'INP' and
'y3'?  You should be able to reproduce by booting a kernel configured with:

CONFIG_CRYPTO_SHA256_SSSE3=y
# CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set

Crash report:

BUG: unable to handle page fault for address: c8ff83b21a80
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0 
Oops: 0002 [#1] SMP
CPU: 3 PID: 359 Comm: cryptomgr_test Not tainted 5.2.0-rc1-00109-g9fb4fd100429b 
#5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.12.0-20181126_142135-anatol 04/01/2014
RIP: 0010:loop1+0x4/0x888
Code: 83 c6 40 48 89 b4 24 08 02 00 00 48 8d 3d 94 d3 d0 00 48 8d 35 0d d5 d0 
00 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 c
RSP: 0018:c90001d43880 EFLAGS: 00010286
RAX: 6a09e667 RBX: bb67ae85 RCX: 3c6ef372
RDX: 510e527f RSI: 81dde380 RDI: 81dde200
RBP: c90001d43b10 R08: a54ff53a R09: 9b05688c
R10: 1f83d9ab R11: 5be0cd19 R12: 
R13: 88807cfd4598 R14: 810d0da0 R15: c90001d43cc0
FS:  () GS:88807fd8() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: c8ff83b21a80 CR3: 0200f000 CR4: 003406e0
Call Trace:
 sha256_avx2_finup arch/x86/crypto/sha256_ssse3_glue.c:242 [inline]
 sha256_avx2_final+0x17/0x20 arch/x86/crypto/sha256_ssse3_glue.c:247
 crypto_shash_final+0x13/0x20 crypto/shash.c:166
 shash_async_final+0x11/0x20 crypto/shash.c:265
 crypto_ahash_op+0x24/0x60 crypto/ahash.c:373
 crypto_ahash_final+0x11/0x20 crypto/ahash.c:384
 do_ahash_op.constprop.13+0x10/0x40 crypto/testmgr.c:1049
 test_hash_vec_cfg+0x5b1/0x610 crypto/testmgr.c:1225
 test_hash_vec crypto/testmgr.c:1268 [inline]
 __alg_test_hash.isra.8+0x115/0x1d0 crypto/testmgr.c:1498
 alg_test_hash+0x7b/0x100 crypto/testmgr.c:1546
 alg_test.part.12+0xa4/0x360 crypto/testmgr.c:4931
 alg_test+0x12/0x30 crypto/testmgr.c:4895
 cryptomgr_test+0x26/0x50 crypto/algboss.c:223
 kthread+0x124/0x140 kernel/kthread.c:254
 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
Modules linked in:
CR2: c8ff83b21a80
---[ end trace ee8ece604888de3e ]---

- Eric


[PATCH] MIPS: remove a space after -I to cope with header search paths for VDSO

2019-05-20 Thread Masahiro Yamada
Commit 9cc342f6c4a0 ("treewide: prefix header search paths with
$(srctree)/") caused a build error for MIPS VDSO.

  CC  arch/mips/vdso/gettimeofday.o
In file included from ../arch/mips/vdso/vdso.h:26,
 from ../arch/mips/vdso/gettimeofday.c:11:
../arch/mips/include/asm/page.h:12:10: fatal error: spaces.h: No such file or 
directory
 #include 
  ^~

The cause of the error is the space after the compiler flag -I .

Kbuild used to have a global restriction "no space after -I", but
commit 48f6e3cf5bc6 ("kbuild: do not drop -I without parameter") got
rid of it. Having a space after -I is no longer a big deal as far as
Kbuild is concerned.

It is still a big deal for MIPS because arch/mips/vdso/Makefile
filters the header search paths, like this:

  ccflags-vdso := \
  $(filter -I%,$(KBUILD_CFLAGS)) \

..., which relies on the assumption that there is no space after -I .

Fixes: 9cc342f6c4a0 ("treewide: prefix header search paths with $(srctree)/")
Reported-by: kbuild test robot 
Signed-off-by: Masahiro Yamada 
---

 arch/mips/pnx833x/Platform | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/mips/pnx833x/Platform b/arch/mips/pnx833x/Platform
index 6b1a847d593f..287260669551 100644
--- a/arch/mips/pnx833x/Platform
+++ b/arch/mips/pnx833x/Platform
@@ -1,5 +1,5 @@
 # NXP STB225
 platform-$(CONFIG_SOC_PNX833X) += pnx833x/
-cflags-$(CONFIG_SOC_PNX833X)   += -I 
$(srctree)/arch/mips/include/asm/mach-pnx833x
+cflags-$(CONFIG_SOC_PNX833X)   += 
-I$(srctree)/arch/mips/include/asm/mach-pnx833x
 load-$(CONFIG_NXP_STB220)  += 0x80001000
 load-$(CONFIG_NXP_STB225)  += 0x80001000
-- 
2.17.1



Re: [PATCH v2] vt: Fix a missing-check bug in drivers/tty/vt/vt.c

2019-05-20 Thread Gen Zhang
On Mon, May 20, 2019 at 11:26:20PM -0400, Nicolas Pitre wrote:
> On Tue, 21 May 2019, Gen Zhang wrote:
> 
> > On Mon, May 20, 2019 at 10:55:40PM -0400, Nicolas Pitre wrote:
> > > On Tue, 21 May 2019, Gen Zhang wrote:
> > > 
> > > > In function con_init(), the pointer variable vc_cons[currcons].d, vc and
> > > > vc->vc_screenbuf is allocated a memory space via kzalloc(). And they are
> > > > used in the following codes.
> > > > However, when there is a memory allocation error, kzalloc() can fail.
> > > > Thus null pointer (vc_cons[currcons].d, vc and vc->vc_screenbuf)
> > > > dereference may happen. And it will cause the kernel to crash. 
> > > > Therefore,
> > > > we should check return value and handle the error.
> > > > Further,the loop condition MIN_NR_CONSOLES is defined as 1 in
> > > > include/uapi/linux/vt.h. So there is no need to unwind the loop.
> > > 
> > > But what if someone changes that define? It won't be obvious that some 
> > > code did rely on it to be defined to 1.
> > I re-examine the source code. MIN_NR_CONSOLES is only defined once and
> > no other changes to it.
> 
> Yes, that is true today.  But if someone changes that in the future, how 
> will that person know that you relied on it to be 1 for not needing to 
> unwind the loop?
> 
> 
> Nicolas
Hi Nicolas,
Thanks for your explanation! I got your point. Is this way proper?

err_vc_screenbuf:
	kfree(vc);
	for (currcons = 0; currcons < MIN_NR_CONSOLES; currcons++)
		vc_cons[currcons].d = NULL;
	return -ENOMEM;
err_vc:
	console_unlock();
	return -ENOMEM;

Thanks
Gen


Re: [PATCH v2 3/7] drivers/soc: xdma: Add user interface

2019-05-20 Thread Andrew Jeffery



On Tue, 21 May 2019, at 05:49, Eddie James wrote:
> This commits adds a miscdevice to provide a user interface to the XDMA
> engine. The interface provides the write operation to start DMA
> operations. The DMA parameters are passed as the data to the write call.
> The actual data to transfer is NOT passed through write. Note that both
> directions of DMA operation are accomplished through the write command;
> BMC to host and host to BMC.
> 
> The XDMA engine is restricted to only accessing the reserved memory
> space on the AST2500, typically used by the VGA. For this reason, this
> commit also adds a simple memory manager for this reserved memory space
> which can then be allocated in pages by users calling mmap. The space
> allocated by a client will be the space used in the DMA operation. For
> an "upstream" (BMC to host) operation, the data in the client's area
> will be transferred to the host. For a "downstream" (host to BMC)
> operation, the host data will be placed in the client's memory area.

Did you explore genalloc as a solution for allocating out of the VGA reserved
memory? Wondering if we can avoid implementing a custom allocator (even if
it is simple).
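
For reference, a rough sketch of what the genalloc route could look like; the
ctx->vga_pool field and the probe-time wiring are assumptions, not part of the
posted driver:

	#include <linux/genalloc.h>

	/* Probe time: hand the whole reserved VGA window to a pool once. */
	static int aspeed_xdma_init_pool(struct aspeed_xdma *ctx, u32 phys, u32 size)
	{
		ctx->vga_pool = devm_gen_pool_create(ctx->dev, PAGE_SHIFT, -1, NULL);
		if (!ctx->vga_pool)
			return -ENOMEM;

		return gen_pool_add_virt(ctx->vga_pool, (unsigned long)ctx->vga_virt,
					 phys, size, -1);
	}

	/* Per-client allocation/free then replaces the custom free list. */
	static unsigned long aspeed_xdma_pool_alloc(struct aspeed_xdma *ctx, u32 req_size)
	{
		return gen_pool_alloc(ctx->vga_pool, PAGE_ALIGN(req_size));
	}

	static void aspeed_xdma_pool_free(struct aspeed_xdma *ctx, unsigned long addr,
					  u32 req_size)
	{
		gen_pool_free(ctx->vga_pool, addr, PAGE_ALIGN(req_size));
	}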

Andrew

> 
> Poll is also provided in order to determine when the DMA operation is
> complete for non-blocking IO.
> 
> Signed-off-by: Eddie James 
> ---
>  drivers/soc/aspeed/aspeed-xdma.c | 301 
> +++
>  1 file changed, 301 insertions(+)
> 
> diff --git a/drivers/soc/aspeed/aspeed-xdma.c 
> b/drivers/soc/aspeed/aspeed-xdma.c
> index 0992d2a..2162ca0 100644
> --- a/drivers/soc/aspeed/aspeed-xdma.c
> +++ b/drivers/soc/aspeed/aspeed-xdma.c
> @@ -118,6 +118,12 @@ struct aspeed_xdma_cmd {
>   u32 resv1;
>  };
>  
> +struct aspeed_xdma_vga_blk {
> + u32 phys;
> + u32 size;
> + struct list_head list;
> +};
> +
>  struct aspeed_xdma_client;
>  
>  struct aspeed_xdma {
> @@ -128,6 +134,8 @@ struct aspeed_xdma {
>  
>   unsigned long flags;
>   unsigned int cmd_idx;
> + struct mutex list_lock;
> + struct mutex start_lock;
>   wait_queue_head_t wait;
>   struct aspeed_xdma_client *current_client;
>  
> @@ -136,6 +144,9 @@ struct aspeed_xdma {
>   dma_addr_t vga_dma;
>   void *cmdq;
>   void *vga_virt;
> + struct list_head vga_blks_free;
> +
> + struct miscdevice misc;
>  };
>  
>  struct aspeed_xdma_client {
> @@ -325,6 +336,260 @@ static irqreturn_t aspeed_xdma_irq(int irq, void *arg)
>   return IRQ_HANDLED;
>  }
>  
> +static u32 aspeed_xdma_alloc_vga_blk(struct aspeed_xdma *ctx, u32 req_size)
> +{
> + u32 phys = 0;
> + u32 size = PAGE_ALIGN(req_size);
> + struct aspeed_xdma_vga_blk *free;
> +
> + mutex_lock(&ctx->list_lock);
> +
> + list_for_each_entry(free, &ctx->vga_blks_free, list) {
> + if (free->size >= size) {
> + phys = free->phys;
> +
> + if (size == free->size) {
> + dev_dbg(ctx->dev,
> + "Allocd %08x[%08x r(%08x)], del.\n",
> + phys, size, req_size);
> + list_del(&free->list);
> + kfree(free);
> + } else {
> + free->phys += size;
> + free->size -= size;
> + dev_dbg(ctx->dev, "Allocd %08x[%08x r(%08x)], "
> + "shrunk %08x[%08x].\n", phys, size,
> + req_size, free->phys, free->size);
> + }
> +
> + break;
> + }
> + }
> +
> + mutex_unlock(&ctx->list_lock);
> +
> + return phys;
> +}
> +
> +static void aspeed_xdma_free_vga_blk(struct aspeed_xdma *ctx, u32 phys,
> +  u32 req_size)
> +{
> + u32 min_free = UINT_MAX;
> + u32 size = PAGE_ALIGN(req_size);
> + const u32 end = phys + size;
> + struct aspeed_xdma_vga_blk *free;
> +
> + mutex_lock(&ctx->list_lock);
> +
> + list_for_each_entry(free, &ctx->vga_blks_free, list) {
> + if (end == free->phys) {
> + u32 fend = free->phys + free->size;
> +
> + dev_dbg(ctx->dev,
> + "Freed %08x[%08x r(%08x)], exp %08x[%08x].\n",
> + phys, size, req_size, free->phys, free->size);
> +
> + free->phys = phys;
> + free->size = fend - free->phys;
> +
> + mutex_unlock(&ctx->list_lock);
> + return;
> + }
> +
> + if (free->phys < min_free)
> + min_free = free->phys;
> + }
> +
> + free = kzalloc(sizeof(*free), GFP_KERNEL);
> + if (free) {
> + free->phys = phys;
> + free->size = size;
> +
> + dev_dbg(ctx->dev, "Freed %08x[%08x r(%08x)], new.\n", phys,
> +  

Re: [PATCH v5 1/3] dt-bindings: i2c: extend existing opencore bindings.

2019-05-20 Thread Sagar Kadam
Hi Rob,


On Mon, May 20, 2019 at 8:07 PM Rob Herring  wrote:
>
> On Mon, May 20, 2019 at 9:12 AM Sagar Shrikant Kadam
>  wrote:
> >
> > Add FU540-C000 specific device tree bindings to already
> > available i2-ocores file. This device is available on
> > HiFive Unleashed Rev A00 board. Move interrupt and interrupt
> > parents under optional property list as these can be optional.
> >
> > The FU540-C000 SoC from sifive, has an Opencore's I2C block
> > reimplementation.
> >
> > The DT compatibility string for this IP is present in HDL and available at.
> > https://github.com/sifive/sifive-blocks/blob/master/src/main/scala/devices/i2c/I2C.scala#L73
> >
> > Signed-off-by: Sagar Shrikant Kadam 
> > ---
> >  Documentation/devicetree/bindings/i2c/i2c-ocores.txt | 7 ++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/devicetree/bindings/i2c/i2c-ocores.txt 
> > b/Documentation/devicetree/bindings/i2c/i2c-ocores.txt
> > index 17bef9a..b73960e 100644
> > --- a/Documentation/devicetree/bindings/i2c/i2c-ocores.txt
> > +++ b/Documentation/devicetree/bindings/i2c/i2c-ocores.txt
> > @@ -2,8 +2,11 @@ Device tree configuration for i2c-ocores
> >
> >  Required properties:
> >  - compatible  : "opencores,i2c-ocores" or "aeroflexgaisler,i2cmst"
> > +"sifive,fu540-c000-i2c" or "sifive,i2c0".
>
> It's not an OR because both are required. Please reformat to 1 valid
> combination per line.
Yes, will rectify it in V6.

> > +   for Opencore based I2C IP block reimplemented in
> > +   FU540-C000 SoC.Please refer 
> > sifive-blocks-ip-versioning.txt
> > +   for additional details.
> >  - reg : bus address start and address range size of device
> > -- interrupts  : interrupt number
> >  - clocks  : handle to the controller clock; see the note below.
> >  Mutually exclusive with opencores,ip-clock-frequency
> >  - opencores,ip-clock-frequency: frequency of the controller clock in Hz;
> > @@ -12,6 +15,8 @@ Required properties:
> >  - #size-cells : should be <0>
> >
> >  Optional properties:
> > +- interrupt-parent: handle to interrupt controller.
>
> Drop this. interrupt-parent is implied.
>
Sure, will exclude it in v6.

> > +- interrupts  : interrupt number.
> >  - clock-frequency : frequency of bus clock in Hz; see the note below.
> >  Defaults to 100 KHz when the property is not specified
> >  - reg-shift   : device register offsets are shifted by this value
> > --
> > 1.9.1
> >
>
> ___
> linux-riscv mailing list
> linux-ri...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv

Thanks,
Sagar


[PATCH] clk: mediatek: Remove MT8183 unused clock

2019-05-20 Thread Erin Lo
Remove MT8183 sspm clock

Signed-off-by: Erin Lo 
---
This clock should only be set in secure world.
---
 drivers/clk/mediatek/clk-mt8183.c | 19 ---
 1 file changed, 19 deletions(-)

diff --git a/drivers/clk/mediatek/clk-mt8183.c 
b/drivers/clk/mediatek/clk-mt8183.c
index 9d8651033ae9..1aa5f4059251 100644
--- a/drivers/clk/mediatek/clk-mt8183.c
+++ b/drivers/clk/mediatek/clk-mt8183.c
@@ -395,14 +395,6 @@ static const char * const atb_parents[] = {
"syspll_d5"
 };
 
-static const char * const sspm_parents[] = {
-   "clk26m",
-   "univpll_d2_d4",
-   "syspll_d2_d2",
-   "univpll_d2_d2",
-   "syspll_d3"
-};
-
 static const char * const dpi0_parents[] = {
"clk26m",
"tvdpll_d2",
@@ -606,9 +598,6 @@ static const struct mtk_mux top_muxes[] = {
MUX_GATE_CLR_SET_UPD(CLK_TOP_MUX_ATB, "atb_sel",
atb_parents, 0xa0,
0xa4, 0xa8, 0, 2, 7, 0x004, 24),
-   MUX_GATE_CLR_SET_UPD(CLK_TOP_MUX_SSPM, "sspm_sel",
-   sspm_parents, 0xa0,
-   0xa4, 0xa8, 8, 3, 15, 0x004, 25),
MUX_GATE_CLR_SET_UPD(CLK_TOP_MUX_DPI0, "dpi0_sel",
dpi0_parents, 0xa0,
0xa4, 0xa8, 16, 4, 23, 0x004, 26),
@@ -947,12 +936,8 @@ static const struct mtk_gate infra_clks[] = {
"fufs_sel", 13),
GATE_INFRA2(CLK_INFRA_MD32_BCLK, "infra_md32_bclk",
"axi_sel", 14),
-   GATE_INFRA2(CLK_INFRA_SSPM, "infra_sspm",
-   "sspm_sel", 15),
GATE_INFRA2(CLK_INFRA_UNIPRO_MBIST, "infra_unipro_mbist",
"axi_sel", 16),
-   GATE_INFRA2(CLK_INFRA_SSPM_BUS_HCLK, "infra_sspm_bus_hclk",
-   "axi_sel", 17),
GATE_INFRA2(CLK_INFRA_I2C5, "infra_i2c5",
"i2c_sel", 18),
GATE_INFRA2(CLK_INFRA_I2C5_ARBITER, "infra_i2c5_arbiter",
@@ -986,10 +971,6 @@ static const struct mtk_gate infra_clks[] = {
"msdc50_0_sel", 1),
GATE_INFRA3(CLK_INFRA_MSDC2_SELF, "infra_msdc2_self",
"msdc50_0_sel", 2),
-   GATE_INFRA3(CLK_INFRA_SSPM_26M_SELF, "infra_sspm_26m_self",
-   "f_f26m_ck", 3),
-   GATE_INFRA3(CLK_INFRA_SSPM_32K_SELF, "infra_sspm_32k_self",
-   "f_f26m_ck", 4),
GATE_INFRA3(CLK_INFRA_UFS_AXI, "infra_ufs_axi",
"axi_sel", 5),
GATE_INFRA3(CLK_INFRA_I2C6, "infra_i2c6",
-- 
2.18.0



Re: [PATCH] kvm: x86: refine kvm_get_arch_capabilities()

2019-05-20 Thread Xiaoyao Li

Ping.

On 4/19/2019 10:16 AM, Xiaoyao Li wrote:

1. Using X86_FEATURE_ARCH_CAPABILITIES to enumerate the existence of
MSR_IA32_ARCH_CAPABILITIES to avoid using rdmsrl_safe().

2. Since kvm_get_arch_capabilities() is only used in this file, making
it static.

Signed-off-by: Xiaoyao Li 
---
  arch/x86/include/asm/kvm_host.h | 1 -
  arch/x86/kvm/x86.c  | 8 
  2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a9d03af34030..d4ae67870764 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1526,7 +1526,6 @@ int kvm_pv_send_ipi(struct kvm *kvm, unsigned long 
ipi_bitmap_low,
unsigned long ipi_bitmap_high, u32 min,
unsigned long icr, int op_64_bit);
  
-u64 kvm_get_arch_capabilities(void);

  void kvm_define_shared_msr(unsigned index, u32 msr);
  int kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
  
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c

index a0d1fc80ac5a..ba8e269a8cd2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1205,11 +1205,12 @@ static u32 msr_based_features[] = {
  
  static unsigned int num_msr_based_features;
  
-u64 kvm_get_arch_capabilities(void)

+static u64 kvm_get_arch_capabilities(void)
  {
-   u64 data;
+   u64 data = 0;
  
-	rdmsrl_safe(MSR_IA32_ARCH_CAPABILITIES, &data);

+   if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
+   rdmsrl(MSR_IA32_ARCH_CAPABILITIES, data);
  
  	/*

 * If we're doing cache flushes (either "always" or "cond")
@@ -1225,7 +1226,6 @@ u64 kvm_get_arch_capabilities(void)
  
  	return data;

  }
-EXPORT_SYMBOL_GPL(kvm_get_arch_capabilities);
  
  static int kvm_get_msr_feature(struct kvm_msr_entry *msr)

  {



[PATCH] arm64/hugetlb: Use macros for contiguous huge page sizes

2019-05-20 Thread Anshuman Khandual
Replace all open encoded contiguous huge page size computations with
available macro encodings CONT_PTE_SIZE and CONT_PMD_SIZE. There are other
instances where these macros are used in the file and this change makes it
consistently use the same mnemonic.

Signed-off-by: Anshuman Khandual 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Steve Capper 
Cc: Mark Rutland 
---
 arch/arm64/mm/hugetlbpage.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 6b4a47b..05b5dda 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -236,7 +236,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
 
if (sz == PUD_SIZE) {
ptep = (pte_t *)pudp;
-   } else if (sz == (PAGE_SIZE * CONT_PTES)) {
+   } else if (sz == (CONT_PTE_SIZE)) {
pmdp = pmd_alloc(mm, pudp, addr);
 
WARN_ON(addr & (sz - 1));
@@ -254,7 +254,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
ptep = huge_pmd_share(mm, addr, pudp);
else
ptep = (pte_t *)pmd_alloc(mm, pudp, addr);
-   } else if (sz == (PMD_SIZE * CONT_PMDS)) {
+   } else if (sz == (CONT_PMD_SIZE)) {
pmdp = pmd_alloc(mm, pudp, addr);
WARN_ON(addr & (sz - 1));
return (pte_t *)pmdp;
@@ -462,9 +462,9 @@ static int __init hugetlbpage_init(void)
 #ifdef CONFIG_ARM64_4K_PAGES
add_huge_page_size(PUD_SIZE);
 #endif
-   add_huge_page_size(PMD_SIZE * CONT_PMDS);
+   add_huge_page_size(CONT_PMD_SIZE);
add_huge_page_size(PMD_SIZE);
-   add_huge_page_size(PAGE_SIZE * CONT_PTES);
+   add_huge_page_size(CONT_PTE_SIZE);
 
return 0;
 }
@@ -478,9 +478,9 @@ static __init int setup_hugepagesz(char *opt)
 #ifdef CONFIG_ARM64_4K_PAGES
case PUD_SIZE:
 #endif
-   case PMD_SIZE * CONT_PMDS:
+   case CONT_PMD_SIZE:
case PMD_SIZE:
-   case PAGE_SIZE * CONT_PTES:
+   case CONT_PTE_SIZE:
add_huge_page_size(ps);
return 1;
}
-- 
2.7.4



Re: [PATCH v7 11/12] soc: mediatek: cmdq: add cmdq_dev_get_client_reg function

2019-05-20 Thread CK Hu
On Tue, 2019-05-21 at 09:11 +0800, Bibby Hsieh wrote:
> GCE cannot know the register base address, this function
> can help cmdq client to get the cmdq_client_reg structure.
> 
> Signed-off-by: Bibby Hsieh 
> ---
>  drivers/soc/mediatek/mtk-cmdq-helper.c | 25 +
>  include/linux/soc/mediatek/mtk-cmdq.h  | 18 ++
>  2 files changed, 43 insertions(+)
> 
> diff --git a/drivers/soc/mediatek/mtk-cmdq-helper.c 
> b/drivers/soc/mediatek/mtk-cmdq-helper.c
> index 70ad4d806fac..815845bb5982 100644
> --- a/drivers/soc/mediatek/mtk-cmdq-helper.c
> +++ b/drivers/soc/mediatek/mtk-cmdq-helper.c
> @@ -27,6 +27,31 @@ struct cmdq_instruction {
>   u8 op;
>  };
>  
> +struct cmdq_client_reg  *cmdq_dev_get_client_reg(struct device *dev, int idx)
> +{
> + struct cmdq_client_reg *client_reg;
> + struct of_phandle_args spec;
> +
> + client_reg  = devm_kzalloc(dev, sizeof(*client_reg), GFP_KERNEL);
> + if (!client_reg)
> + return NULL;
> +
> + if (of_parse_phandle_with_args(dev->of_node, "mediatek,gce-client-reg",
> +"#subsys-cells", idx, &spec)) {
> + dev_err(dev, "can't parse gce-client-reg property (%d)", idx);

I think you should call devm_kfree(client_reg) here because this
function may not be called from the client driver's probe function. But
from another point of view, I would like you to move the memory allocation
out of this function. When a client calls cmdq_dev_get_client_reg() to get
a pointer, it is easy for the client to forget to free it because you do
not provide a free API. Some clients may embed struct cmdq_client_reg in
their own client structure, for example:

struct client {
struct cmdq_client_reg client_reg;
};

Because each client may have a different memory allocation strategy, I
would like you to move the memory allocation out of this function so the
client driver has that flexibility.
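
To make that concrete, a rough sketch of the caller-provided-storage shape
(not the posted patch; error handling is illustrative):

	int cmdq_dev_get_client_reg(struct device *dev, int idx,
				    struct cmdq_client_reg *client_reg)
	{
		struct of_phandle_args spec;
		int err;

		if (!client_reg)
			return -ENOENT;

		err = of_parse_phandle_with_args(dev->of_node,
						 "mediatek,gce-client-reg",
						 "#subsys-cells", idx, &spec);
		if (err < 0) {
			dev_err(dev, "can't parse gce-client-reg property (%d)", idx);
			return err;
		}

		client_reg->subsys = spec.args[0];
		client_reg->offset = spec.args[1];
		client_reg->size = spec.args[2];
		of_node_put(spec.np);

		return 0;
	}

A client can then embed struct cmdq_client_reg in its own private structure
and pass its address in, so no matching free API is needed.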

Regards,
CK

> +
> + return NULL;
> + }
> +
> + client_reg->subsys = spec.args[0];
> + client_reg->offset = spec.args[1];
> + client_reg->size = spec.args[2];
> + of_node_put(spec.np);
> +
> + return client_reg;
> +}
> +EXPORT_SYMBOL(cmdq_dev_get_client_reg);
> +
>  static void cmdq_client_timeout(struct timer_list *t)
>  {
>   struct cmdq_client *client = from_timer(client, t, timer);
> diff --git a/include/linux/soc/mediatek/mtk-cmdq.h 
> b/include/linux/soc/mediatek/mtk-cmdq.h
> index a345870a6d10..d0dea3780f7a 100644
> --- a/include/linux/soc/mediatek/mtk-cmdq.h
> +++ b/include/linux/soc/mediatek/mtk-cmdq.h
> @@ -15,6 +15,12 @@
>  
>  struct cmdq_pkt;
>  
> +struct cmdq_client_reg {
> + u8 subsys;
> + u16 offset;
> + u16 size;
> +};
> +
>  struct cmdq_client {
>   spinlock_t lock;
>   u32 pkt_cnt;
> @@ -142,4 +148,16 @@ int cmdq_pkt_flush_async(struct cmdq_pkt *pkt, 
> cmdq_async_flush_cb cb,
>   */
>  int cmdq_pkt_flush(struct cmdq_pkt *pkt);
>  
> +/**
> + * cmdq_dev_get_client_reg() - parse cmdq client reg from the device node of 
> CMDQ client
> + * @dev: device of CMDQ mailbox client
> + * @idx: the index of desired reg
> + *
> + * Return: CMDQ client reg pointer
> + *
> + * Help CMDQ client pasing the cmdq client reg
> + * from the device node of CMDQ client.
> + */
> +struct cmdq_client_reg  *cmdq_dev_get_client_reg(struct device *dev, int 
> idx);
> +
>  #endif   /* __MTK_CMDQ_H__ */




Re: [PATCH v2] vt: Fix a missing-check bug in drivers/tty/vt/vt.c

2019-05-20 Thread Nicolas Pitre
On Tue, 21 May 2019, Gen Zhang wrote:

> On Mon, May 20, 2019 at 10:55:40PM -0400, Nicolas Pitre wrote:
> > On Tue, 21 May 2019, Gen Zhang wrote:
> > 
> > > In function con_init(), the pointer variable vc_cons[currcons].d, vc and
> > > vc->vc_screenbuf is allocated a memory space via kzalloc(). And they are
> > > used in the following codes.
> > > However, when there is a memory allocation error, kzalloc() can fail.
> > > Thus null pointer (vc_cons[currcons].d, vc and vc->vc_screenbuf)
> > > dereference may happen. And it will cause the kernel to crash. Therefore,
> > > we should check return value and handle the error.
> > > Further,the loop condition MIN_NR_CONSOLES is defined as 1 in
> > > include/uapi/linux/vt.h. So there is no need to unwind the loop.
> > 
> > But what if someone changes that define? It won't be obvious that some 
> > code did rely on it to be defined to 1.
> I re-examine the source code. MIN_NR_CONSOLES is only defined once and
> no other changes to it.

Yes, that is true today.  But if someone changes that in the future, how 
will that person know that you relied on it to be 1 for not needing to 
unwind the loop?


Nicolas


Re: [PATCH v2 4/7] drivers/soc: xdma: Add PCI device configuration sysfs

2019-05-20 Thread Andrew Jeffery



On Tue, 21 May 2019, at 05:51, Eddie James wrote:
> The AST2500 has two PCI devices embedded. The XDMA engine can use either
> device to perform DMA transfers. Users need the capability to choose
> which device to use. This commit therefore adds two sysfs files that
> toggle the AST2500 and XDMA engine between the two PCI devices.
> 
> Signed-off-by: Eddie James 
> ---
>  drivers/soc/aspeed/aspeed-xdma.c | 64 
> 
>  1 file changed, 64 insertions(+)
> 
> diff --git a/drivers/soc/aspeed/aspeed-xdma.c 
> b/drivers/soc/aspeed/aspeed-xdma.c
> index 2162ca0..002b571 100644
> --- a/drivers/soc/aspeed/aspeed-xdma.c
> +++ b/drivers/soc/aspeed/aspeed-xdma.c
> @@ -667,6 +667,64 @@ static void aspeed_xdma_free_vga_blks(struct 
> aspeed_xdma *ctx)
>   }
>  }
>  
> +static int aspeed_xdma_change_pcie_conf(struct aspeed_xdma *ctx, u32 conf)
> +{
> + int rc;
> +
> + mutex_lock(&ctx->start_lock);
> + rc = wait_event_interruptible_timeout(ctx->wait,
> +   !test_bit(XDMA_IN_PRG,
> + &ctx->flags),
> +   msecs_to_jiffies(1000));
> + if (rc < 0) {
> + mutex_unlock(&ctx->start_lock);
> + return -EINTR;
> + }
> +
> + /* previous op didn't complete, wake up waiters anyway */
> + if (!rc)
> + wake_up_interruptible_all(&ctx->wait);
> +
> + reset_control_assert(ctx->reset);
> + msleep(10);
> +
> + aspeed_scu_pcie_write(ctx, conf);
> + msleep(10);
> +
> + reset_control_deassert(ctx->reset);
> + msleep(10);
> +
> + aspeed_xdma_init_eng(ctx);
> +
> + mutex_unlock(&ctx->start_lock);
> +
> + return 0;
> +}
> +
> +static ssize_t aspeed_xdma_use_bmc(struct device *dev,
> +struct device_attribute *attr,
> +const char *buf, size_t count)
> +{
> + int rc;
> + struct aspeed_xdma *ctx = dev_get_drvdata(dev);
> +
> + rc = aspeed_xdma_change_pcie_conf(ctx, aspeed_xdma_bmc_pcie_conf);
> + return rc ?: count;
> +}
> +static DEVICE_ATTR(use_bmc, 0200, NULL, aspeed_xdma_use_bmc);
> +
> +static ssize_t aspeed_xdma_use_vga(struct device *dev,
> +struct device_attribute *attr,
> +const char *buf, size_t count)
> +{
> + int rc;
> + struct aspeed_xdma *ctx = dev_get_drvdata(dev);
> +
> + rc = aspeed_xdma_change_pcie_conf(ctx, aspeed_xdma_vga_pcie_conf);
> + return rc ?: count;
> +}
> +static DEVICE_ATTR(use_vga, 0200, NULL, aspeed_xdma_use_vga);
> +
>  static int aspeed_xdma_probe(struct platform_device *pdev)
>  {
>   int irq;
> @@ -745,6 +803,9 @@ static int aspeed_xdma_probe(struct platform_device *pdev)
>   return rc;
>   }
>  
> + device_create_file(dev, &dev_attr_use_bmc);
> + device_create_file(dev, &dev_attr_use_vga);

Two attributes is a broken approach IMO. This gives the false representation of 
4
states (neither, vga, bmc, both) when really there are only two (vga and bmc). I
think we should have one attribute that reacts to "vga" and "bmc" writes.
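
Something along these lines, for illustration; the attribute name and the
sysfs_streq() matching are assumptions, not a reference to posted code:

	static ssize_t pcie_device_store(struct device *dev,
					 struct device_attribute *attr,
					 const char *buf, size_t count)
	{
		struct aspeed_xdma *ctx = dev_get_drvdata(dev);
		u32 conf;
		int rc;

		if (sysfs_streq(buf, "vga"))
			conf = aspeed_xdma_vga_pcie_conf;
		else if (sysfs_streq(buf, "bmc"))
			conf = aspeed_xdma_bmc_pcie_conf;
		else
			return -EINVAL;

		rc = aspeed_xdma_change_pcie_conf(ctx, conf);
		return rc ?: count;
	}
	static DEVICE_ATTR_WO(pcie_device);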

Andrew

> +
>   return 0;
>  }
>  
> @@ -752,6 +813,9 @@ static int aspeed_xdma_remove(struct platform_device 
> *pdev)
>  {
>   struct aspeed_xdma *ctx = platform_get_drvdata(pdev);
>  
> + device_remove_file(ctx->dev, &dev_attr_use_vga);
> + device_remove_file(ctx->dev, &dev_attr_use_bmc);
> +
>   misc_deregister(>misc);
>  
>   aspeed_xdma_free_vga_blks(ctx);
> -- 
> 1.8.3.1
> 
>


Re: [PATCH v3] vt: Fix a missing-check bug in drivers/tty/vt/vt.c

2019-05-20 Thread Gen Zhang
On Mon, May 20, 2019 at 10:55:40PM -0400, Nicolas Pitre wrote:
> As soon as you release the lock, another thread could come along and 
> start using the memory pointed by vc_cons[currcons].d you're about to 
> free here. This is unlikely for an initcall, but still.
> 
> You should consider this ordering instead:
> 
> err_vc_screenbuf:
>   kfree(vc);
>   vc_cons[currcons].d = NULL;
> err_vc:
>   console_unlock();
>   return -ENOMEM;
In con_init(), the pointers vc_cons[currcons].d, vc and vc->vc_screenbuf
are allocated via kzalloc() and are used in the following code.
However, kzalloc() can fail on a memory allocation error, so a null
pointer dereference of vc_cons[currcons].d, vc or vc->vc_screenbuf may
happen and crash the kernel. Therefore, we should check the return
values and handle the error.
Further, the loop bound MIN_NR_CONSOLES is defined as 1 in
include/uapi/linux/vt.h and is not changed, so there is no need to
unwind the loop.

Signed-off-by: Gen Zhang 

---
diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index fdd12f8..ea47eb3 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -3350,10 +3350,14 @@ static int __init con_init(void)
 
for (currcons = 0; currcons < MIN_NR_CONSOLES; currcons++) {
vc_cons[currcons].d = vc = kzalloc(sizeof(struct vc_data), 
GFP_NOWAIT);
+   if (!vc)
+   goto err_vc;
INIT_WORK(&vc_cons[currcons].SAK_work, vc_SAK);
tty_port_init(&vc->port);
visual_init(vc, currcons, 1);
vc->vc_screenbuf = kzalloc(vc->vc_screenbuf_size, GFP_NOWAIT);
+   if (!vc->vc_screenbuf)
+   goto err_vc_screenbuf;
vc_init(vc, vc->vc_rows, vc->vc_cols,
currcons || !vc->vc_sw->con_save_screen);
}
@@ -3375,6 +3379,13 @@ static int __init con_init(void)
register_console(&vt_console_driver);
 #endif
return 0;
+err_vc_screenbuf:
+   kfree(vc);
+   vc_cons[currcons].d = NULL;
+err_vc:
+   console_unlock();
+   return -ENOMEM;
+
 }
 console_initcall(con_init);
---


Re: [v2 PATCH] mm: vmscan: correct nr_reclaimed for THP

2019-05-20 Thread Yang Shi




On 5/20/19 5:43 PM, Yang Shi wrote:



On 5/16/19 11:10 PM, Johannes Weiner wrote:

On Tue, May 14, 2019 at 01:44:35PM -0700, Yang Shi wrote:

On 5/13/19 11:20 PM, Michal Hocko wrote:

On Mon 13-05-19 21:36:59, Yang Shi wrote:
On Mon, May 13, 2019 at 2:45 PM Michal Hocko  
wrote:

On Mon 13-05-19 14:09:59, Yang Shi wrote:
[...]

I think we can just account 512 base pages for nr_scanned for
isolate_lru_pages() to make the counters sane since 
PGSCAN_KSWAPD/DIRECT

just use it.

And, sc->nr_scanned should be accounted as 512 base pages too 
otherwise we
may have nr_scanned < nr_to_reclaim all the time to result in 
false-negative
for priority raise and something else wrong (e.g. wrong 
vmpressure).
Be careful. nr_scanned is used as a pressure indicator to slab 
shrinking

AFAIR. Maybe this is ok but it really begs for much more explaining

I don't know why my company mailbox didn't receive this email, so I
replied with my personal email.

It is not used to double slab pressure any more since commit
9092c71bb724 ("mm: use sc->priority for slab shrink targets"). It 
uses

sc->priority to determine the pressure for slab shrinking now.

So, I think we can just remove that "double slab pressure" code. 
It is

not used actually and looks confusing now. Actually, the "double slab
pressure" does something opposite. The extra inc to sc->nr_scanned
just prevents from raising sc->priority.

I have to get in sync with the recent changes. I am aware there were
some patches floating around but I didn't get to review them. I was
trying to point out that nr_scanned used to have a side effect to be
careful about. If it doesn't have anymore then this is getting much 
more

easier of course. Please document everything in the changelog.

Thanks for reminding. Yes, I remembered nr_scanned would double slab
pressure. But, when I inspected into the code yesterday, it turns 
out it is
not true anymore. I will run some test to make sure it doesn't 
introduce

regression.

Yeah, sc->nr_scanned is used for three things right now:

1. vmpressure - this looks at the scanned/reclaimed ratio so it won't
change semantics as long as scanned & reclaimed are fixed in parallel

2. compaction/reclaim - this is broken. Compaction wants a certain
number of physical pages freed up before going back to compacting.
Without Yang Shi's fix, we can overreclaim by a factor of 512.

3. kswapd priority raising - this is broken. kswapd raises priority if
we scan fewer pages than the reclaim target (which itself is obviously
expressed in order-0 pages). As a result, kswapd can falsely raise its
aggressiveness even when it's making great progress.

Both sc->nr_scanned & sc->nr_reclaimed should be fixed.


Yes, v3 patch (sit in my local repo now) did fix both.



BTW, I noticed the counter of memory reclaim is not correct with THP 
swap on

vanilla kernel, please see the below:

pgsteal_kswapd 21435
pgsteal_direct 26573329
pgscan_kswapd 3514
pgscan_direct 14417775

pgsteal is always greater than pgscan, my patch could fix the problem.

Ouch, how is that possible with the current code?

I think it happens when isolate_lru_pages() counts 1 nr_scanned for a
THP, then shrink_page_list() splits the THP and we reclaim tail pages
one by one. This goes all the way back to the initial THP patch!


I think so. It does make sense. But, the weird thing is I just see 
this with synchronous swap device (some THPs got swapped out in a 
whole, some got split), but I've never seen this with rotate swap 
device (all THPs got split).


I haven't figured out why.



isolate_lru_pages() needs to be fixed. Its return value, nr_taken, is
correct, but its *nr_scanned parameter is wrong, which causes issues:

1. The trace point, as Yang Shi pointed out, will underreport the
number of pages scanned, as it reports it along with nr_to_scan (base
pages) and nr_taken (base pages)

2. vmstat and memory.stat count 'struct page' operations rather than
base pages, which makes zero sense to neither user nor kernel
developers (I routinely multiply these counters by 4096 to get a sense
of work performed).

All of isolate_lru_pages()'s accounting should be in base pages, which
includes nr_scanned and PGSCAN_SKIPPED.

That should also simplify the code; e.g.:

for (total_scan = 0;
 scan < nr_to_scan && nr_taken < nr_to_scan && !list_empty(src);
 total_scan++) {

scan < nr_to_scan && nr_taken >= nr_to_scan is a weird condition that
does not make sense in page reclaim imo. Reclaim cares about physical
memory - freeing one THP is as much progress for reclaim as freeing
512 order-0 pages.
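
For illustration, a simplified sketch of base-page accounting in the
isolation loop (not the actual v3 patch); hpage_nr_pages() is the existing
helper that returns 512 for a THP and 1 for a base page:

	while (scan < nr_to_scan && !list_empty(src)) {
		struct page *page = lru_to_page(src);
		unsigned int nr_pages = hpage_nr_pages(page);

		scan += nr_pages;
		switch (__isolate_lru_page(page, mode)) {
		case 0:
			nr_taken += nr_pages;
			list_move(&page->lru, dst);
			break;
		default:
			list_move(&page->lru, src);
			break;
		}
	}
	*nr_scanned = scan;	/* base pages, consistent with nr_taken */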


Yes, I do agree. The v3 patch did this.



IMO *all* '++' in vmscan.c are suspicious and should be reviewed:
nr_scanned, nr_reclaimed, nr_dirty, nr_unqueued_dirty, nr_congested,
nr_immediate, nr_writeback, nr_ref_keep, nr_unmap_fail, pgactivate,
total_scan & scan, nr_skipped.


Some of them should be fine but I'm not sure the side effect. IMHO, 
let's fix the most obvious problem first.


A 

Re: [RFC PATCH v5 16/16] dcache: Add CONFIG_DCACHE_SMO

2019-05-20 Thread Tobin C. Harding
On Tue, May 21, 2019 at 02:05:38AM +, Roman Gushchin wrote:
> On Tue, May 21, 2019 at 11:31:18AM +1000, Tobin C. Harding wrote:
> > On Tue, May 21, 2019 at 12:57:47AM +, Roman Gushchin wrote:
> > > On Mon, May 20, 2019 at 03:40:17PM +1000, Tobin C. Harding wrote:
> > > > In an attempt to make the SMO patchset as non-invasive as possible add a
> > > > config option CONFIG_DCACHE_SMO (under "Memory Management options") for
> > > > enabling SMO for the DCACHE.  Whithout this option dcache constructor is
> > > > used but no other code is built in, with this option enabled slab
> > > > mobility is enabled and the isolate/migrate functions are built in.
> > > > 
> > > > Add CONFIG_DCACHE_SMO to guard the partial shrinking of the dcache via
> > > > Slab Movable Objects infrastructure.
> > > 
> > > Hm, isn't it better to make it a static branch? Or basically anything
> > > that allows switching on the fly?
> > 
> > If that is wanted, turning SMO on and off per cache, we can probably do
> > this in the SMO code in SLUB.
> 
> Not necessarily per cache, but without recompiling the kernel.
> > 
> > > It seems that the cost of just building it in shouldn't be that high.
> > > And the question if the defragmentation worth the trouble is so much
> > > easier to answer if it's possible to turn it on and off without rebooting.
> > 
> > If the question is 'is defragmentation worth the trouble for the
> > dcache', I'm not sure having SMO turned off helps answer that question.
> > If one doesn't shrink the dentry cache there should be very little
> > overhead in having SMO enabled.  So if one wants to explore this
> > question then they can turn on the config option.  Please correct me if
> > I'm wrong.
> 
> The problem with a config option is that it's hard to switch over.
> 
> So just to test your changes in production a new kernel should be built,
> tested and rolled out to a representative set of machines (which can be
> measured in thousands of machines). Then if results are questionable,
> it should be rolled back.
> 
> What you're actually guarding is the kmem_cache_setup_mobility() call,
> which can be perfectly avoided using a boot option, for example. Turning
> it on and off completely dynamic isn't that hard too.
> 
> Of course, it's up to you, it's just probably easier to find new users
> of a new feature, when it's easy to test it.

Ok, cool - I like it.  Will add for next version.

thanks,
Tobin.
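
For example, a minimal sketch of the boot-option variant; the parameter name
and the isolate/shrink callback names are illustrative, not taken verbatim
from the series:

	static bool dcache_smo_enabled __initdata;

	static int __init dcache_smo_setup(char *str)
	{
		dcache_smo_enabled = true;
		return 0;
	}
	early_param("dcache_smo", dcache_smo_setup);

	static void __init dcache_setup_mobility(void)
	{
		if (dcache_smo_enabled)
			kmem_cache_setup_mobility(dentry_cache, d_isolate,
						  d_partial_shrink);
	}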


Re: [PATCH v7 03/12] x86: Add macro to get symbol address for PIE support

2019-05-20 Thread hpa
On May 20, 2019 4:19:28 PM PDT, Thomas Garnier  wrote:
>From: Thomas Garnier 
>
>Add a new _ASM_MOVABS macro to fetch a symbol address. It will be used
>to replace "_ASM_MOV $, %dst" code construct that are not
>compatible with PIE.
>
>Signed-off-by: Thomas Garnier 
>---
> arch/x86/include/asm/asm.h | 1 +
> 1 file changed, 1 insertion(+)
>
>diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
>index 3ff577c0b102..3a686057e882 100644
>--- a/arch/x86/include/asm/asm.h
>+++ b/arch/x86/include/asm/asm.h
>@@ -30,6 +30,7 @@
> #define _ASM_ALIGN__ASM_SEL(.balign 4, .balign 8)
> 
> #define _ASM_MOV  __ASM_SIZE(mov)
>+#define _ASM_MOVABS   __ASM_SEL(movl, movabsq)
> #define _ASM_INC  __ASM_SIZE(inc)
> #define _ASM_DEC  __ASM_SIZE(dec)
> #define _ASM_ADD  __ASM_SIZE(add)

This is just about *always* wrong on x86-64. We should be using leaq 
sym(%rip),%reg. If it isn't reachable by leaq, then it is a non-PIE symbol like 
percpu. You do have to keep those distinct!
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH v2] vt: Fix a missing-check bug in drivers/tty/vt/vt.c

2019-05-20 Thread Gen Zhang
On Mon, May 20, 2019 at 10:55:40PM -0400, Nicolas Pitre wrote:
> On Tue, 21 May 2019, Gen Zhang wrote:
> 
> > In function con_init(), the pointer variable vc_cons[currcons].d, vc and
> > vc->vc_screenbuf is allocated a memory space via kzalloc(). And they are
> > used in the following codes.
> > However, when there is a memory allocation error, kzalloc() can fail.
> > Thus null pointer (vc_cons[currcons].d, vc and vc->vc_screenbuf)
> > dereference may happen. And it will cause the kernel to crash. Therefore,
> > we should check return value and handle the error.
> > Further,the loop condition MIN_NR_CONSOLES is defined as 1 in
> > include/uapi/linux/vt.h. So there is no need to unwind the loop.
> 
> But what if someone changes that define? It won't be obvious that some 
> code did rely on it to be defined to 1.
I re-examined the source code. MIN_NR_CONSOLES is only defined once and
nothing else changes it.

> 
> > Signed-off-by: Gen Zhang 
> > 
> > ---
> > diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
> > index fdd12f8..b756609 100644
> > --- a/drivers/tty/vt/vt.c
> > +++ b/drivers/tty/vt/vt.c
> > @@ -3350,10 +3350,14 @@ static int __init con_init(void)
> >  
> > for (currcons = 0; currcons < MIN_NR_CONSOLES; currcons++) {
> > vc_cons[currcons].d = vc = kzalloc(sizeof(struct vc_data), 
> > GFP_NOWAIT);
> > +   if (!vc_cons[currcons].d || !vc)
> 
> Both vc_cons[currcons].d and vc are assigned the same value on the 
> previous line. You don't have to test them both.
Thanks for this comment!

> 
> > +   goto err_vc;
> > INIT_WORK(&vc_cons[currcons].SAK_work, vc_SAK);
> > tty_port_init(&vc->port);
> > visual_init(vc, currcons, 1);
> > vc->vc_screenbuf = kzalloc(vc->vc_screenbuf_size, GFP_NOWAIT);
> > +   if (!vc->vc_screenbuf)
> > +   goto err_vc_screenbuf;
> > vc_init(vc, vc->vc_rows, vc->vc_cols,
> > currcons || !vc->vc_sw->con_save_screen);
> > }
> > @@ -3375,6 +3379,14 @@ static int __init con_init(void)
> > register_console(&vt_console_driver);
> >  #endif
> > return 0;
> > +err_vc:
> > +   console_unlock();
> > +   return -ENOMEM;
> > +err_vc_screenbuf:
> > +   console_unlock();
> > +   kfree(vc);
> > +   vc_cons[currcons].d = NULL;
> > +   return -ENOMEM;
> 
> As soon as you release the lock, another thread could come along and 
> start using the memory pointed by vc_cons[currcons].d you're about to 
> free here. This is unlikely for an initcall, but still.
> 
> You should consider this ordering instead:
> 
> err_vc_screenbuf:
>   kfree(vc);
>   vc_cons[currcons].d = NULL;
> err_vc:
>   console_unlock();
>   return -ENOMEM;
> 
> 
Thanks for your patient reply, Nicolas!
I will work on this patch and resubmit it.
Thanks
Gen
> >  }
> >  console_initcall(con_init);
> >  
> >  ---
> > 


Re: [PATCH 1/8] net: qualcomm: rmnet: fix struct rmnet_map_header

2019-05-20 Thread Bjorn Andersson
On Mon 20 May 19:30 PDT 2019, Alex Elder wrote:

> On 5/20/19 8:32 PM, Subash Abhinov Kasiviswanathan wrote:
> >>
> >> If you are telling me that the command/data flag resides at bit
> >> 7 of the first byte, I will update the field masks in a later
> >> patch in this series to reflect that.
> >>
> > 
> > Higher order bit is Command / Data.
> 
> So what this means is that to get the command/data bit we use:
> 
>   first_byte & 0x80
> 
> If that is correct I will remove this patch from the series and
> will update the subsequent patches so bit 7 is the command bit,
> bit 6 is reserved, and bits 0-5 are the pad length.
> 
> I will post a v2 of the series with these changes, and will
> incorporate Bjorn's "Reviewed-by".
> 

But didn't you say that your testing show that the current bit order is
wrong?

I still like the cleanup, if nothing else just to clarify and clearly
document the actual content of this header.

Regards,
Bjorn
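
For reference, a small sketch of the first-byte layout being converged on
above; the mask names are illustrative, not the driver's:

	#define RMNET_MAP_CMD_FLAG	0x80	/* bit 7: 1 = command, 0 = data */
	#define RMNET_MAP_RESERVED	0x40	/* bit 6 */
	#define RMNET_MAP_PAD_LEN	0x3f	/* bits 0-5: pad length */

	static inline bool rmnet_map_is_cmd(u8 first_byte)
	{
		return first_byte & RMNET_MAP_CMD_FLAG;
	}

	static inline u8 rmnet_map_pad_len(u8 first_byte)
	{
		return first_byte & RMNET_MAP_PAD_LEN;
	}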


Re: [PATCH v2] vt: Fix a missing-check bug in drivers/tty/vt/vt.c

2019-05-20 Thread Nicolas Pitre
On Tue, 21 May 2019, Gen Zhang wrote:

> In function con_init(), the pointer variable vc_cons[currcons].d, vc and
> vc->vc_screenbuf is allocated a memory space via kzalloc(). And they are
> used in the following codes.
> However, when there is a memory allocation error, kzalloc() can fail.
> Thus null pointer (vc_cons[currcons].d, vc and vc->vc_screenbuf)
> dereference may happen. And it will cause the kernel to crash. Therefore,
> we should check return value and handle the error.
> Further,the loop condition MIN_NR_CONSOLES is defined as 1 in
> include/uapi/linux/vt.h. So there is no need to unwind the loop.

But what if someone changes that define? It won't be obvious that some 
code did rely on it to be defined to 1.

> Signed-off-by: Gen Zhang 
> 
> ---
> diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
> index fdd12f8..b756609 100644
> --- a/drivers/tty/vt/vt.c
> +++ b/drivers/tty/vt/vt.c
> @@ -3350,10 +3350,14 @@ static int __init con_init(void)
>  
>   for (currcons = 0; currcons < MIN_NR_CONSOLES; currcons++) {
>   vc_cons[currcons].d = vc = kzalloc(sizeof(struct vc_data), 
> GFP_NOWAIT);
> + if (!vc_cons[currcons].d || !vc)

Both vc_cons[currcons].d and vc are assigned the same value on the 
previous line. You don't have to test them both.

> + goto err_vc;
>   INIT_WORK(&vc_cons[currcons].SAK_work, vc_SAK);
>   tty_port_init(&vc->port);
>   visual_init(vc, currcons, 1);
>   vc->vc_screenbuf = kzalloc(vc->vc_screenbuf_size, GFP_NOWAIT);
> + if (!vc->vc_screenbuf)
> + goto err_vc_screenbuf;
>   vc_init(vc, vc->vc_rows, vc->vc_cols,
>   currcons || !vc->vc_sw->con_save_screen);
>   }
> @@ -3375,6 +3379,14 @@ static int __init con_init(void)
> >   register_console(&vt_console_driver);
>  #endif
>   return 0;
> +err_vc:
> + console_unlock();
> + return -ENOMEM;
> +err_vc_screenbuf:
> + console_unlock();
> + kfree(vc);
> + vc_cons[currcons].d = NULL;
> + return -ENOMEM;

As soon as you release the lock, another thread could come along and 
start using the memory pointed by vc_cons[currcons].d you're about to 
free here. This is unlikely for an initcall, but still.

You should consider this ordering instead:

err_vc_screenbuf:
kfree(vc);
vc_cons[currcons].d = NULL;
err_vc:
console_unlock();
return -ENOMEM;


>  }
>  console_initcall(con_init);
>  
>  ---
> 


[PATCH] perf arm64: Fix mksyscalltbl when system kernel headers are ahead of the kernel

2019-05-20 Thread Vitaly Chikunov
When the host system has kernel headers that are newer than those of the
kernel being compiled, mksyscalltbl fails with errors such as:

  <stdin>: In function 'main':
  <stdin>:271:44: error: '__NR_kexec_file_load' undeclared (first use in this 
function)
  <stdin>:271:44: note: each undeclared identifier is reported only once for 
each function it appears in
  <stdin>:272:46: error: '__NR_pidfd_send_signal' undeclared (first use in this 
function)
  <stdin>:273:43: error: '__NR_io_uring_setup' undeclared (first use in this 
function)
  <stdin>:274:43: error: '__NR_io_uring_enter' undeclared (first use in this 
function)
  <stdin>:275:46: error: '__NR_io_uring_register' undeclared (first use in this 
function)
  tools/perf/arch/arm64/entry/syscalls//mksyscalltbl: line 48: 
/tmp/create-table-xvUQdD: Permission denied

mksyscalltbl is compiled with the default host includes, but run with the
includes of the kernel tree being compiled, causing some syscall numbers
to be undeclared.

Signed-off-by: Vitaly Chikunov 
Cc: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: Hendrik Brueckner 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: Kim Phillips 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Ravi Bangoria 
---
 tools/perf/arch/arm64/entry/syscalls/mksyscalltbl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/arch/arm64/entry/syscalls/mksyscalltbl 
b/tools/perf/arch/arm64/entry/syscalls/mksyscalltbl
index c88fd32563eb..459469b7222c 100755
--- a/tools/perf/arch/arm64/entry/syscalls/mksyscalltbl
+++ b/tools/perf/arch/arm64/entry/syscalls/mksyscalltbl
@@ -56,7 +56,7 @@ create_table()
echo "};"
 }
 
-$gcc -E -dM -x c  $input  \
+$gcc -E -dM -x c -I $incpath/include/uapi $input \
|sed -ne 's/^#define __NR_//p' \
|sort -t' ' -k2 -nu\
|create_table
-- 
2.11.0



Re: [PATCH v2] efi_64: Fix a missing-check bug in arch/x86/platform/efi/efi_64.c

2019-05-20 Thread Gen Zhang
On Fri, May 17, 2019 at 11:24:27AM +0200, Ard Biesheuvel wrote:
> On Fri, 17 May 2019 at 11:06, Gen Zhang  wrote:
> >
> > On Fri, May 17, 2019 at 10:41:28AM +0200, Ard Biesheuvel wrote:
> > > Returning an error here is not going to make much difference, given
> > > that the caller of efi_call_phys_prolog() does not bother to check it,
> > > and passes the result straight into efi_call_phys_epilog(), which
> > > happily attempts to dereference it.
> > >
> > > So if you want to fix this properly, please fix it at the call site as
> > > well. I'd prefer to avoid ERR_PTR() and just return NULL for a failed
> > > allocation though.
> > Hi Ard,
> > Thanks for your timely reply!
> > I think returning NULL in efi_call_phys_prolog() and checking in
> > efi_call_phys_epilog() is much better. But I am confused about what to
> > return in efi_call_phys_epilog() if save_pgd is NULL. It definitely cannot
> > return -ENOMEM, because efi_call_phys_epilog() returns unsigned long. Could
> > you please enlighten me on how to fix this problem?
> 
> 
> If efi_call_phys_prolog() returns NULL, the calling function should
> abort and never call efi_call_phys_epilog().
In efi_call_phys_prolog(), save_pgd is allocated by kmalloc_array()
and is dereferenced in the following code. However, memory allocation
functions such as kmalloc_array() may fail, and dereferencing a NULL
save_pgd would cause the kernel to crash. Thus we should check this
allocation.
Further, if efi_call_phys_prolog() returns NULL, we should abort in
phys_efi_set_virtual_address_map() and return EFI_ABORTED.

Signed-off-by: Gen Zhang 

---
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index e1cb01a..a7189a3 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -85,6 +85,8 @@ static efi_status_t __init phys_efi_set_virtual_address_map(
pgd_t *save_pgd;
 
save_pgd = efi_call_phys_prolog();
+   if (!save_pgd)
+   return EFI_ABORTED;
 
/* Disable interrupts around EFI calls: */
local_irq_save(flags);
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index cf0347f..828460a 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -91,6 +91,8 @@ pgd_t * __init efi_call_phys_prolog(void)
 
n_pgds = DIV_ROUND_UP((max_pfn << PAGE_SHIFT), PGDIR_SIZE);
save_pgd = kmalloc_array(n_pgds, sizeof(*save_pgd), GFP_KERNEL);
+   if (!save_pgd)
+   return NULL;
 
/*
 * Build 1:1 identity mapping for efi=old_map usage. Note that
---


Re: [RFC 0/7] introduce memory hinting API for external process

2019-05-20 Thread Minchan Kim
On Mon, May 20, 2019 at 04:42:00PM +0200, Oleksandr Natalenko wrote:
> Hi.
> 
> On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote:
> > - Background
> > 
> > The Android terminology used for forking a new process and starting an app
> > from scratch is a cold start, while resuming an existing app is a hot start.
> > While we continually try to improve the performance of cold starts, hot
> > starts will always be significantly less power hungry as well as faster so
> > we are trying to make hot start more likely than cold start.
> > 
> > To increase hot start, Android userspace manages the order that apps should
> > be killed in a process called ActivityManagerService. ActivityManagerService
> > tracks every Android app or service that the user could be interacting with
> > at any time and translates that into a ranked list for lmkd(low memory
> > killer daemon). They are likely to be killed by lmkd if the system has to
> > reclaim memory. In that sense they are similar to entries in any other 
> > cache.
> > Those apps are kept alive for opportunistic performance improvements but
> > those performance improvements will vary based on the memory requirements of
> > individual workloads.
> > 
> > - Problem
> > 
> > Naturally, cached apps were dominant consumers of memory on the system.
> > However, they were not significant consumers of swap even though they are
> > good candidates for swap. Upon investigation, swapping out only begins
> > once the low zone watermark is hit and kswapd wakes up, but the overall
> > allocation rate in the system might trip lmkd thresholds and cause a cached
> > process to be killed (we measured performance swapping out vs. zapping the
> > memory by killing a process. Unsurprisingly, zapping is 10x faster
> > even though we use zram which is much faster than real storage) so a kill
> > from lmkd will often satisfy the high zone watermark, resulting in very
> > few pages actually being moved to swap.
> > 
> > - Approach
> > 
> > The approach we chose was to use a new interface to allow userspace to
> > proactively reclaim entire processes by leveraging platform information.
> > This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages
> > that are known to be cold from userspace and to avoid races with lmkd
> > by reclaiming apps as soon as they entered the cached state. Additionally,
> > it could provide many chances for platform to use much information to
> > optimize memory efficiency.
> > 
> > IMHO we should spell it out that this patchset complements MADV_DONTNEED
> > and MADV_FREE by adding non-destructive ways to gain some free memory
> > space. MADV_COLD is similar to MADV_DONTNEED in a way that it hints the
> > kernel that memory region is not currently needed and should be reclaimed
> > immediately; MADV_COOL is similar to MADV_FREE in a way that it hints the
> > kernel that memory region is not currently needed and should be reclaimed
> > when memory pressure rises.
> > 
> > To achieve the goal, the patchset introduces two new options for madvise.
> > One is MADV_COOL, which will deactivate activated pages, and the other is
> > MADV_COLD, which will reclaim private pages instantly. These new options
> > complement MADV_DONTNEED and MADV_FREE by adding non-destructive ways to
> > gain some free memory space. MADV_COLD is similar to MADV_DONTNEED in a way
> > that it hints the kernel that memory region is not currently needed and
> > should be reclaimed immediately; MADV_COOL is similar to MADV_FREE in a way
> > that it hints the kernel that memory region is not currently needed and
> > should be reclaimed when memory pressure rises.
> > 
> > This approach is similar in spirit to madvise(MADV_DONTNEED), but the
> > information required to make the reclaim decision is not known to the app.
> > Instead, it is known to a centralized userspace daemon, and that daemon
> > must be able to initiate reclaim on its own without any app involvement.
> > To solve the concern, this patch introduces new syscall -
> > 
> > struct pr_madvise_param {
> > int size;
> > const struct iovec *vec;
> > }
> > 
> > int process_madvise(int pidfd, ssize_t nr_elem, int *behavior,
> > struct pr_madvise_param *results,
> > struct pr_madvise_param *ranges,
> > unsigned long flags);
> > 
> > The syscall gets a pidfd to give hints to an external process and provides
> > a pair of result/ranges vector arguments so that it can give several
> > hints, one per address range, all at once.
> > 
> > I guess others have different ideas about the naming of the syscall and
> > its options, so feel free to suggest better naming.
> > 
> > - Experiment
> > 
> > We did a bunch of testing with several hundred real users, not an artificial
> > benchmark, on Android. We saw about a 17% decrease in cold starts without any
> > significant battery/app startup latency issues. And with artificial 
> > 
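
For readers who want to see what a caller of the proposed interface would
look like, below is a rough userspace sketch. It is an illustration only:
the syscall number, the MADV_COLD value and the way the pidfd is obtained
are placeholders, since none of this is part of a merged ABI at this point;
only the argument layout follows the prototype quoted above.

#include <stdio.h>
#include <sys/syscall.h>
#include <sys/uio.h>
#include <unistd.h>

#define __NR_process_madvise	436	/* placeholder: no number assigned yet */
#define MADV_COLD		7	/* placeholder value for the new hint */

struct pr_madvise_param {
	int size;			/* number of entries in vec */
	const struct iovec *vec;	/* address ranges (or result buffers) */
};

int main(void)
{
	int pidfd = -1;		/* pidfd of the target (cached) process, obtained elsewhere */
	int behavior = MADV_COLD;
	long result_val = 0;

	struct iovec range  = { .iov_base = (void *)0x7f0000000000ul, .iov_len = 1 << 20 };
	struct iovec result = { .iov_base = &result_val, .iov_len = sizeof(result_val) };
	struct pr_madvise_param ranges  = { .size = 1, .vec = &range  };
	struct pr_madvise_param results = { .size = 1, .vec = &result };

	if (syscall(__NR_process_madvise, pidfd, 1L, &behavior,
		    &results, &ranges, 0UL) < 0)
		perror("process_madvise");
	return 0;
}

The results vector mirrors the ranges vector, so the kernel can report a
per-range outcome back to the caller.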

Re: [RFC 7/7] mm: madvise support MADV_ANONYMOUS_FILTER and MADV_FILE_FILTER

2019-05-20 Thread Minchan Kim
On Mon, May 20, 2019 at 11:28:01AM +0200, Michal Hocko wrote:
> [cc linux-api]
> 
> On Mon 20-05-19 12:52:54, Minchan Kim wrote:
> > A system could have a much faster swap device like zram. In that case,
> > swapping is much cheaper than file I/O on the low-end storage.
> > In this configuration, userspace could use a different strategy for each
> > kind of vma. IOW, it wants to reclaim anonymous pages with MADV_COLD
> > while keeping file-backed pages in the inactive LRU with MADV_COOL, because
> > file I/O is more expensive in this case, so it wants to keep them in memory
> > until memory pressure happens.
> > 
> > To support such a strategy more easily, this patch introduces the
> > MADV_ANONYMOUS_FILTER and MADV_FILE_FILTER options in madvise(2), similar
> > to the filters that /proc/<pid>/clear_refs already supports.
> > These filters can be ORed with other existing hints using the top two bits
> > of (int behavior).
> 
> madvise operates on top of ranges and it is quite trivial to do the
> filtering from the userspace so why do we need any additional filtering?
> 
> > Once either of them is set, the hint could affect only the interested vma
> > either anonymous or file-backed.
> > 
> > With that, user could call a process_madvise syscall simply with a entire
> > range(0x0 - 0x) but either of MADV_ANONYMOUS_FILTER and
> > MADV_FILE_FILTER so there is no need to call the syscall range by range.
> 
> OK, so here is the reason you want that. The immediate question is why
> cannot the monitor do the filtering from the userspace. Slightly more
> work, all right, but less of an API to expose and that itself is a
> strong argument against.

What I would have to do without such a filter option is enumerate all of the
vmas via /proc/<pid>/maps and then parse every range and inode from the
strings, which would be painful for 2000+ vmas.
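
To make the pain point concrete, the userspace-only alternative would be
something like the sketch below, which walks /proc/<pid>/maps and keeps the
anonymous mappings (approximated here as entries with inode 0); doing this
for every hint on a process with 2000+ vmas is exactly the overhead the
filter option is meant to avoid:

#include <stdio.h>

/* Collect anonymous ranges from /proc/<pid>/maps. This is only an
 * approximation for illustration: anonymous vmas are taken to be the
 * entries whose inode field is 0.
 */
static int collect_anon_ranges(int pid)
{
	char path[64], line[512];
	FILE *f;
	int n = 0;

	snprintf(path, sizeof(path), "/proc/%d/maps", pid);
	f = fopen(path, "r");
	if (!f)
		return -1;

	while (fgets(line, sizeof(line), f)) {
		unsigned long start, end, offset, inode;
		char perms[8], dev[16];

		if (sscanf(line, "%lx-%lx %7s %lx %15s %lu",
			   &start, &end, perms, &offset, dev, &inode) != 6)
			continue;
		if (inode == 0)
			n++;	/* here one would append [start, end) to the hint vector */
	}
	fclose(f);
	return n;
}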

> 
> > * from v1r2
> >   * use consistent check with clear_refs to identify anon/file vma - surenb
> > 
> > * from v1r1
> >   * use naming "filter" for new madvise option - dancol
> > 
> > Signed-off-by: Minchan Kim 
> > ---
> >  include/uapi/asm-generic/mman-common.h |  5 +
> >  mm/madvise.c   | 14 ++
> >  2 files changed, 19 insertions(+)
> > 
> > diff --git a/include/uapi/asm-generic/mman-common.h 
> > b/include/uapi/asm-generic/mman-common.h
> > index b8e230de84a6..be59a1b90284 100644
> > --- a/include/uapi/asm-generic/mman-common.h
> > +++ b/include/uapi/asm-generic/mman-common.h
> > @@ -66,6 +66,11 @@
> >  #define MADV_WIPEONFORK 18 /* Zero memory on fork, child only */
> >  #define MADV_KEEPONFORK 19 /* Undo MADV_WIPEONFORK */
> >  
> > +#define MADV_BEHAVIOR_MASK (~(MADV_ANONYMOUS_FILTER|MADV_FILE_FILTER))
> > +
> > +#define MADV_ANONYMOUS_FILTER  (1<<31) /* works for only anonymous vma 
> > */
> > +#define MADV_FILE_FILTER   (1<<30) /* works for only file-backed vma */
> > +
> >  /* compatibility flags */
> >  #define MAP_FILE   0
> >  
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index f4f569dac2bd..116131243540 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -1002,7 +1002,15 @@ static int madvise_core(struct task_struct *tsk, 
> > unsigned long start,
> > int write;
> > size_t len;
> > struct blk_plug plug;
> > +   bool anon_only, file_only;
> >  
> > +   anon_only = behavior & MADV_ANONYMOUS_FILTER;
> > +   file_only = behavior & MADV_FILE_FILTER;
> > +
> > +   if (anon_only && file_only)
> > +   return error;
> > +
> > +   behavior = behavior & MADV_BEHAVIOR_MASK;
> > if (!madvise_behavior_valid(behavior))
> > return error;
> >  
> > @@ -1067,12 +1075,18 @@ static int madvise_core(struct task_struct *tsk, 
> > unsigned long start,
> > if (end < tmp)
> > tmp = end;
> >  
> > +   if (anon_only && vma->vm_file)
> > +   goto next;
> > +   if (file_only && !vma->vm_file)
> > +   goto next;
> > +
> > /* Here vma->vm_start <= start < tmp <= (end|vma->vm_end). */
> > error = madvise_vma(tsk, vma, &prev, start, tmp,
> > behavior, &pages);
> > if (error)
> > goto out;
> > *nr_pages += pages;
> > +next:
> > start = tmp;
> > if (prev && start < prev->vm_end)
> > start = prev->vm_end;
> > -- 
> > 2.21.0.1020.gf2820cf01a-goog
> > 
> 
> -- 
> Michal Hocko
> SUSE Labs

