Re: [PATCH RESEND] kvm: make kvm_vcpu_(un)map dependency on CONFIG_HAS_IOMEM explicit
On Mon, May 20, 2019 at 07:23:43PM +0200, Paolo Bonzini wrote:
> On 20/05/19 18:44, Michal Kubecek wrote:
> > Recently introduced functions kvm_vcpu_map() and kvm_vcpu_unmap() call
> > memremap() and memunmap() which are only available if HAS_IOMEM is enabled
> > but this dependency is not explicit, so that the build fails with HAS_IOMEM
> > disabled.
> >
> > As both functions are only used on x86 where HAS_IOMEM is always enabled,
> > the easiest fix seems to be to only provide them when HAS_IOMEM is enabled.
> >
> > Fixes: e45adf665a53 ("KVM: Introduce a new guest mapping API")
> > Signed-off-by: Michal Kubecek
> > ---
>
> Thank you very much. However, it's better if only the memremap part is
> hidden behind CONFIG_HAS_IOMEM. I'll send a patch tomorrow and have it
> reach Linus by Wednesday at the latest.

That sounds like a better solution. As I'm not familiar with the code,
I didn't want to take any risks and suggested the easiest way around it.

Michal

> There is actually nothing specific to CONFIG_HAS_IOMEM in them;
> basically the functionality we want is remap_pfn_range but without a
> VMA. However, it's for a niche use case where KVM guest memory is
> mmap-ed from /dev/mem, and it's okay if for now that part remains
> disabled on s390.
>
> Paolo
Re: [PATCH 2/4] md: raid0: Remove return statement from void function
On Mon, May 20, 2019 at 2:45 PM Marcos Paulo de Souza wrote:
>
> This return statement was introduced in commit
> 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 ("Linux-2.6.12-rc2") and can be
> safely removed.

Wow, that's a really old commit. :)

I think 3/4 and 4/4 of the set make git-blame more difficult to follow.
Let's not apply them.

Thanks,
Song

>
> Signed-off-by: Marcos Paulo de Souza
> ---
> drivers/md/raid0.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> index f3fb5bb8c82a..42b0287104bd 100644
> --- a/drivers/md/raid0.c
> +++ b/drivers/md/raid0.c
> @@ -609,7 +609,6 @@ static bool raid0_make_request(struct mddev *mddev, struct bio *bio)
> static void raid0_status(struct seq_file *seq, struct mddev *mddev)
> {
> 	seq_printf(seq, " %dk chunks", mddev->chunk_sectors / 2);
> -	return;
> }
>
> static void *raid0_takeover_raid45(struct mddev *mddev)
> --
> 2.21.0
>
Re: [PATCH RESEND] kvm: make kvm_vcpu_(un)map dependency on CONFIG_HAS_IOMEM explicit
On Mon, May 20, 2019 at 03:45:29PM -0700, Bjorn Andersson wrote:
> On Mon, May 20, 2019 at 9:44 AM Michal Kubecek wrote:
> >
> > Recently introduced functions kvm_vcpu_map() and kvm_vcpu_unmap() call
> > memremap() and memunmap() which are only available if HAS_IOMEM is enabled
> > but this dependency is not explicit, so that the build fails with HAS_IOMEM
> > disabled.
> >
> > As both functions are only used on x86 where HAS_IOMEM is always enabled,
> > the easiest fix seems to be to only provide them when HAS_IOMEM is enabled.
> >
> > Fixes: e45adf665a53 ("KVM: Introduce a new guest mapping API")
> > Signed-off-by: Michal Kubecek
>
> Hi Michal,
>
> I see the same build issue on arm64 and as CONFIG_HAS_IOMEM is set
> there, this patch has no effect on solving that. Instead I had to
> include linux/io.h in kvm_main.c to make it compile.

This sounds like a different problem, which was already resolved in
mainline by commit c011d23ba046 ("kvm: fix compilation on aarch64"),
present in v5.2-rc1. The issue I'm trying to address is a link-time
failure (an unresolved reference to memremap()/memunmap()) when
CONFIG_HAS_IOMEM is disabled (in our case it affects a special
minimalistic s390x config for zfcpdump).

Michal
Re: linux-next: Tree for May 21
Hi Stephen, Andrew,

On Tue, May 21, 2019 at 2:15 PM Stephen Rothwell wrote:
>
> Hi all,

FYI. Commit 15e57a12d4df3c662f6cceaec6d1efa98a3d70f8 is equivalent to
commit ecebc5ce59a003163eb608ace38a01d7ffeb0a95 which is already in
mainline. The former should be dropped, shouldn't it?

Thanks.

> Changes since 20190520:
>
> New trees: soc-fsl, soc-fsl-fixes
>
> Removed trees: (not updated for more than a year)
>   alpine, samsung, sh, befs, kconfig, dwmw2-iommu, trivial,
>   target-updates, target-bva, init_task
>
> The imx-mxs tree gained a build failure so I used the version from
> next-20190520.
>
> The sunxi tree gained a conflict against the imx-mxs tree.
>
> The drm-misc tree gained conflicts against Linus' and the amdgpu trees.
>
> Non-merge commits (relative to Linus' tree): 991
> 998 files changed, 29912 insertions(+), 14691 deletions(-)
>
> I have created today's linux-next tree at
> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
> (patches at http://www.kernel.org/pub/linux/kernel/next/ ). If you
> are tracking the linux-next tree using git, you should not use "git pull"
> to do so as that will try to merge the new linux-next release with the
> old one. You should use "git fetch" and checkout or reset to the new
> master.
>
> You can see which trees have been included by looking in the Next/Trees
> file in the source. There are also quilt-import.log and merge.log
> files in the Next directory. Between each merge, the tree was built
> with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
> multi_v7_defconfig for arm and a native build of tools/perf. After
> the final fixups (if any), I do an x86_64 modules_install followed by
> builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
> ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
> and sparc64 defconfig. And finally, a simple boot test of the powerpc
> pseries_le_defconfig kernel in qemu (with and without kvm enabled).
> > Below is a summary of the state of the merge. > > I am currently merging 290 trees (counting Linus' and 70 trees of bug > fix patches pending for the current merge release). > > Stats about the size of the tree over time can be seen at > http://neuling.org/linux-next-size.html . > > Status of my local build tests will be at > http://kisskb.ellerman.id.au/linux-next . If maintainers want to give > advice about cross compilers/configs that work, we are always open to add > more builds. > > Thanks to Randy Dunlap for doing many randconfig builds. And to Paul > Gortmaker for triage and bug fixes. > > -- > Cheers, > Stephen Rothwell > > $ git checkout master > $ git reset --hard stable > Merging origin/master (f49aa1de9836 Merge tag 'for-5.2-rc1-tag' of > git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux) > Merging fixes/master (2bbacd1a9278 Merge tag 'kconfig-v5.2' of > git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild) > Merging kspp-gustavo/for-next/kspp (b324f1b28dc0 afs: yfsclient: Mark > expected switch fall-throughs) > Merging kbuild-current/fixes (a2d635decbfa Merge tag 'drm-next-2019-05-09' of > git://anongit.freedesktop.org/drm/drm) > Merging arc-current/for-curr (c5a1726d7383 ARC: entry: EV_Trap expects r10 > (vs. 
r9) to have exception cause) > Merging arm-current/fixes (e17b1af96b2a ARM: 8857/1: efi: enable CP15 DMB > instructions before cleaning the cache) > Merging arm64-fixes/for-next/fixes (7a0a93c51799 arm64: vdso: Explicitly add > build-id option) > Merging m68k-current/for-linus (fdd20ec8786a Documentation/features/time: > Mark m68k having modern-timekeeping) > Merging powerpc-fixes/fixes (672eaf37db9f powerpc/cacheinfo: Remove double > free) > Merging sparc/master (f49aa1de9836 Merge tag 'for-5.2-rc1-tag' of > git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux) > Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2) > Merging net/master (fa2c52be7129 vlan: Mark expected switch fall-through) > Merging bpf/master (6a0a923dfa14 of_net: fix of_get_mac_address retval if > compiled without CONFIG_OF) > Merging ipsec/master (9b3040a6aafd ipv4: Define __ipv4_neigh_lookup_noref > when CONFIG_INET is disabled) > Merging netfilter/master (2c82c7e724ff netfilter: nf_tables: fix oops during > rule dump) > Merging ipvs/master (b2e3d68d1251 netfilter: nft_compat: destroy function > must not have side effects) > Merging wireless-drivers/master (7a0f8ad5ff63 Merge ath-current from > git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git) > Merging mac80211/master (933b40530b4b ma
RE: [PATCH v4] clk: qoriq: add support for lx2160a
Hello Stephen, I have incorporated review comments from https://patchwork.kernel.org/patch/10917171/ A gentle reminder to apply the patch https://patchwork.kernel.org/patch/10918407/. Regards, Vabhav > -Original Message- > From: Vabhav Sharma > Sent: Friday, April 26, 2019 12:24 PM > To: linux-kernel@vger.kernel.org; linux-...@vger.kernel.org > Cc: sb...@kernel.org; mturque...@baylibre.com; Vabhav Sharma > ; Andy Tang ; Yogesh > Narayan Gaur > Subject: [PATCH v4] clk: qoriq: add support for lx2160a > > Add clockgen support and configuration for NXP SoC lx2160a with > compatible property as "fsl,lx2160a-clockgen". > > Signed-off-by: Tang Yuantian > Signed-off-by: Yogesh Gaur > Signed-off-by: Vabhav Sharma > Acked-by: Scott Wood > Acked-by: Stephen Boyd > Acked-by: Viresh Kumar > --- > Changes for v4: > - Incorporated review comments from Stephen Boyd > > Changes for v3: > - Incorporated review comments of Rafael J. Wysocki > - Updated commit message > > Changes for v2: > - Subject line updated > > drivers/clk/clk-qoriq.c | 12 > 1 file changed, 12 insertions(+) > > diff --git a/drivers/clk/clk-qoriq.c b/drivers/clk/clk-qoriq.c index > 3d51d7c..1a15201 100644 > --- a/drivers/clk/clk-qoriq.c > +++ b/drivers/clk/clk-qoriq.c > @@ -570,6 +570,17 @@ static const struct clockgen_chipinfo chipinfo[] = { > .flags = CG_VER3 | CG_LITTLE_ENDIAN, > }, > { > + .compat = "fsl,lx2160a-clockgen", > + .cmux_groups = { > + _cmux_cga12, _cmux_cgb > + }, > + .cmux_to_group = { > + 0, 0, 0, 0, 1, 1, 1, 1, -1 > + }, > + .pll_mask = 0x37, > + .flags = CG_VER3 | CG_LITTLE_ENDIAN, > + }, > + { > .compat = "fsl,p2041-clockgen", > .guts_compat = "fsl,qoriq-device-config-1.0", > .init_periph = p2041_init_periph, > @@ -1427,6 +1438,7 @@ CLK_OF_DECLARE(qoriq_clockgen_ls1043a, > "fsl,ls1043a-clockgen", clockgen_init); > CLK_OF_DECLARE(qoriq_clockgen_ls1046a, "fsl,ls1046a-clockgen", > clockgen_init); CLK_OF_DECLARE(qoriq_clockgen_ls1088a, "fsl,ls1088a- > clockgen", clockgen_init); 
CLK_OF_DECLARE(qoriq_clockgen_ls2080a, > "fsl,ls2080a-clockgen", clockgen_init); > +CLK_OF_DECLARE(qoriq_clockgen_lx2160a, "fsl,lx2160a-clockgen", > +clockgen_init); > CLK_OF_DECLARE(qoriq_clockgen_p2041, "fsl,p2041-clockgen", clockgen_init); > CLK_OF_DECLARE(qoriq_clockgen_p3041, "fsl,p3041-clockgen", clockgen_init); > CLK_OF_DECLARE(qoriq_clockgen_p4080, "fsl,p4080-clockgen", clockgen_init); > -- > 2.7.4
RE: Re: Re: [PATCH v3 11/14] dmaengine: imx-sdma: fix ecspi1 rx dma not work on i.mx8mm
> -Original Message-
> From: Vinod Koul
> Sent: May 21, 2019 13:13
>
> On 21-05-19, 04:58, Robin Gong wrote:
> > > -Original Message-
> > > From: Vinod Koul
> > > Sent: May 21, 2019 12:18
> > >
> > > On 07-05-19, 09:16, Robin Gong wrote:
> > > > Because the number of the ecspi1 rx event on i.mx8mm is 0, the
> > > > condition check ignores such a special case without the dma
> > > > channel enabled, which caused ecspi1 rx to fail. Actually, there
> > > > is no need to check event_id0; checking event_id1 is enough for
> > > > the DEV_2_DEV case because it's so lucky that event_id1 is never 0.
> > >
> > > Well, is it by chance or by design that event_id1 will never be 0?
> > >
> > That's by chance. DEV_2_DEV is just for the Audio case and event_id1
> > is non-zero on the current i.MX family.
>
> Then it won't be good to rely on chance :)

Yes, I knew that. May I create another independent patch for event_id1,
since that potential issue is not related to this ecspi patch set?

> --
> ~Vinod
Re: [PATCHv1 4/8] arm64: dts: qcom: msm8916: Use more generic idle state names
On Wed, May 15, 2019 at 6:33 PM Niklas Cassel wrote: > > On Wed, May 15, 2019 at 03:43:19PM +0530, Amit Kucheria wrote: > > On Tue, May 14, 2019 at 9:42 PM Niklas Cassel > > wrote: > > > > > > On Fri, May 10, 2019 at 04:59:42PM +0530, Amit Kucheria wrote: > > > > Instead of using Qualcomm-specific terminology, use generic node names > > > > for the idle states that are easier to understand. Move the description > > > > into the "idle-state-name" property. > > > > > > > > Signed-off-by: Amit Kucheria > > > > --- > > > > arch/arm64/boot/dts/qcom/msm8916.dtsi | 11 ++- > > > > 1 file changed, 6 insertions(+), 5 deletions(-) > > > > > > > > diff --git a/arch/arm64/boot/dts/qcom/msm8916.dtsi > > > > b/arch/arm64/boot/dts/qcom/msm8916.dtsi > > > > index ded1052e5693..400b609bb3fd 100644 > > > > --- a/arch/arm64/boot/dts/qcom/msm8916.dtsi > > > > +++ b/arch/arm64/boot/dts/qcom/msm8916.dtsi > > > > @@ -110,7 +110,7 @@ > > > > reg = <0x0>; > > > > next-level-cache = <_0>; > > > > enable-method = "psci"; > > > > - cpu-idle-states = <_SPC>; > > > > + cpu-idle-states = <_SLEEP_0>; > > > > clocks = <>; > > > > operating-points-v2 = <_opp_table>; > > > > #cooling-cells = <2>; > > > > @@ -122,7 +122,7 @@ > > > > reg = <0x1>; > > > > next-level-cache = <_0>; > > > > enable-method = "psci"; > > > > - cpu-idle-states = <_SPC>; > > > > + cpu-idle-states = <_SLEEP_0>; > > > > clocks = <>; > > > > operating-points-v2 = <_opp_table>; > > > > #cooling-cells = <2>; > > > > @@ -134,7 +134,7 @@ > > > > reg = <0x2>; > > > > next-level-cache = <_0>; > > > > enable-method = "psci"; > > > > - cpu-idle-states = <_SPC>; > > > > + cpu-idle-states = <_SLEEP_0>; > > > > clocks = <>; > > > > operating-points-v2 = <_opp_table>; > > > > #cooling-cells = <2>; > > > > @@ -146,7 +146,7 @@ > > > > reg = <0x3>; > > > > next-level-cache = <_0>; > > > > enable-method = "psci"; > > > > - cpu-idle-states = <_SPC>; > > > > + cpu-idle-states = <_SLEEP_0>; > > > > clocks = <>; > > > > operating-points-v2 = 
<_opp_table>; > > > > #cooling-cells = <2>; > > > > @@ -160,8 +160,9 @@ > > > > idle-states { > > > > entry-method="psci"; > > > > > > Please add a space before and after "=". > > > > > > > > > > > - CPU_SPC: spc { > > > > + CPU_SLEEP_0: cpu-sleep-0 { > > > > > > While I like your idea of using power state names from > > > Server Base System Architecture document (SBSA) where applicable, > > > does each qcom power state have a matching state in SBSA? > > > > > > These are the qcom power states: > > > https://source.codeaurora.org/quic/la/kernel/msm-4.4/tree/Documentation/devicetree/bindings/arm/msm/lpm-levels.txt?h=msm-4.4#n53 > > > > > > Note that qcom defines: > > > "wfi", "retention", "gdhs", "pc", "fpc" > > > while SBSA simply defines "idle_standby" (aka wfi), "idle_retention", > > > "sleep". > > > > > > Unless you know the equivalent name for each qcom power state > > > (perhaps several qcom power states are really the same SBSA state?), > > > I think that you should omit the renaming from this patch series. > > > > That is what SLEEP_0, SLEEP_1, SLEEP_2 could be used for. > > Ok, sounds good to me. > > > > > IOW, all these qcom definitions are nicely represented in the > > state-name and we could simply stick to SLEEP_0, SLEEP_1 for the node > > names. There is wide variability in the the names of the qcom idle > > states across SoC families downstream, so I'd argue against using > > those for the node names. > > > > Just for cpu states (non-wfi) I see the use of the following names > > downstream across families. The C seems to come from x86 > > world[1]: > > > > - C4, standalone power collapse (spc) > > - C4, power collapse (fpc) > > - C2D, retention > > - C3, power collapse (pc) > > - C4, rail power collapse (rail-pc) > > > > [1] > > https://www.hardwaresecrets.com/everything-you-need-to-know-about-the-cpu-c-states-power-saving-modes/ > > Indeed, there seems to be mixed names used, I've also seen "fpc-def". > > So, you have convinced me. 
>
> Kind regards,
> Niklas

Can I take that as a Reviewed-by?
linux-next: Signed-off-by missing for commit in the amlogic tree
Hi all, Commit 5d32a77c6e2e ("arm64: dts: meson-g12a: Add PWM nodes") is missing a Signed-off-by from its committer. -- Cheers, Stephen Rothwell
Re: [RFC PATCH v2 0/4] Input: mpr121-polled: Add polled driver for MPR121
Hi Michal,

On Fri, May 17, 2019 at 03:12:49PM +0200, Michal Vokáč wrote:
> Hi,
>
> I have to deal with a situation where we have a custom i.MX6 based
> platform in production that uses the MPR121 touchkey controller.
> Unfortunately the chip is connected using only the I2C interface.
> The interrupt line is not used. Back in 2015 (Linux v3.14), my
> colleague modded the existing mpr121_touchkey.c driver to use polling
> instead of interrupts.
>
> For quite some time now I have been in the process of updating the
> product from the ancient Freescale v3.14 kernel to the latest mainline
> and pushing any needed changes upstream. The DT files for our
> imx6dl-yapp4 platform already made it into v5.1-rc.
>
> I rebased and updated our mpr121 patch to the latest mainline.
> It is created as a separate driver, similarly to gpio_keys_polled.
>
> The I2C device is quite susceptible to ESD. An ESD test quite often
> causes a reset of the chip, or some register randomly changes its
> value. The [PATCH 3/4] adds a write-through register cache. With the
> cache this state can be detected and the device can be re-initialized.
>
> The main question is: is there any chance that such a polled driver
> could be accepted? Is it correct to implement it as a separate driver,
> or should it be done as an option in the existing driver? I can not
> really imagine how I would do that, though.
>
> There are also certain worries that the MPR121 chip may no longer be
> available in the unspecified distant future. In case of EOL I will
> need to add a polled driver for another touchkey chip, be it one
> already in mainline or a completely new one.

I think that my addition of input_polled_dev was ultimately a wrong
thing to do. I am looking into enabling polling mode for regular input
devices, as we then can enable polling mode in existing drivers.
As far as gpio-keys vs gpio-keys-polled goes, I feel that the
capabilities of the polling driver are sufficiently different from the
interrupt-driven one, so we will likely keep them separate.

Thanks.

-- 
Dmitry
Re: [PATCH 1/2] Input: atmel_mxt_ts - add wakeup support
On Sat, May 18, 2019 at 06:55:10PM +0200, stefano.ma...@gmail.com wrote:
> Hi Dmitry,
>
> On Fri, 2019-05-17 at 14:30 -0700, Dmitry Torokhov wrote:
> > Hi Stefano,
> >
> > On Fri, May 17, 2019 at 11:17:40PM +0200, Stefano Manni wrote:
> > > Add wakeup support to the maxtouch driver.
> > > The device can wake up the system from suspend;
> > > mark the IRQ as wakeup capable, so that the device
> > > irq is not disabled during system suspend.
> >
> > This should already be handled by the I2C core, see the lines after
> > "if (client->flags & I2C_CLIENT_WAKE)" in drivers/i2c/i2c-core-base.c.
> >
> > Unless there is a dedicated wakeup interrupt, we configure the main
> > interrupt as the wake source.
>
> What about the other drivers (e.g. ili210x.c) that do it like this?
> Shall they be purged?

They were likely done before the I2C and driver cores were enhanced to
handle wakeup automatically. We might want to clean them up, as long as
we verify that they keep working.

Thanks.

-- 
Dmitry
Re: [PATCH v2] edac: sifive: Add EDAC platform driver for SiFive SoCs
On Mon, May 6, 2019 at 4:57 PM Yash Shah wrote:
>
> The initial version of the EDAC driver supports:
> - ECC event monitoring and reporting through the EDAC framework for
>   the SiFive L2 cache controller.
>
> The EDAC driver registers for notifier events from the L2 cache
> controller driver (arch/riscv/mm/sifive_l2_cache.c) for L2 ECC events.
>
> Signed-off-by: Yash Shah
> Reviewed-by: James Morse
> ---
> This patch depends on the patch
> 'RISC-V: sifive_l2_cache: Add L2 cache controller driver for SiFive SoCs'
> https://lkml.org/lkml/2019/5/6/255

The prerequisite patch (the sifive_l2_cache driver) has been merged into
mainline v5.2-rc1. It should be OK to merge this EDAC driver now.

- Yash
Re: [PATCH 0/5] firmware: Add support for loading compressed files
On Mon, 20 May 2019 11:56:07 +0200, Takashi Iwai wrote:
>
> On Mon, 20 May 2019 11:39:29 +0200, Greg Kroah-Hartman wrote:
> >
> > On Mon, May 20, 2019 at 11:26:42AM +0200, Takashi Iwai wrote:
> > > Hi,
> > >
> > > this is a patch set to add support for loading compressed firmware
> > > files.
> > >
> > > The primary motivation is to reduce the storage size; e.g. currently
> > > the amount of /lib/firmware on my machine counts up to 419MB, and
> > > this can be reduced to 130MB with file compression. Not a bad deal.
> > >
> > > The feature adds only a fallback to the compressed file, so it
> > > should work as before as long as the normal firmware file is
> > > present. The f/w loader decompresses the content, so that there is
> > > no change needed on the caller side.
> > >
> > > Currently only the XZ format is supported. A caveat is that the
> > > kernel XZ helper code supports only the CRC32 (or none) integrity
> > > check type, so you'll have to compress the files via the xz -C crc32
> > > option.
> > >
> > > The patch set begins with a few other improvements and refactoring,
> > > followed by the compression support.
> > >
> > > In addition to this, dracut needs a small fix to deal with the *.xz
> > > files.
> > >
> > > Also, the latest patchset is found in the topic/fw-decompress branch
> > > of my sound.git tree:
> > > git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git
> >
> > After a quick review, these all look good to me, nice job.
> >
> > One recommendation: can we add support for testing this to the
> > tools/testing/selftests/firmware/ tests? And you did run those
> > regression tests to verify that you didn't get any of the config
> > options messed up, right? :)
>
> Oh, do you believe I'm such a modern person who lets the computer do
> all the work? ;) I only tested manually so far; this will be my
> homework today.

After fixing the regression in kselftest, I could verify and confirm
that no regression was introduced by my patchset.
Also, below is the patch to add tests for the compressed firmware load. I'll add to the series at the next respin, if needed. thanks, Takashi -- 8< -- From: Takashi Iwai Subject: [PATCH] selftests: firmware: Add compressed firmware tests This patch adds the test cases for checking compressed firmware load. Two more cases are added to fw_filesystem.sh: - Both a plain file and an xz file are present, and load the former - Only an xz file is present, and load without '.xz' suffix The tests are enabled only when CONFIG_FW_LOADER_COMPRESS is enabled and xz program is installed. Signed-off-by: Takashi Iwai --- tools/testing/selftests/firmware/fw_filesystem.sh | 73 +++ tools/testing/selftests/firmware/fw_lib.sh| 7 +++ tools/testing/selftests/firmware/fw_run_tests.sh | 1 + 3 files changed, 71 insertions(+), 10 deletions(-) diff --git a/tools/testing/selftests/firmware/fw_filesystem.sh b/tools/testing/selftests/firmware/fw_filesystem.sh index a4320c4b44dc..f901076aa2ea 100755 --- a/tools/testing/selftests/firmware/fw_filesystem.sh +++ b/tools/testing/selftests/firmware/fw_filesystem.sh @@ -153,13 +153,18 @@ config_set_read_fw_idx() read_firmwares() { + if [ "$1" = "xzonly" ]; then + fwfile="${FW}-orig" + else + fwfile="$FW" + fi for i in $(seq 0 3); do config_set_read_fw_idx $i # Verify the contents are what we expect. # -Z required for now -- check for yourself, md5sum # on $FW and DIR/read_firmware will yield the same. Even # cmp agrees, so something is off. - if ! diff -q -Z "$FW" $DIR/read_firmware 2>/dev/null ; then + if ! 
diff -q -Z "$fwfile" $DIR/read_firmware 2>/dev/null ; then echo "request #$i: firmware was not loaded" >&2 exit 1 fi @@ -246,17 +251,17 @@ test_request_firmware_nowait_custom_nofile() test_batched_request_firmware() { - echo -n "Batched request_firmware() try #$1: " + echo -n "Batched request_firmware() $2 try #$1: " config_reset config_trigger_sync - read_firmwares + read_firmwares $2 release_all_firmware echo "OK" } test_batched_request_firmware_direct() { - echo -n "Batched request_firmware_direct() try #$1: " + echo -n "Batched request_firmware_direct() $2 try #$1: " config_reset config_set_sync_direct config_trigger_sync @@ -266,7 +271,7 @@ test_batched_request_firmware_direct() test_request_firmware_nowait_uevent() { - echo -n "Batched request_firmware_nowait(uevent=true) try #$1: " + echo -n "Batched request_firmware_nowait(uevent=true) $2 try #$1: " config_reset config_trigger_async release_all_firmware @@ -275,11 +280,16 @@ test_request_firmware_nowait_uevent() test_request_firmware_nowait_custom() { - echo -n
Re: [RESEND] input: keyboard: imx: make sure keyboard can always wake up system
Hi Anson,

On Thu, Apr 04, 2019 at 01:40:16AM +, Anson Huang wrote:
> There are several scenarios in which the keyboard can NOT wake up the
> system from suspend, e.g., if a key is pressed between the system
> device suspend phase and the device noirq suspend phase, the keyboard
> ISR will be called and both keyboard depress and release interrupts
> will be disabled; the keyboard will then no longer be able to wake up
> the system. Another scenario would be: if a key is kept pressed and
> the system goes into suspend, the expected behavior would be that the
> system wakes up when the key is released, but the current
> implementation can NOT achieve that, because both depress and release
> interrupts are disabled in the ISR and the event check is still in
> progress.
>
> To fix these issues, we need to make sure the keyboard's depress or
> release interrupt is enabled after the noirq device suspend phase.
> This patch moves the suspend/resume callbacks to the noirq
> suspend/resume phase and enables the corresponding interrupt according
> to the current keyboard status.

I believe it is possible for the IRQ to be disabled and still be
enabled as a wakeup source. What happens if you call disable_irq()
before disabling the clock?

Thanks.

-- 
Dmitry
Re: [PATCH 1/2] selftests: Remove forced unbuffering for test running
On Tue, 21 May 2019 00:37:48 +0200, Kees Cook wrote: > > As it turns out, the "stdbuf" command will actually force all > subprocesses into unbuffered output, and some implementations of "echo" > turn into single-character writes, which utterly wrecks writes to /sys > and /proc files. > > Instead, drop the "stdbuf" usage, and for any tests that want explicit > flushing between newlines, they'll have to add "fflush(stdout);" as > needed. > > Reported-by: Takashi Iwai > Fixes: 5c069b6dedef ("selftests: Move test output to diagnostic lines") > Signed-off-by: Kees Cook Tested-by: Takashi Iwai BTW, this might be specific to shell invocation. As in the original discussion thread, it starts working when I replace "echo" with "/usr/bin/echo". Still it's not easy to control in a script itself, so dropping the unbuffered mode is certainly safer, yes. Thanks! Takashi > --- > tools/testing/selftests/kselftest/runner.sh | 12 +--- > 1 file changed, 1 insertion(+), 11 deletions(-) > > diff --git a/tools/testing/selftests/kselftest/runner.sh > b/tools/testing/selftests/kselftest/runner.sh > index eff3ee303d0d..00c9020bdda8 100644 > --- a/tools/testing/selftests/kselftest/runner.sh > +++ b/tools/testing/selftests/kselftest/runner.sh > @@ -24,16 +24,6 @@ tap_prefix() > fi > } > > -# If stdbuf is unavailable, we must fall back to line-at-a-time piping. > -tap_unbuffer() > -{ > - if ! which stdbuf >/dev/null ; then > - "$@" > - else > - stdbuf -i0 -o0 -e0 "$@" > - fi > -} > - > run_one() > { > DIR="$1" > @@ -54,7 +44,7 @@ run_one() > echo "not ok $test_num $TEST_HDR_MSG" > else > cd `dirname $TEST` > /dev/null > - (tap_unbuffer ./$BASENAME_TEST 2>&1; echo $? >&3) | > + (./$BASENAME_TEST 2>&1; echo $? >&3) | > tap_prefix >&4) 3>&1) | > (read xs; exit $xs)) 4>>"$logfile" && > echo "ok $test_num $TEST_HDR_MSG") || > -- > 2.17.1 >
Re: [PATCH v2 1/9] media: ov6650: Fix MODDULE_DESCRIPTION
On Tue, May 21, 2019 at 12:49:59AM +0200, Janusz Krzysztofik wrote: > Commit 23a52386fabe ("media: ov6650: convert to standalone v4l2 > subdevice") converted the driver from a soc_camera sensor to a > standalone V4L subdevice driver. Unfortunately, module description was > not updated to reflect the change. Fix it. > > While being at it, update email address of the module author. > > Fixes: 23a52386fabe ("media: ov6650: convert to standalone v4l2 subdevice") > Signed-off-by: Janusz Krzysztofik > cc: sta...@vger.kernel.org > --- > drivers/media/i2c/ov6650.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/media/i2c/ov6650.c b/drivers/media/i2c/ov6650.c > index 1b972e591b48..a3d00afcb0c8 100644 > --- a/drivers/media/i2c/ov6650.c > +++ b/drivers/media/i2c/ov6650.c > @@ -1045,6 +1045,6 @@ static struct i2c_driver ov6650_i2c_driver = { > > module_i2c_driver(ov6650_i2c_driver); > > -MODULE_DESCRIPTION("SoC Camera driver for OmniVision OV6650"); > -MODULE_AUTHOR("Janusz Krzysztofik "); > +MODULE_DESCRIPTION("V4L2 subdevice driver for OmniVision OV6650 camera > sensor"); > +MODULE_AUTHOR("Janusz Krzysztofik MODULE_LICENSE("GPL v2"); > -- > 2.21.0 > is this _really_ a patch that meets the stable kernel requirements? Same for this whole series... thanks, greg k-h
Re: [PATCH 11/12] powerpc/pseries/svm: Force SWIOTLB for secure guests
> diff --git a/arch/powerpc/include/asm/mem_encrypt.h
> b/arch/powerpc/include/asm/mem_encrypt.h
> new file mode 100644
> index ..45d5e4d0e6e0
> --- /dev/null
> +++ b/arch/powerpc/include/asm/mem_encrypt.h
> @@ -0,0 +1,19 @@
> +/* SPDX-License-Identifier: GPL-2.0+ */
> +/*
> + * SVM helper functions
> + *
> + * Copyright 2019 IBM Corporation
> + */
> +
> +#ifndef _ASM_POWERPC_MEM_ENCRYPT_H
> +#define _ASM_POWERPC_MEM_ENCRYPT_H
> +
> +#define sme_me_mask 0ULL
> +
> +static inline bool sme_active(void) { return false; }
> +static inline bool sev_active(void) { return false; }
> +
> +int set_memory_encrypted(unsigned long addr, int numpages);
> +int set_memory_decrypted(unsigned long addr, int numpages);
> +
> +#endif /* _ASM_POWERPC_MEM_ENCRYPT_H */

S/390 seems to be adding a stub header just like this. Can you please
clean up the Kconfig and generic headers bits for memory encryption so
that we don't need all this boilerplate code?

> config PPC_SVM
> 	bool "Secure virtual machine (SVM) support for POWER"
> 	depends on PPC_PSERIES
> +	select SWIOTLB
> +	select ARCH_HAS_MEM_ENCRYPT
> 	default n

n is the default default, no need to explicitly specify it.
Re: [PATCH] input: imx6ul_tsc: use devm_platform_ioremap_resource() to simplify code
On Mon, Apr 01, 2019 at 05:19:55AM +, Anson Huang wrote: > Use the new helper devm_platform_ioremap_resource() which wraps the > platform_get_resource() and devm_ioremap_resource() together, to > simplify the code. > > Signed-off-by: Anson Huang Applied, thank you. > --- > drivers/input/touchscreen/imx6ul_tsc.c | 8 ++-- > 1 file changed, 2 insertions(+), 6 deletions(-) > > diff --git a/drivers/input/touchscreen/imx6ul_tsc.c > b/drivers/input/touchscreen/imx6ul_tsc.c > index c10fc59..e04eecd 100644 > --- a/drivers/input/touchscreen/imx6ul_tsc.c > +++ b/drivers/input/touchscreen/imx6ul_tsc.c > @@ -364,8 +364,6 @@ static int imx6ul_tsc_probe(struct platform_device *pdev) > struct device_node *np = pdev->dev.of_node; > struct imx6ul_tsc *tsc; > struct input_dev *input_dev; > - struct resource *tsc_mem; > - struct resource *adc_mem; > int err; > int tsc_irq; > int adc_irq; > @@ -403,16 +401,14 @@ static int imx6ul_tsc_probe(struct platform_device > *pdev) > return err; > } > > - tsc_mem = platform_get_resource(pdev, IORESOURCE_MEM, 0); > - tsc->tsc_regs = devm_ioremap_resource(>dev, tsc_mem); > + tsc->tsc_regs = devm_platform_ioremap_resource(pdev, 0); > if (IS_ERR(tsc->tsc_regs)) { > err = PTR_ERR(tsc->tsc_regs); > dev_err(>dev, "failed to remap tsc memory: %d\n", err); > return err; > } > > - adc_mem = platform_get_resource(pdev, IORESOURCE_MEM, 1); > - tsc->adc_regs = devm_ioremap_resource(>dev, adc_mem); > + tsc->adc_regs = devm_platform_ioremap_resource(pdev, 1); > if (IS_ERR(tsc->adc_regs)) { > err = PTR_ERR(tsc->adc_regs); > dev_err(>dev, "failed to remap adc memory: %d\n", err); > -- > 2.7.4 > -- Dmitry
Re: [RFC PATCH 02/12] powerpc: Add support for adding an ESM blob to the zImage wrapper
On Tue, May 21, 2019 at 01:49:02AM -0300, Thiago Jung Bauermann wrote: > From: Benjamin Herrenschmidt > > For secure VMs, the signing tool will create a ticket called the "ESM blob" > for the Enter Secure Mode ultravisor call with the signatures of the kernel > and initrd among other things. > > This adds support to the wrapper script for adding that blob via the "-e" > option to the zImage.pseries. > > It also adds code to the zImage wrapper itself to retrieve and if necessary > relocate the blob, and pass its address to Linux via the device-tree, to be > later consumed by prom_init. Where does the "BLOB" come from? How is it licensed and how can we satisfy the GPL with it?
Re: [RFC 1/1] Add dm verity root hash pkcs7 sig validation.
On 5/21/19 7:54 AM, Jaskaran Khurana wrote: > Adds in-kernel pkcs7 signature checking for the roothash of > the dm-verity hash tree. > > The verification is to support cases where the roothash is not secured by > Trusted Boot, UEFI Secureboot or similar technologies. > One of the use cases for this is for dm-verity volumes mounted after boot, > the root hash provided during the creation of the dm-verity volume has to > be secure and thus in-kernel validation implemented here will be used > before we trust the root hash and allow the block device to be created. > The first patch was your cover letter; I'd suggest naming it that way in the subject. > The signature being provided for verification must verify the root hash and > must be trusted by the builtin keyring for verification to succeed. > > Adds DM_VERITY_VERIFY_ROOTHASH_SIG: roothash verification > against the roothash signature file *if* specified, if signature file is > specified verification must succeed prior to creation of device mapper > block device. > > Adds DM_VERITY_VERIFY_ROOTHASH_SIG_FORCE: roothash signature *must* be > specified for all dm verity volumes and verification must succeed prior > to creation of device mapper block device. > > Signed-off-by: Jaskaran Khurana > --- > drivers/md/Kconfig| 23 ++ > drivers/md/Makefile | 2 +- > drivers/md/dm-verity-target.c | 44 -- > drivers/md/dm-verity-verify-sig.c | 129 ++ > drivers/md/dm-verity-verify-sig.h | 32 > 5 files changed, 222 insertions(+), 8 deletions(-) > create mode 100644 drivers/md/dm-verity-verify-sig.c > create mode 100644 drivers/md/dm-verity-verify-sig.h > > diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig > index db269a348b20..da4115753f25 100644 > --- a/drivers/md/Kconfig > +++ b/drivers/md/Kconfig > @@ -489,6 +489,29 @@ config DM_VERITY > > If unsure, say N.
> > +config DM_VERITY_VERIFY_ROOTHASH_SIG > + def_bool n > + bool "Verity data device root hash signature verification support" > + depends on DM_VERITY > + select SYSTEM_DATA_VERIFICATION > + help > + The device mapper target created by DM-VERITY can be validated if the > + pre-generated tree of cryptographic checksums passed has a pkcs#7 > + signature file that can validate the roothash of the tree. > + > + If unsure, say N. > + > +config DM_VERITY_VERIFY_ROOTHASH_SIG_FORCE > + def_bool n > + bool "Forces all dm verity data device root hash should be signed" > + depends on DM_VERITY_VERIFY_ROOTHASH_SIG > + help > + The device mapper target created by DM-VERITY will succeed only if the > + pre-generated tree of cryptographic checksums passed also has a pkcs#7 > + signature file that can validate the roothash of the tree. > + > + If unsure, say N. > + > config DM_VERITY_FEC > bool "Verity forward error correction support" > depends on DM_VERITY > diff --git a/drivers/md/Makefile b/drivers/md/Makefile > index be7a6eb92abc..8a8c142bcfe1 100644 > --- a/drivers/md/Makefile > +++ b/drivers/md/Makefile > @@ -61,7 +61,7 @@ obj-$(CONFIG_DM_LOG_USERSPACE) += dm-log-userspace.o > obj-$(CONFIG_DM_ZERO)+= dm-zero.o > obj-$(CONFIG_DM_RAID)+= dm-raid.o > obj-$(CONFIG_DM_THIN_PROVISIONING) += dm-thin-pool.o > -obj-$(CONFIG_DM_VERITY) += dm-verity.o > +obj-$(CONFIG_DM_VERITY) += dm-verity.o dm-verity-verify-sig.o > obj-$(CONFIG_DM_CACHE) += dm-cache.o > obj-$(CONFIG_DM_CACHE_SMQ) += dm-cache-smq.o > obj-$(CONFIG_DM_ERA) += dm-era.o > diff --git a/drivers/md/dm-verity-target.c b/drivers/md/dm-verity-target.c > index f4c31ffaa88e..53aebfa8bc38 100644 > --- a/drivers/md/dm-verity-target.c > +++ b/drivers/md/dm-verity-target.c > @@ -16,7 +16,7 @@ > > #include "dm-verity.h" > #include "dm-verity-fec.h" > - > +#include "dm-verity-verify-sig.h" > #include > #include > > @@ -34,7 +34,11 @@ > #define DM_VERITY_OPT_IGN_ZEROES "ignore_zero_blocks" > #define DM_VERITY_OPT_AT_MOST_ONCE 
"check_at_most_once" > > -#define DM_VERITY_OPTS_MAX (2 + DM_VERITY_OPTS_FEC) > +#define DM_VERITY_OPTS_MAX (2 + DM_VERITY_OPTS_FEC + \ > + DM_VERITY_ROOT_HASH_VERIFICATION_OPTS) > + > +#define DM_VERITY_MANDATORY_ARGS10 > + > > static unsigned dm_verity_prefetch_cluster = DM_VERITY_DEFAULT_PREFETCH_SIZE; > > @@ -855,7 +859,8 @@ static int verity_alloc_zero_digest(struct dm_verity *v) > return r; > } > > -static int verity_parse_opt_args(struct dm_arg_set *as, struct dm_verity *v) > +static int verity_parse_opt_args(struct dm_arg_set *as, struct dm_verity *v, > + struct dm_verity_sig_opts *verify_args) > { > int r; > unsigned argc; > @@ -904,6 +909,15 @@ static int verity_parse_opt_args(struct dm_arg_set *as, > struct
Re: [RFC 0/7] introduce memory hinting API for external process
On Tue, May 21, 2019 at 08:25:55AM +0530, Anshuman Khandual wrote: > > > On 05/20/2019 10:29 PM, Tim Murray wrote: > > On Sun, May 19, 2019 at 11:37 PM Anshuman Khandual > > wrote: > >> > >> Or Is the objective here is reduce the number of processes which get > >> killed by > >> lmkd by triggering swapping for the unused memory (user hinted) sooner so > >> that > >> they dont get picked by lmkd. Under utilization for zram hardware is a > >> concern > >> here as well ? > > > > The objective is to avoid some instances of memory pressure by > > proactively swapping pages that userspace knows to be cold before > > those pages reach the end of the LRUs, which in turn can prevent some > > apps from being killed by lmk/lmkd. As soon as Android userspace knows > > that an application is not being used and is only resident to improve > > performance if the user returns to that app, we can kick off > > process_madvise on that process's pages (or some portion of those > > pages) in a power-efficient way to reduce memory pressure long before > > the system hits the free page watermark. This allows the system more > > time to put pages into zram versus waiting for the watermark to > > trigger kswapd, which decreases the likelihood that later memory > > allocations will cause enough pressure to trigger a kill of one of > > these apps. > > So this opens up bit of LRU management to user space hints. Also because the > app > in itself wont know about the memory situation of the entire system, new > system > call needs to be called from an external process. That's why process_madvise is introduced here. > > > > >> Swapping out memory into zram wont increase the latency for a hot start ? > >> Or > >> is it because as it will prevent a fresh cold start which anyway will be > >> slower > >> than a slow hot start. Just being curious. > > > > First, not all swapped pages will be reloaded immediately once an app > > is resumed. 
We've found that an app's working set post-process_madvise > > is significantly smaller than what an app allocates when it first > > launches (see the delta between pswpin and pswpout in Minchan's > > results). Presumably because of this, faulting to fetch from zram does > > pswpin 4176131392647 975034 233.00 > pswpout127422426617311387507 108.00 > > IIUC the swap-in ratio is way higher in comparison to that of swap out. Is > that > always the case ? Or it tend to swap out from an active area of the working > set > which faulted back again. I think it's because apps stay alive longer (fewer of them get killed), so what used to show up as pgpgin turns into swapin. > > > not seem to introduce a noticeable hot start penalty, nor does it > > cause an increase in performance problems later in the app's > > lifecycle. I've measured with and without process_madvise, and the > > differences are within our noise bounds. Second, because we're not > > That is assuming that post process_madvise() working set for the application > is > always smaller. There is another challenge. The external process should > ideally > have the knowledge of active areas of the working set for an application in > question for it to invoke process_madvise() correctly to prevent such > scenarios. There are several ways to detect workingset more accurately at the cost of runtime. For example, with idle page tracking or clear_refs. Accuracy is always a trade-off against overhead for LRU aging.
But pages if know to > be > inactive/cold can be marked high priority to be swapped out. > > > file pages preemptively and did cause a noticeable slowdown (~15%) for > > that case; this patch set avoids that slowdown. Finally, the benefit > > from avoiding cold starts is huge. The performance improvement from > > having a hot start instead of a cold start ranges from 3x for very > > small apps to 50x+ for larger apps like high-fidelity games. > > Is there any other real world scenario apart from this app based ecosystem > where > user hinted LRU management might be helpful ? Just being curious. Thanks for > the > detailed explanation. I will continue looking into this series.
linux-next: Tree for May 21
Hi all, Changes since 20190520: New trees: soc-fsl, soc-fsl-fixes Removed trees: (not updated for more than a year) alpine, samsung, sh, befs, kconfig, dwmw2-iommu, trivial, target-updates, target-bva, init_task The imx-mxs tree gained a build failure so I used the version from next-20190520. The sunxi tree gained a conflict against the imx-mxs tree. The drm-misc tree gained conflicts against Linus' and the amdgpu trees. Non-merge commits (relative to Linus' tree): 991 998 files changed, 29912 insertions(+), 14691 deletions(-) I have created today's linux-next tree at git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git (patches at http://www.kernel.org/pub/linux/kernel/next/ ). If you are tracking the linux-next tree using git, you should not use "git pull" to do so as that will try to merge the new linux-next release with the old one. You should use "git fetch" and checkout or reset to the new master. You can see which trees have been included by looking in the Next/Trees file in the source. There are also quilt-import.log and merge.log files in the Next directory. Between each merge, the tree was built with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a multi_v7_defconfig for arm and a native build of tools/perf. After the final fixups (if any), I do an x86_64 modules_install followed by builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc and sparc64 defconfig. And finally, a simple boot test of the powerpc pseries_le_defconfig kernel in qemu (with and without kvm enabled). Below is a summary of the state of the merge. I am currently merging 290 trees (counting Linus' and 70 trees of bug fix patches pending for the current merge release). Stats about the size of the tree over time can be seen at http://neuling.org/linux-next-size.html . Status of my local build tests will be at http://kisskb.ellerman.id.au/linux-next . 
If maintainers want to give advice about cross compilers/configs that work, we are always open to add more builds. Thanks to Randy Dunlap for doing many randconfig builds. And to Paul Gortmaker for triage and bug fixes. -- Cheers, Stephen Rothwell $ git checkout master $ git reset --hard stable Merging origin/master (f49aa1de9836 Merge tag 'for-5.2-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux) Merging fixes/master (2bbacd1a9278 Merge tag 'kconfig-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild) Merging kspp-gustavo/for-next/kspp (b324f1b28dc0 afs: yfsclient: Mark expected switch fall-throughs) Merging kbuild-current/fixes (a2d635decbfa Merge tag 'drm-next-2019-05-09' of git://anongit.freedesktop.org/drm/drm) Merging arc-current/for-curr (c5a1726d7383 ARC: entry: EV_Trap expects r10 (vs. r9) to have exception cause) Merging arm-current/fixes (e17b1af96b2a ARM: 8857/1: efi: enable CP15 DMB instructions before cleaning the cache) Merging arm64-fixes/for-next/fixes (7a0a93c51799 arm64: vdso: Explicitly add build-id option) Merging m68k-current/for-linus (fdd20ec8786a Documentation/features/time: Mark m68k having modern-timekeeping) Merging powerpc-fixes/fixes (672eaf37db9f powerpc/cacheinfo: Remove double free) Merging sparc/master (f49aa1de9836 Merge tag 'for-5.2-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux) Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2) Merging net/master (fa2c52be7129 vlan: Mark expected switch fall-through) Merging bpf/master (6a0a923dfa14 of_net: fix of_get_mac_address retval if compiled without CONFIG_OF) Merging ipsec/master (9b3040a6aafd ipv4: Define __ipv4_neigh_lookup_noref when CONFIG_INET is disabled) Merging netfilter/master (2c82c7e724ff netfilter: nf_tables: fix oops during rule dump) Merging ipvs/master (b2e3d68d1251 netfilter: nft_compat: destroy function must not have side effects) Merging wireless-drivers/master (7a0f8ad5ff63 Merge 
ath-current from git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git) Merging mac80211/master (933b40530b4b mac80211: remove set but not used variable 'old') Merging rdma-fixes/for-rc (2557fabd6e29 RDMA/hns: Bugfix for mapping user db) Merging sound-current/for-linus (c7b55fabfa44 ALSA: hdac: fix memory release for SST and SOF drivers) Merging sound-asoc-fixes/for-linus (08b9e0213aeb Merge branch 'asoc-5.1' into asoc-linus) Merging regmap-fixes/for-linus (1d6106cafb37 Merge branch 'regmap-5.1' into regmap-linus) Merging regulator-fixes/for-linus (0d183fc1760f Merge branch 'regulator-5.1' into regulator-linus) Merging spi-fixes/for-linus (72e3b3285a43 Merge branch 'spi-5.1' into spi-linus) Merging pci-current/for-linus (a188339ca5a3 Linux 5.2-rc1) Merging driver-core.current/driver-core-linus (a188339ca5a3 Linux 5.2-rc1) Merging tty.current/tty-linus (a18833
Re: Re: [PATCH v3 11/14] dmaengine: imx-sdma: fix ecspi1 rx dma not work on i.mx8mm
On 21-05-19, 04:58, Robin Gong wrote: > > -Original Message- > > From: Vinod Koul > > Sent: May 21, 2019 12:18 > > > > On 07-05-19, 09:16, Robin Gong wrote: > > > Because the number of ecspi1 rx event on i.mx8mm is 0, the condition > > > check ignore such special case without dma channel enabled, which > > > caused > > > ecspi1 rx works failed. Actually, no need to check event_id0, checking > > > event_id1 is enough for DEV_2_DEV case because it's so lucky that > > > event_id1 never be 0. > > > > Well is that by chance or design that event_id1 will be never 0? > > > That's by chance. DEV_2_DEV is just for Audio case and non-zero for event_id1 > on current i.MX family. Then it won't be good to rely on chance :) -- ~Vinod
Re: [PATCH] input: keyboard: imx: use devm_platform_ioremap_resource() to simplify code
On Mon, Apr 01, 2019 at 05:28:12AM +, Anson Huang wrote: > Use the new helper devm_platform_ioremap_resource() which wraps the > platform_get_resource() and devm_ioremap_resource() together, to > simplify the code. > > Signed-off-by: Anson Huang Applied, thank you. > --- > drivers/input/keyboard/imx_keypad.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/drivers/input/keyboard/imx_keypad.c > b/drivers/input/keyboard/imx_keypad.c > index 539cb67..cf08f4a 100644 > --- a/drivers/input/keyboard/imx_keypad.c > +++ b/drivers/input/keyboard/imx_keypad.c > @@ -422,7 +422,6 @@ static int imx_keypad_probe(struct platform_device *pdev) > dev_get_platdata(&pdev->dev); > struct imx_keypad *keypad; > struct input_dev *input_dev; > - struct resource *res; > int irq, error, i, row, col; > > if (!keymap_data && !pdev->dev.of_node) { > @@ -455,8 +454,7 @@ static int imx_keypad_probe(struct platform_device *pdev) > timer_setup(&keypad->check_matrix_timer, > imx_keypad_check_for_events, 0); > > - res = platform_get_resource(pdev, IORESOURCE_MEM, 0); > - keypad->mmio_base = devm_ioremap_resource(&pdev->dev, res); > + keypad->mmio_base = devm_platform_ioremap_resource(pdev, 0); > if (IS_ERR(keypad->mmio_base)) > return PTR_ERR(keypad->mmio_base); > > -- > 2.7.4 > -- Dmitry
Re: [PATCH 1/2] Input: elantech - enable middle button support on 2 ThinkPads
Hi Aaron, On Sun, May 19, 2019 at 03:27:10PM +0800, Aaron Ma wrote: > Adding 2 new touchpad PNPIDs to enable middle button support. Could you add their names in the comments please? > > Cc: sta...@vger.kernel.org > Signed-off-by: Aaron Ma > --- > drivers/input/mouse/elantech.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/input/mouse/elantech.c b/drivers/input/mouse/elantech.c > index a7f8b1614559..530142b5a115 100644 > --- a/drivers/input/mouse/elantech.c > +++ b/drivers/input/mouse/elantech.c > @@ -1189,6 +1189,8 @@ static const char * const middle_button_pnp_ids[] = { > "LEN2132", /* ThinkPad P52 */ > "LEN2133", /* ThinkPad P72 w/ NFC */ > "LEN2134", /* ThinkPad P72 */ > + "LEN0407", > + "LEN0408", These should come first - I'd like to keep the list sorted alphabetically. > NULL > }; > > -- > 2.17.1 > Thanks. -- Dmitry
[PATCH v3] kernel: fix typos and some coding style in comments
fix lenght to length Signed-off-by: Weitao Hou --- Changes in v3: - fix all other same typos with git grep --- .../devicetree/bindings/usb/s3c2410-usb.txt| 2 +- .../wireless/mediatek/mt76/mt76x02_usb_core.c | 2 +- kernel/sysctl.c| 18 +- sound/soc/qcom/qdsp6/q6asm.c | 2 +- 4 files changed, 12 insertions(+), 12 deletions(-) diff --git a/Documentation/devicetree/bindings/usb/s3c2410-usb.txt b/Documentation/devicetree/bindings/usb/s3c2410-usb.txt index e45b38ce2986..26c85afd0b53 100644 --- a/Documentation/devicetree/bindings/usb/s3c2410-usb.txt +++ b/Documentation/devicetree/bindings/usb/s3c2410-usb.txt @@ -4,7 +4,7 @@ OHCI Required properties: - compatible: should be "samsung,s3c2410-ohci" for USB host controller - - reg: address and lenght of the controller memory mapped region + - reg: address and length of the controller memory mapped region - interrupts: interrupt number for the USB OHCI controller - clocks: Should reference the bus and host clocks - clock-names: Should contain two strings diff --git a/drivers/net/wireless/mediatek/mt76/mt76x02_usb_core.c b/drivers/net/wireless/mediatek/mt76/mt76x02_usb_core.c index 6b89f7eab26c..e0f5e6202a27 100644 --- a/drivers/net/wireless/mediatek/mt76/mt76x02_usb_core.c +++ b/drivers/net/wireless/mediatek/mt76/mt76x02_usb_core.c @@ -53,7 +53,7 @@ int mt76x02u_skb_dma_info(struct sk_buff *skb, int port, u32 flags) pad = round_up(skb->len, 4) + 4 - skb->len; /* First packet of a A-MSDU burst keeps track of the whole burst -* length, need to update lenght of it and the last packet. +* length, need to update length of it and the last packet. 
*/ skb_walk_frags(skb, iter) { last = iter; diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 943c89178e3d..f78f725f225e 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -187,17 +187,17 @@ extern int no_unaligned_warning; * enum sysctl_writes_mode - supported sysctl write modes * * @SYSCTL_WRITES_LEGACY: each write syscall must fully contain the sysctl value - * to be written, and multiple writes on the same sysctl file descriptor - * will rewrite the sysctl value, regardless of file position. No warning - * is issued when the initial position is not 0. + * to be written, and multiple writes on the same sysctl file descriptor + * will rewrite the sysctl value, regardless of file position. No warning + * is issued when the initial position is not 0. * @SYSCTL_WRITES_WARN: same as above but warn when the initial file position is - * not 0. + * not 0. * @SYSCTL_WRITES_STRICT: writes to numeric sysctl entries must always be at - * file position 0 and the value must be fully contained in the buffer - * sent to the write syscall. If dealing with strings respect the file - * position, but restrict this to the max length of the buffer, anything - * passed the max lenght will be ignored. Multiple writes will append - * to the buffer. + * file position 0 and the value must be fully contained in the buffer + * sent to the write syscall. If dealing with strings respect the file + * position, but restrict this to the max length of the buffer, anything + * passed the max length will be ignored. Multiple writes will append + * to the buffer. * * These write modes control how current file position affects the behavior of * updating sysctl values through the proc interface on each write. 
diff --git a/sound/soc/qcom/qdsp6/q6asm.c b/sound/soc/qcom/qdsp6/q6asm.c index 4f85cb19a309..e8141a33a55e 100644 --- a/sound/soc/qcom/qdsp6/q6asm.c +++ b/sound/soc/qcom/qdsp6/q6asm.c @@ -1194,7 +1194,7 @@ EXPORT_SYMBOL_GPL(q6asm_open_read); * q6asm_write_async() - non blocking write * * @ac: audio client pointer - * @len: lenght in bytes + * @len: length in bytes * @msw_ts: timestamp msw * @lsw_ts: timestamp lsw * @wflags: flags associated with write -- 2.18.0
Re: [PATCH 2/2] Input: synaptics - remove X240 from the topbuttonpad list
Hi Aaron, On Sun, May 19, 2019 at 03:27:11PM +0800, Aaron Ma wrote: > Lenovo ThinkPad X240 does not have the top software button. > When this wrong ID in top button list, smbus mode will fail to probe, > so keep it working at PS2 mode. > > Cc: sta...@vger.kernel.org > Signed-off-by: Aaron Ma > --- > drivers/input/mouse/synaptics.c | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/drivers/input/mouse/synaptics.c b/drivers/input/mouse/synaptics.c > index b6da0c1267e3..6ae7bc92476b 100644 > --- a/drivers/input/mouse/synaptics.c > +++ b/drivers/input/mouse/synaptics.c > @@ -140,7 +140,6 @@ static const char * const topbuttonpad_pnp_ids[] = { > "LEN002E", > "LEN0033", /* Helix */ > "LEN0034", /* T431s, L440, L540, T540, W540, X1 Carbon 2nd */ > - "LEN0035", /* X240 */ According to the history this came from Synaptics through Hans, so I'd like to make sure there are no several X240 versions floating around... > "LEN0036", /* T440 */ > "LEN0037", /* X1 Carbon 2nd */ > "LEN0038", > -- > 2.17.1 > Thanks. -- Dmitry
Re: [PATCH V6 02/15] PCI/PME: Export pcie_pme_disable_msi() & pcie_pme_no_msi() APIs
On 5/20/2019 11:27 PM, Bjorn Helgaas wrote: On Sat, May 18, 2019 at 07:28:29AM +0530, Vidya Sagar wrote: On 5/18/2019 12:25 AM, Bjorn Helgaas wrote: On Fri, May 17, 2019 at 11:23:36PM +0530, Vidya Sagar wrote: On 5/17/2019 6:54 PM, Bjorn Helgaas wrote: Do you have "lspci -vvxxx" output for the root ports handy? If there's some clue in the standard config space that would tell us that MSI works for some events but not others, we could make the PCI core pay attention it. That would be the best solution because it wouldn't require Tegra-specific code. Here is the output of 'lspci vvxxx' for one of Tegra194's root ports. Thanks! This port advertises both MSI and MSI-X, and neither one is enabled. This particular port doesn't have a slot, so hotplug isn't applicable to it. But if I understand correctly, if MSI or MSI-X were enabled and the port had a slot, the port would generate MSI/MSI-X hotplug interrupts. But PME and AER events would still cause INTx interrupts (even with MSI or MSI-X enabled). Do I have that right? I just want to make sure that the reason for PME being INTx is a permanent hardware choice and that it's not related to MSI and MSI-X currently being disabled. Yes. Thats right. Its hardware choice that our hardware engineers made to use INTx for PME instead of MSI irrespective of MSI/MSI-X enabled/disabled in the root port. Here are more spec references that seem applicable: - PCIe r4.0, sec 7.7.1.2 (Message Control Register for MSI) says: MSI Enable – If Set and the MSI-X Enable bit in the MSI-X Message Control register (see Section 7.9.2) is Clear, the Function is permitted to use MSI to request service and is prohibited from using INTx interrupts. - PCIe r4.0, sec 7.7.2.2 (Message Control Register for MSI-X) says: MSI-X Enable – If Set and the MSI Enable bit in the MSI Message Control register (see Section 6.8.1.3) is Clear, the Function is permitted to use MSI-X to request service and is prohibited from using INTx interrupts (if implemented). 
I read that to mean a device is prohibited from using MSI/MSI-X for some interrupts and INTx for others. Since Tegra194 cannot use MSI/MSI-X for PME, it should use INTx for *all* interrupts. That makes the MSI/MSI-X Capabilities superfluous, and they should be omitted. If we set pdev->no_msi for Tegra194, we'll avoid MSI/MSI-X completely, so we'll assume *all* interrupts including hotplug will be INTx. Will that work? Yes. We are fine with having all root port originated interrupts getting generated through INTx instead of MSI/MSI-X.
Re: [RFC 0/7] introduce memory hinting API for external process
On Mon, May 20, 2019 at 06:44:52PM -0700, Matthew Wilcox wrote: > On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote: > > IMHO we should spell it out that this patchset complements MADV_DONTNEED > > and MADV_FREE by adding non-destructive ways to gain some free memory > > space. MADV_COLD is similar to MADV_DONTNEED in a way that it hints the > > kernel that memory region is not currently needed and should be reclaimed > > immediately; MADV_COOL is similar to MADV_FREE in a way that it hints the > > kernel that memory region is not currently needed and should be reclaimed > > when memory pressure rises. > > Do we tear down page tables for these ranges? That seems like a good True for MADV_COLD (reclaiming) but false for MADV_COOL (deactivating) in this implementation. > way of reclaiming potentially a substantial amount of memory. Given that refaulting is spread out over time while reclaim occurs in bursts, that does make sense to speed up the reclaiming. However, a concern to me is anonymous pages since they need swap cache insertion, which would be wasteful if they are ultimately not reclaimed.
[PATCH 07/12] powerpc/pseries/svm: Use shared memory for Debug Trace Log (DTL)
From: Anshuman Khandual Secure guests need to share the DTL buffers with the hypervisor. To that end, use a kmem_cache constructor which converts the underlying buddy allocated SLUB cache pages into shared memory. Signed-off-by: Anshuman Khandual Signed-off-by: Thiago Jung Bauermann --- arch/powerpc/include/asm/svm.h | 5 arch/powerpc/platforms/pseries/Makefile | 1 + arch/powerpc/platforms/pseries/setup.c | 5 +++- arch/powerpc/platforms/pseries/svm.c| 40 + 4 files changed, 50 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/svm.h b/arch/powerpc/include/asm/svm.h index fef3740f46a6..f253116c31fc 100644 --- a/arch/powerpc/include/asm/svm.h +++ b/arch/powerpc/include/asm/svm.h @@ -15,6 +15,9 @@ static inline bool is_secure_guest(void) return mfmsr() & MSR_S; } +void dtl_cache_ctor(void *addr); +#define get_dtl_cache_ctor() (is_secure_guest() ? dtl_cache_ctor : NULL) + #else /* CONFIG_PPC_SVM */ static inline bool is_secure_guest(void) @@ -22,5 +25,7 @@ static inline bool is_secure_guest(void) return false; } +#define get_dtl_cache_ctor() NULL + #endif /* CONFIG_PPC_SVM */ #endif /* _ASM_POWERPC_SVM_H */ diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile index a43ec843c8e2..b7b6e6f52bd0 100644 --- a/arch/powerpc/platforms/pseries/Makefile +++ b/arch/powerpc/platforms/pseries/Makefile @@ -25,6 +25,7 @@ obj-$(CONFIG_LPARCFG) += lparcfg.o obj-$(CONFIG_IBMVIO) += vio.o obj-$(CONFIG_IBMEBUS) += ibmebus.o obj-$(CONFIG_PAPR_SCM) += papr_scm.o +obj-$(CONFIG_PPC_SVM) += svm.o ifdef CONFIG_PPC_PSERIES obj-$(CONFIG_SUSPEND) += suspend.o diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c index e4f0dfd4ae33..c928e6e8a279 100644 --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -71,6 +71,7 @@ #include #include #include +#include #include "pseries.h" #include "../../../../drivers/pci/pci.h" @@ -329,8 +330,10 @@ static inline int 
alloc_dispatch_logs(void) static int alloc_dispatch_log_kmem_cache(void) { + void (*ctor)(void *) = get_dtl_cache_ctor(); + dtl_cache = kmem_cache_create("dtl", DISPATCH_LOG_BYTES, - DISPATCH_LOG_BYTES, 0, NULL); + DISPATCH_LOG_BYTES, 0, ctor); if (!dtl_cache) { pr_warn("Failed to create dispatch trace log buffer cache\n"); pr_warn("Stolen time statistics will be unreliable\n"); diff --git a/arch/powerpc/platforms/pseries/svm.c b/arch/powerpc/platforms/pseries/svm.c new file mode 100644 index ..c508196f7c83 --- /dev/null +++ b/arch/powerpc/platforms/pseries/svm.c @@ -0,0 +1,40 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * Secure VM platform + * + * Copyright 2019 IBM Corporation + * Author: Anshuman Khandual + */ + +#include +#include + +/* There's one dispatch log per CPU. */ +#define NR_DTL_PAGE (DISPATCH_LOG_BYTES * CONFIG_NR_CPUS / PAGE_SIZE) + +static struct page *dtl_page_store[NR_DTL_PAGE]; +static long dtl_nr_pages; + +static bool is_dtl_page_shared(struct page *page) +{ + long i; + + for (i = 0; i < dtl_nr_pages; i++) + if (dtl_page_store[i] == page) + return true; + + return false; +} + +void dtl_cache_ctor(void *addr) +{ + unsigned long pfn = PHYS_PFN(__pa(addr)); + struct page *page = pfn_to_page(pfn); + + if (!is_dtl_page_shared(page)) { + dtl_page_store[dtl_nr_pages] = page; + dtl_nr_pages++; + WARN_ON(dtl_nr_pages >= NR_DTL_PAGE); + uv_share_page(pfn, 1); + } +}
RE: Re: [PATCH v3 11/14] dmaengine: imx-sdma: fix ecspi1 rx dma not work on i.mx8mm
> -Original Message- > From: Vinod Koul > Sent: May 21, 2019 12:18 > > On 07-05-19, 09:16, Robin Gong wrote: > > Because the number of ecspi1 rx event on i.mx8mm is 0, the condition > > check ignore such special case without dma channel enabled, which > > caused > > ecspi1 rx works failed. Actually, no need to check event_id0, checking > > event_id1 is enough for DEV_2_DEV case because it's so lucky that > > event_id1 never be 0. > > Well is that by chance or design that event_id1 will be never 0? > That's by chance. DEV_2_DEV is just for Audio case and non-zero for event_id1 on current i.MX family.
[PATCH 01/12] powerpc/pseries: Introduce option to build secure virtual machines
Introduce CONFIG_PPC_SVM to control support for secure guests and include
Ultravisor-related helpers when it is selected.

Signed-off-by: Thiago Jung Bauermann
---
 arch/powerpc/include/asm/ultravisor.h  |  2 +-
 arch/powerpc/kernel/Makefile           |  4 +++-
 arch/powerpc/platforms/pseries/Kconfig | 12 ++++++++++++
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/ultravisor.h b/arch/powerpc/include/asm/ultravisor.h
index 4ffec7a36acd..09e0a615d96f 100644
--- a/arch/powerpc/include/asm/ultravisor.h
+++ b/arch/powerpc/include/asm/ultravisor.h
@@ -28,7 +28,7 @@ extern int early_init_dt_scan_ultravisor(unsigned long node, const char *uname,
  * This call supports up to 6 arguments and 4 return arguments. Use
  * UCALL_BUFSIZE to size the return argument buffer.
  */
-#if defined(CONFIG_PPC_UV)
+#if defined(CONFIG_PPC_UV) || defined(CONFIG_PPC_SVM)
 long ucall(unsigned long opcode, unsigned long *retbuf, ...);
 #else
 static long ucall(unsigned long opcode, unsigned long *retbuf, ...)
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 43ff4546e469..1e9b721634c8 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -154,7 +154,9 @@ endif
 obj-$(CONFIG_EPAPR_PARAVIRT)	+= epapr_paravirt.o epapr_hcalls.o
 obj-$(CONFIG_KVM_GUEST)		+= kvm.o kvm_emul.o
-obj-$(CONFIG_PPC_UV)		+= ultravisor.o ucall.o
+ifneq ($(CONFIG_PPC_UV)$(CONFIG_PPC_SVM),)
+obj-y				+= ultravisor.o ucall.o
+endif
 
 # Disable GCOV, KCOV & sanitizers in odd or sensitive code
 GCOV_PROFILE_prom_init.o := n
diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
index 9c6b3d860518..82c16aa4f1ce 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -144,3 +144,15 @@ config PAPR_SCM
 	tristate "Support for the PAPR Storage Class Memory interface"
 	help
 	  Enable access to hypervisor provided storage class memory.
+
+config PPC_SVM
+	bool "Secure virtual machine (SVM) support for POWER"
+	depends on PPC_PSERIES
+	default n
+	help
+	 Support secure guests on POWER. There are certain POWER platforms which
+	 support secure guests using the Protected Execution Facility, with the
+	 help of an Ultravisor executing below the hypervisor layer. This
+	 enables the support for those guests.
+
+	 If unsure, say "N".
Re: [PATCH] dma: dw-axi-dmac: fix null dereference when pointer first is null
On 08-05-19, 23:33, Colin King wrote:
> From: Colin Ian King
>
> In the unlikely event that axi_desc_get returns a null desc in the
> very first iteration of the while-loop the error exit path ends
> up calling axi_desc_put on a null pointer 'first' and this causes
> a null pointer dereference. Fix this by adding a null check on
> pointer 'first' before calling axi_desc_put.

Applied, thanks

--
~Vinod
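The shape of this fix is the classic guarded-cleanup pattern. A generic, runnable sketch (names invented; this is not the dw-axi-dmac source):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct desc { int id; };

/* Stand-in for axi_desc_put(): it dereferences the descriptor, so
 * calling it with NULL would crash -- the bug being fixed. */
static void desc_put(struct desc *d)
{
    d->id = 0;          /* touch the descriptor, as the real put does */
    free(d);
}

/* Error-exit path: 'first' may still be NULL if the very first
 * allocation in the loop failed, so guard the put with a NULL check. */
static int error_exit(struct desc *first)
{
    if (first)          /* the added check */
        desc_put(first);
    return -1;          /* e.g. a -ENOMEM-style error code */
}
```

The same idiom applies any time a cleanup label can be reached both before and after a resource pointer is first assigned.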
[RFC PATCH 02/12] powerpc: Add support for adding an ESM blob to the zImage wrapper
From: Benjamin Herrenschmidt

For secure VMs, the signing tool will create a ticket called the "ESM
blob" for the Enter Secure Mode ultravisor call with the signatures of
the kernel and initrd among other things.

This adds support to the wrapper script for adding that blob via the
"-e" option to the zImage.pseries.

It also adds code to the zImage wrapper itself to retrieve and if
necessary relocate the blob, and pass its address to Linux via the
device-tree, to be later consumed by prom_init.

Signed-off-by: Benjamin Herrenschmidt
[ Minor adjustments to some comments. ]
Signed-off-by: Thiago Jung Bauermann
---
 arch/powerpc/boot/main.c       | 41 ++++++++++++++++++++++++++++++++++
 arch/powerpc/boot/ops.h        |  2 ++
 arch/powerpc/boot/wrapper      | 24 +++++++++++++++++---
 arch/powerpc/boot/zImage.lds.S |  8 +++++++
 4 files changed, 72 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/boot/main.c b/arch/powerpc/boot/main.c
index 78aaf4ffd7ab..ca612efd3e81 100644
--- a/arch/powerpc/boot/main.c
+++ b/arch/powerpc/boot/main.c
@@ -150,6 +150,46 @@ static struct addr_range prep_initrd(struct addr_range vmlinux, void *chosen,
 	return (struct addr_range){(void *)initrd_addr, initrd_size};
 }
 
+#ifdef __powerpc64__
+static void prep_esm_blob(struct addr_range vmlinux, void *chosen)
+{
+	unsigned long esm_blob_addr, esm_blob_size;
+
+	/* Do we have an ESM (Enter Secure Mode) blob? */
+	if (_esm_blob_end <= _esm_blob_start)
+		return;
+
+	printf("Attached ESM blob at 0x%p-0x%p\n\r",
+	       _esm_blob_start, _esm_blob_end);
+	esm_blob_addr = (unsigned long)_esm_blob_start;
+	esm_blob_size = _esm_blob_end - _esm_blob_start;
+
+	/*
+	 * If the ESM blob is too low it will be clobbered when the
+	 * kernel relocates to its final location. In this case,
+	 * allocate a safer place and move it.
+	 */
+	if (esm_blob_addr < vmlinux.size) {
+		void *old_addr = (void *)esm_blob_addr;
+
+		printf("Allocating 0x%lx bytes for esm_blob ...\n\r",
+		       esm_blob_size);
+		esm_blob_addr = (unsigned long)malloc(esm_blob_size);
+		if (!esm_blob_addr)
+			fatal("Can't allocate memory for ESM blob !\n\r");
+		printf("Relocating ESM blob 0x%lx <- 0x%p (0x%lx bytes)\n\r",
+		       esm_blob_addr, old_addr, esm_blob_size);
+		memmove((void *)esm_blob_addr, old_addr, esm_blob_size);
+	}
+
+	/* Tell the kernel ESM blob address via device tree. */
+	setprop_val(chosen, "linux,esm-blob-start", (u32)(esm_blob_addr));
+	setprop_val(chosen, "linux,esm-blob-end", (u32)(esm_blob_addr + esm_blob_size));
+}
+#else
+static inline void prep_esm_blob(struct addr_range vmlinux, void *chosen) { }
+#endif
+
 /* A buffer that may be edited by tools operating on a zImage binary so as to
  * edit the command line passed to vmlinux (by setting /chosen/bootargs).
  * The buffer is put in it's own section so that tools may locate it easier.
@@ -218,6 +258,7 @@ void start(void)
 	vmlinux = prep_kernel();
 	initrd = prep_initrd(vmlinux, chosen, loader_info.initrd_addr,
 			     loader_info.initrd_size);
+	prep_esm_blob(vmlinux, chosen);
 	prep_cmdline(chosen);
 
 	printf("Finalizing device tree...");
diff --git a/arch/powerpc/boot/ops.h b/arch/powerpc/boot/ops.h
index cd043726ed88..e0606766480f 100644
--- a/arch/powerpc/boot/ops.h
+++ b/arch/powerpc/boot/ops.h
@@ -251,6 +251,8 @@ extern char _initrd_start[];
 extern char _initrd_end[];
 extern char _dtb_start[];
 extern char _dtb_end[];
+extern char _esm_blob_start[];
+extern char _esm_blob_end[];
 
 static inline __attribute__((const))
 int __ilog2_u32(u32 n)
diff --git a/arch/powerpc/boot/wrapper b/arch/powerpc/boot/wrapper
index f9141eaec6ff..36b2ad6cd5b7 100755
--- a/arch/powerpc/boot/wrapper
+++ b/arch/powerpc/boot/wrapper
@@ -14,6 +14,7 @@
 # -i initrd	specify initrd file
 # -d devtree	specify device-tree blob
 # -s tree.dts	specify device-tree source file (needs dtc installed)
+# -e esm_blob	specify ESM blob for secure images
 # -c		cache $kernel.strip.gz (use if present & newer, else make)
 # -C prefix	specify command prefix for cross-building tools
 #		(strip, objcopy, ld)
@@ -38,6 +39,7 @@ platform=of
 initrd=
 dtb=
 dts=
+esm_blob=
 cacheit=
 binary=
 compression=.gz
@@ -60,9 +62,9 @@ tmpdir=.
 
 usage() {
     echo 'Usage: wrapper [-o output] [-p platform] [-i initrd]' >&2
-    echo '       [-d devtree] [-s tree.dts] [-c] [-C cross-prefix]' >&2
-    echo '       [-D datadir] [-W workingdir] [-Z (gz|xz|none)]' >&2
-    echo '       [--no-compression] [vmlinux]' >&2
+    echo '       [-d devtree] [-s tree.dts] [-e esm_blob]' >&2
+    echo '       [-c] [-C cross-prefix] [-D datadir] [-W workingdir]' >&2
+    echo '       [-Z (gz|xz|none)] [--no-compression]
[PATCH 09/12] powerpc/pseries/svm: Disable doorbells in SVM guests
From: Sukadev Bhattiprolu

Normally, the HV emulates some instructions like MSGSNDP, MSGCLRP from a
KVM guest. To emulate the instructions, it must first read the
instruction from the guest's memory and decode its parameters. However,
for a secure guest (aka SVM), the page containing the instruction is in
secure memory and the HV cannot access it directly. It would need the
Ultravisor (UV) to facilitate accessing the instruction and parameters,
but the UV currently does not have support for such accesses.

Until the UV has such support, disable doorbells in SVMs. This might
incur a performance hit but that is yet to be quantified.

With this patch applied (needed only in SVMs, not needed for HV) we are
able to launch SVM guests with multi-core support. Eg:

	qemu -smp sockets=2,cores=2,threads=2

Fix suggested by Benjamin Herrenschmidt. Thanks to input from Paul
Mackerras, Ram Pai and Michael Anderson.

Signed-off-by: Sukadev Bhattiprolu
Signed-off-by: Thiago Jung Bauermann
---
 arch/powerpc/platforms/pseries/smp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c
index 3df46123cce3..95a5c24a1544 100644
--- a/arch/powerpc/platforms/pseries/smp.c
+++ b/arch/powerpc/platforms/pseries/smp.c
@@ -45,6 +45,7 @@
 #include
 #include
 #include
+#include
 
 #include "pseries.h"
 #include "offline_states.h"
@@ -225,7 +226,7 @@ static __init void pSeries_smp_probe_xics(void)
 {
 	xics_smp_probe();
 
-	if (cpu_has_feature(CPU_FTR_DBELL))
+	if (cpu_has_feature(CPU_FTR_DBELL) && !is_secure_guest())
 		smp_ops->cause_ipi = smp_pseries_cause_ipi;
 	else
 		smp_ops->cause_ipi = icp_ops->cause_ipi;
Re: [PATCH v1] dmaengine: tegra-apb: Handle DMA_PREP_INTERRUPT flag properly
On 08-05-19, 10:24, Jon Hunter wrote:
>
> On 05/05/2019 19:12, Dmitry Osipenko wrote:
> > The DMA_PREP_INTERRUPT flag means that the descriptor's callback should
> > be invoked upon transfer completion and that's it. For some reason the
> > driver completely disables the hardware interrupt handling, leaving the
> > channel in an unusable state if a transfer is issued with the flag
> > unset. Note that there are no occurrences in the relevant drivers that
> > do not set the flag, hence this patch doesn't fix any actual bug and
> > merely fixes a potential problem.
> >
> > Signed-off-by: Dmitry Osipenko
>
> From having a look at this, I am guessing that we have never really
> tested the case where the DMA_PREP_INTERRUPT flag is not set because, as
> you mentioned, it does not look like this will work at all!

That is a fair argument.

> Is there a use-case you are looking at where you don't set the
> DMA_PREP_INTERRUPT flag?
>
> If not, I am wondering if we should even bother supporting this and warn
> if it is not set. AFAICT it does not appear to be mandatory, but maybe
> Vinod can comment more on this.

This is supposed to be used in the cases where you submit a bunch of
descriptors and selectively don't want an interrupt for a few of them...
Is this such a case?

Thanks
~Vinod
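The use case Vinod describes — a chain of descriptors where only selected ones should fire a completion interrupt — can be modelled in a few lines. This is a hedged userspace sketch of the flag's semantics, not the tegra-apb driver or the real dmaengine API (all names here are invented):

```c
#include <assert.h>

#define MODEL_PREP_INTERRUPT 0x1   /* stand-in for DMA_PREP_INTERRUPT */

struct model_desc {
    unsigned flags;
    int *done_counter;             /* incremented by the "callback" */
};

/* Complete one descriptor: the callback runs only if the submitter
 * requested an interrupt on this particular descriptor. */
static void complete_desc(struct model_desc *d)
{
    if (d->flags & MODEL_PREP_INTERRUPT)
        (*d->done_counter)++;
}

/* Submit and complete a chain of n descriptors, requesting an interrupt
 * only on the last one -- the selective-interrupt pattern described
 * above. Returns how many callbacks fired. */
static int run_chain(int n)
{
    int fired = 0;
    for (int i = 0; i < n; i++) {
        struct model_desc d = {
            .flags = (i == n - 1) ? MODEL_PREP_INTERRUPT : 0,
            .done_counter = &fired,
        };
        complete_desc(&d);
    }
    return fired;
}
```

The point of contention in the thread is what a driver should do for the unflagged descriptors: skip only the callback (as modelled here), rather than disable the channel's interrupt handling entirely.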
[PATCH 11/12] powerpc/pseries/svm: Force SWIOTLB for secure guests
From: Anshuman Khandual

SWIOTLB checks the range of incoming CPU addresses to be bounced and
sees if the device can access it through its DMA window without
requiring bouncing. In such cases it just chooses to skip bouncing. But
for cases like secure guests on the powerpc platform all addresses need
to be bounced into the shared pool of memory because the host cannot
access it otherwise. Hence the need to do the bouncing is not related to
the device's DMA window, and use of bounce buffers is forced by setting
swiotlb_force.

Also, connect the shared memory conversion functions into the
ARCH_HAS_MEM_ENCRYPT hooks and call swiotlb_update_mem_attributes() to
convert SWIOTLB's memory pool to shared memory.

Signed-off-by: Anshuman Khandual
[ Use ARCH_HAS_MEM_ENCRYPT hooks to share swiotlb memory pool. ]
Signed-off-by: Thiago Jung Bauermann
---
 arch/powerpc/include/asm/mem_encrypt.h | 19 ++++++++++++++
 arch/powerpc/platforms/pseries/Kconfig |  5 ++++
 arch/powerpc/platforms/pseries/svm.c   | 45 ++++++++++++++++++++++++++
 3 files changed, 69 insertions(+)

diff --git a/arch/powerpc/include/asm/mem_encrypt.h b/arch/powerpc/include/asm/mem_encrypt.h
new file mode 100644
index 000000000000..45d5e4d0e6e0
--- /dev/null
+++ b/arch/powerpc/include/asm/mem_encrypt.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+/*
+ * SVM helper functions
+ *
+ * Copyright 2019 IBM Corporation
+ */
+
+#ifndef _ASM_POWERPC_MEM_ENCRYPT_H
+#define _ASM_POWERPC_MEM_ENCRYPT_H
+
+#define sme_me_mask	0ULL
+
+static inline bool sme_active(void) { return false; }
+static inline bool sev_active(void) { return false; }
+
+int set_memory_encrypted(unsigned long addr, int numpages);
+int set_memory_decrypted(unsigned long addr, int numpages);
+
+#endif /* _ASM_POWERPC_MEM_ENCRYPT_H */
diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
index 82c16aa4f1ce..41b10f3bc729 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -145,9 +145,14 @@ config PAPR_SCM
 	help
 	  Enable access to hypervisor provided storage class memory.
 
+config ARCH_HAS_MEM_ENCRYPT
+	def_bool n
+
 config PPC_SVM
 	bool "Secure virtual machine (SVM) support for POWER"
 	depends on PPC_PSERIES
+	select SWIOTLB
+	select ARCH_HAS_MEM_ENCRYPT
 	default n
 	help
 	 Support secure guests on POWER. There are certain POWER platforms which
diff --git a/arch/powerpc/platforms/pseries/svm.c b/arch/powerpc/platforms/pseries/svm.c
index c508196f7c83..618622d636d5 100644
--- a/arch/powerpc/platforms/pseries/svm.c
+++ b/arch/powerpc/platforms/pseries/svm.c
@@ -7,8 +7,53 @@
  */
 
 #include
+#include
+#include
+#include
 #include
 
+static int __init init_svm(void)
+{
+	if (!is_secure_guest())
+		return 0;
+
+	/* Don't release the SWIOTLB buffer. */
+	ppc_swiotlb_enable = 1;
+
+	/*
+	 * Since the guest memory is inaccessible to the host, devices always
+	 * need to use the SWIOTLB buffer for DMA even if dma_capable() says
+	 * otherwise.
+	 */
+	swiotlb_force = SWIOTLB_FORCE;
+
+	/* Share the SWIOTLB buffer with the host. */
+	swiotlb_update_mem_attributes();
+
+	return 0;
+}
+machine_early_initcall(pseries, init_svm);
+
+int set_memory_encrypted(unsigned long addr, int numpages)
+{
+	if (!PAGE_ALIGNED(addr))
+		return -EINVAL;
+
+	uv_unshare_page(PHYS_PFN(__pa(addr)), numpages);
+
+	return 0;
+}
+
+int set_memory_decrypted(unsigned long addr, int numpages)
+{
+	if (!PAGE_ALIGNED(addr))
+		return -EINVAL;
+
+	uv_share_page(PHYS_PFN(__pa(addr)), numpages);
+
+	return 0;
+}
+
 /* There's one dispatch log per CPU. */
 #define NR_DTL_PAGE (DISPATCH_LOG_BYTES * CONFIG_NR_CPUS / PAGE_SIZE)
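The reasoning in the commit message condenses into the bounce decision itself. The following is a simplified, runnable model of that decision — not the actual swiotlb code, just the rule it implements (names invented):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified model of the SWIOTLB bounce decision: normally bouncing is
 * skipped when the device can reach the buffer directly, but a secure
 * guest must always bounce through the (shared) pool, because the host
 * cannot access ordinary guest memory at all. That is what setting
 * swiotlb_force = SWIOTLB_FORCE achieves in the patch above.
 */
static bool must_bounce(bool force, uint64_t addr, uint64_t dma_mask)
{
    if (force)                 /* secure guest: always bounce */
        return true;
    return addr > dma_mask;    /* otherwise bounce only when the device
                                  cannot address the buffer directly */
}
```

In the real code the second condition is the `dma_capable()` check the commit message refers to; the point of the patch is that for secure guests the first condition makes it irrelevant.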
[PATCH 07/14] fs: teach the mm about range locking
Conversion is straightforward, mmap_sem is used within the the same function context most of the time. No change in semantics. Signed-off-by: Davidlohr Bueso --- fs/aio.c | 5 +++-- fs/coredump.c | 5 +++-- fs/exec.c | 19 +--- fs/io_uring.c | 5 +++-- fs/proc/base.c| 23 fs/proc/internal.h| 2 ++ fs/proc/task_mmu.c| 32 +++ fs/proc/task_nommu.c | 22 +++ fs/userfaultfd.c | 50 ++- include/linux/userfaultfd_k.h | 5 +++-- 10 files changed, 100 insertions(+), 68 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index 3490d1fa0e16..215d19dbbefa 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -461,6 +461,7 @@ static const struct address_space_operations aio_ctx_aops = { static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events) { + DEFINE_RANGE_LOCK_FULL(mmrange); struct aio_ring *ring; struct mm_struct *mm = current->mm; unsigned long size, unused; @@ -521,7 +522,7 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events) ctx->mmap_size = nr_pages * PAGE_SIZE; pr_debug("attempting mmap of %lu bytes\n", ctx->mmap_size); - if (down_write_killable(>mmap_sem)) { + if (mm_write_lock_killable(mm, )) { ctx->mmap_size = 0; aio_free_ring(ctx); return -EINTR; @@ -530,7 +531,7 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events) ctx->mmap_base = do_mmap_pgoff(ctx->aio_ring_file, 0, ctx->mmap_size, PROT_READ | PROT_WRITE, MAP_SHARED, 0, , NULL); - up_write(>mmap_sem); + mm_write_unlock(mm, ); if (IS_ERR((void *)ctx->mmap_base)) { ctx->mmap_size = 0; aio_free_ring(ctx); diff --git a/fs/coredump.c b/fs/coredump.c index e42e17e55bfd..433713b63187 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -409,6 +409,7 @@ static int zap_threads(struct task_struct *tsk, struct mm_struct *mm, static int coredump_wait(int exit_code, struct core_state *core_state) { + DEFINE_RANGE_LOCK_FULL(mmrange); struct task_struct *tsk = current; struct mm_struct *mm = tsk->mm; int core_waiters = -EBUSY; @@ -417,12 +418,12 @@ static int coredump_wait(int exit_code, struct core_state 
*core_state) core_state->dumper.task = tsk; core_state->dumper.next = NULL; - if (down_write_killable(>mmap_sem)) + if (mm_write_lock_killable(mm, )) return -EINTR; if (!mm->core_state) core_waiters = zap_threads(tsk, mm, core_state, exit_code); - up_write(>mmap_sem); + mm_write_unlock(mm, ); if (core_waiters > 0) { struct core_thread *ptr; diff --git a/fs/exec.c b/fs/exec.c index e96fd5328739..fbcb36bc4fd1 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -241,6 +241,7 @@ static void flush_arg_page(struct linux_binprm *bprm, unsigned long pos, static int __bprm_mm_init(struct linux_binprm *bprm) { + DEFINE_RANGE_LOCK_FULL(mmrange); int err; struct vm_area_struct *vma = NULL; struct mm_struct *mm = bprm->mm; @@ -250,7 +251,7 @@ static int __bprm_mm_init(struct linux_binprm *bprm) return -ENOMEM; vma_set_anonymous(vma); - if (down_write_killable(>mmap_sem)) { + if (mm_write_lock_killable(mm, )) { err = -EINTR; goto err_free; } @@ -273,11 +274,11 @@ static int __bprm_mm_init(struct linux_binprm *bprm) mm->stack_vm = mm->total_vm = 1; arch_bprm_mm_init(mm, vma); - up_write(>mmap_sem); + mm_write_unlock(mm, ); bprm->p = vma->vm_end - sizeof(void *); return 0; err: - up_write(>mmap_sem); + mm_write_unlock(mm, ); err_free: bprm->vma = NULL; vm_area_free(vma); @@ -691,6 +692,7 @@ int setup_arg_pages(struct linux_binprm *bprm, unsigned long stack_top, int executable_stack) { + DEFINE_RANGE_LOCK_FULL(mmrange); unsigned long ret; unsigned long stack_shift; struct mm_struct *mm = current->mm; @@ -738,7 +740,7 @@ int setup_arg_pages(struct linux_binprm *bprm, bprm->loader -= stack_shift; bprm->exec -= stack_shift; - if (down_write_killable(>mmap_sem)) + if (mm_write_lock_killable(mm, )) return -EINTR; vm_flags = VM_STACK_FLAGS; @@ -795,7 +797,7 @@ int setup_arg_pages(struct linux_binprm *bprm, ret = -EFAULT; out_unlock: - up_write(>mmap_sem); + mm_write_unlock(mm, ); return ret; } EXPORT_SYMBOL(setup_arg_pages); @@ -1010,6 +1012,7 @@ static int
[PATCH 12/12] powerpc/configs: Enable secure guest support in pseries and ppc64 defconfigs
From: Ryan Grimm

Enables running as a secure guest in platforms with an Ultravisor.

Signed-off-by: Ryan Grimm
Signed-off-by: Ram Pai
Signed-off-by: Thiago Jung Bauermann
---
 arch/powerpc/configs/ppc64_defconfig   | 1 +
 arch/powerpc/configs/pseries_defconfig | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/powerpc/configs/ppc64_defconfig b/arch/powerpc/configs/ppc64_defconfig
index d7c381009636..725297438320 100644
--- a/arch/powerpc/configs/ppc64_defconfig
+++ b/arch/powerpc/configs/ppc64_defconfig
@@ -31,6 +31,7 @@ CONFIG_DTL=y
 CONFIG_SCANLOG=m
 CONFIG_PPC_SMLPAR=y
 CONFIG_IBMEBUS=y
+CONFIG_PPC_SVM=y
 CONFIG_PPC_MAPLE=y
 CONFIG_PPC_PASEMI=y
 CONFIG_PPC_PASEMI_IOMMU=y
diff --git a/arch/powerpc/configs/pseries_defconfig b/arch/powerpc/configs/pseries_defconfig
index 62e12f61a3b2..724a574fe4b2 100644
--- a/arch/powerpc/configs/pseries_defconfig
+++ b/arch/powerpc/configs/pseries_defconfig
@@ -42,6 +42,7 @@ CONFIG_DTL=y
 CONFIG_SCANLOG=m
 CONFIG_PPC_SMLPAR=y
 CONFIG_IBMEBUS=y
+CONFIG_PPC_SVM=y
 # CONFIG_PPC_PMAC is not set
 CONFIG_RTAS_FLASH=m
 CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
[PATCH 13/14] drivers: teach the mm about range locking
Conversion is straightforward, mmap_sem is used within the the same function context most of the time. No change in semantics. Signed-off-by: Davidlohr Bueso --- drivers/android/binder_alloc.c | 7 --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 7 --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 9 + drivers/gpu/drm/amd/amdkfd/kfd_events.c | 5 +++-- drivers/gpu/drm/i915/i915_gem.c | 5 +++-- drivers/gpu/drm/i915/i915_gem_userptr.c | 11 +++ drivers/gpu/drm/nouveau/nouveau_svm.c| 23 ++- drivers/gpu/drm/radeon/radeon_cs.c | 5 +++-- drivers/gpu/drm/radeon/radeon_gem.c | 8 +--- drivers/gpu/drm/radeon/radeon_mn.c | 7 --- drivers/gpu/drm/ttm/ttm_bo_vm.c | 4 ++-- drivers/infiniband/core/umem.c | 7 --- drivers/infiniband/core/umem_odp.c | 12 +++- drivers/infiniband/core/uverbs_main.c| 5 +++-- drivers/infiniband/hw/mlx4/mr.c | 5 +++-- drivers/infiniband/hw/qib/qib_user_pages.c | 7 --- drivers/infiniband/hw/usnic/usnic_uiom.c | 5 +++-- drivers/iommu/amd_iommu_v2.c | 4 ++-- drivers/iommu/intel-svm.c| 4 ++-- drivers/media/v4l2-core/videobuf-core.c | 5 +++-- drivers/media/v4l2-core/videobuf-dma-contig.c| 5 +++-- drivers/media/v4l2-core/videobuf-dma-sg.c| 5 +++-- drivers/misc/cxl/cxllib.c| 5 +++-- drivers/misc/cxl/fault.c | 5 +++-- drivers/misc/sgi-gru/grufault.c | 20 drivers/misc/sgi-gru/grufile.c | 5 +++-- drivers/misc/sgi-gru/grukservices.c | 4 +++- drivers/misc/sgi-gru/grumain.c | 6 -- drivers/misc/sgi-gru/grutables.h | 5 - drivers/oprofile/buffer_sync.c | 12 +++- drivers/staging/kpc2000/kpc_dma/fileops.c| 5 +++-- drivers/tee/optee/call.c | 5 +++-- drivers/vfio/vfio_iommu_type1.c | 9 + drivers/xen/gntdev.c | 5 +++-- drivers/xen/privcmd.c| 17 ++--- include/linux/hmm.h | 7 --- 37 files changed, 160 insertions(+), 109 deletions(-) diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c index bb929eb87116..0b9cd9becd76 100644 --- a/drivers/android/binder_alloc.c +++ b/drivers/android/binder_alloc.c @@ -195,6 
+195,7 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate, struct vm_area_struct *vma = NULL; struct mm_struct *mm = NULL; bool need_mm = false; + DEFINE_RANGE_LOCK_FULL(mmrange); binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC, "%d: %s pages %pK-%pK\n", alloc->pid, @@ -220,7 +221,7 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate, mm = alloc->vma_vm_mm; if (mm) { - down_read(>mmap_sem); + mm_read_lock(mm, ); vma = alloc->vma; } @@ -279,7 +280,7 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate, /* vm_insert_page does not seem to increment the refcount */ } if (mm) { - up_read(>mmap_sem); + mm_read_unlock(mm, ); mmput(mm); } return 0; @@ -310,7 +311,7 @@ static int binder_update_page_range(struct binder_alloc *alloc, int allocate, } err_no_vma: if (mm) { - up_read(>mmap_sem); + mm_read_unlock(mm, ); mmput(mm); } return vma ? -ENOMEM : -ESRCH; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 123eb0d7e2e9..28ddd42b27be 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -1348,9 +1348,9 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu( * concurrently and the queues are actually stopped */ if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm)) { - down_write(>mm->mmap_sem); + mm_write_lock(current->mm, ); is_invalid_userptr = atomic_read(>invalid); - up_write(>mm->mmap_sem); + mm_write_unlock(current->mm, ); } mutex_lock(>lock); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c index 58ed401c5996..d002df91c7b9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c @@ -376,13 +376,14 @@
[PATCH 14/14] mm: convert mmap_sem to range mmap_lock
With mmrange now in place and everyone using the mm locking wrappers, we can convert the rwsem to a the range locking scheme. Every single user of mmap_sem will use a full range, which means that there is no more parallelism than what we already had. This is the worst case scenario. Prefetching and some lockdep stuff have been blindly converted (for now). This lays out the foundations for later mm address space locking scalability. Signed-off-by: Davidlohr Bueso --- arch/x86/events/core.c | 2 +- arch/x86/kernel/tboot.c| 2 +- arch/x86/mm/fault.c| 2 +- drivers/firmware/efi/efi.c | 2 +- include/linux/mm.h | 26 +- include/linux/mm_types.h | 4 ++-- kernel/bpf/stackmap.c | 9 + kernel/fork.c | 2 +- mm/init-mm.c | 2 +- mm/memory.c| 2 +- 10 files changed, 27 insertions(+), 26 deletions(-) diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index f315425d8468..45ecca077255 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -2179,7 +2179,7 @@ static void x86_pmu_event_mapped(struct perf_event *event, struct mm_struct *mm) * For now, this can't happen because all callers hold mmap_sem * for write. If this changes, we'll need a different solution. 
*/ - lockdep_assert_held_exclusive(>mmap_sem); + lockdep_assert_held_exclusive(>mmap_lock); if (atomic_inc_return(>context.perf_rdpmc_allowed) == 1) on_each_cpu_mask(mm_cpumask(mm), refresh_pce, NULL, 1); diff --git a/arch/x86/kernel/tboot.c b/arch/x86/kernel/tboot.c index 6e5ef8fb8a02..e5423e2451d3 100644 --- a/arch/x86/kernel/tboot.c +++ b/arch/x86/kernel/tboot.c @@ -104,7 +104,7 @@ static struct mm_struct tboot_mm = { .pgd= swapper_pg_dir, .mm_users = ATOMIC_INIT(2), .mm_count = ATOMIC_INIT(1), - .mmap_sem = __RWSEM_INITIALIZER(init_mm.mmap_sem), + .mmap_lock = __RANGE_LOCK_TREE_INITIALIZER(init_mm.mmap_lock), .page_table_lock = __SPIN_LOCK_UNLOCKED(init_mm.page_table_lock), .mmlist = LIST_HEAD_INIT(init_mm.mmlist), }; diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index fbb060c89e7d..9f285ba76f1e 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -1516,7 +1516,7 @@ static noinline void __do_page_fault(struct pt_regs *regs, unsigned long hw_error_code, unsigned long address) { - prefetchw(>mm->mmap_sem); + prefetchw(>mm->mmap_lock); if (unlikely(kmmio_fault(regs, address))) return; diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c index 55b77c576c42..01e4937f3cea 100644 --- a/drivers/firmware/efi/efi.c +++ b/drivers/firmware/efi/efi.c @@ -80,7 +80,7 @@ struct mm_struct efi_mm = { .mm_rb = RB_ROOT, .mm_users = ATOMIC_INIT(2), .mm_count = ATOMIC_INIT(1), - .mmap_sem = __RWSEM_INITIALIZER(efi_mm.mmap_sem), + .mmap_lock = __RANGE_LOCK_TREE_INITIALIZER(efi_mm.mmap_lock), .page_table_lock= __SPIN_LOCK_UNLOCKED(efi_mm.page_table_lock), .mmlist = LIST_HEAD_INIT(efi_mm.mmlist), .cpu_bitmap = { [BITS_TO_LONGS(NR_CPUS)] = 0}, diff --git a/include/linux/mm.h b/include/linux/mm.h index 8bf3e2542047..5ac33c46679f 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2899,74 +2899,74 @@ static inline void setup_nr_node_ids(void) {} static inline bool mm_is_locked(struct mm_struct *mm, struct range_lock *mmrange) { - return 
rwsem_is_locked(>mmap_sem); + return range_is_locked(>mmap_lock, mmrange); } /* Reader wrappers */ static inline int mm_read_trylock(struct mm_struct *mm, struct range_lock *mmrange) { - return down_read_trylock(>mmap_sem); + return range_read_trylock(>mmap_lock, mmrange); } static inline void mm_read_lock(struct mm_struct *mm, struct range_lock *mmrange) { - down_read(>mmap_sem); + range_read_lock(>mmap_lock, mmrange); } static inline void mm_read_lock_nested(struct mm_struct *mm, struct range_lock *mmrange, int subclass) { - down_read_nested(>mmap_sem, subclass); + range_read_lock_nested(>mmap_lock, mmrange, subclass); } static inline void mm_read_unlock(struct mm_struct *mm, struct range_lock *mmrange) { - up_read(>mmap_sem); + range_read_unlock(>mmap_lock, mmrange); } /* Writer wrappers */ static inline int mm_write_trylock(struct mm_struct *mm, struct range_lock *mmrange) { - return down_write_trylock(>mmap_sem); + return range_write_trylock(>mmap_lock, mmrange); } static
[PATCH 06/14] mm: teach the mm about range locking
Conversion is straightforward, mmap_sem is used within the the same function context most of the time, and we already have vmf updated. No changes in semantics. Signed-off-by: Davidlohr Bueso --- include/linux/mm.h | 8 +++--- mm/filemap.c | 8 +++--- mm/frame_vector.c | 4 +-- mm/gup.c | 21 +++ mm/hmm.c | 3 ++- mm/khugepaged.c| 54 +-- mm/ksm.c | 42 +- mm/madvise.c | 36 ++ mm/memcontrol.c| 10 +--- mm/memory.c| 10 +--- mm/mempolicy.c | 25 ++ mm/migrate.c | 10 +--- mm/mincore.c | 6 +++-- mm/mlock.c | 20 +-- mm/mmap.c | 69 -- mm/mmu_notifier.c | 9 --- mm/mprotect.c | 15 ++- mm/mremap.c| 9 --- mm/msync.c | 9 --- mm/nommu.c | 25 ++ mm/oom_kill.c | 5 ++-- mm/process_vm_access.c | 4 +-- mm/shmem.c | 2 +- mm/swapfile.c | 5 ++-- mm/userfaultfd.c | 21 --- mm/util.c | 10 +--- 26 files changed, 252 insertions(+), 188 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 044e428b1905..8bf3e2542047 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1459,6 +1459,7 @@ void unmap_vmas(struct mmu_gather *tlb, struct vm_area_struct *start_vma, * right now." 1 means "skip the current vma." 
* @mm:mm_struct representing the target process of page table walk * @vma: vma currently walked (NULL if walking outside vmas) + * @mmrange: mm address space range locking * @private: private data for callbacks' usage * * (see the comment on walk_page_range() for more details) @@ -2358,8 +2359,8 @@ static inline int check_data_rlimit(unsigned long rlim, return 0; } -extern int mm_take_all_locks(struct mm_struct *mm); -extern void mm_drop_all_locks(struct mm_struct *mm); +extern int mm_take_all_locks(struct mm_struct *mm, struct range_lock *mmrange); +extern void mm_drop_all_locks(struct mm_struct *mm, struct range_lock *mmrange); extern void set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file); extern struct file *get_mm_exe_file(struct mm_struct *mm); @@ -2389,7 +2390,8 @@ extern unsigned long do_mmap(struct file *file, unsigned long addr, vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate, struct list_head *uf); extern int __do_munmap(struct mm_struct *, unsigned long, size_t, - struct list_head *uf, bool downgrade); + struct list_head *uf, bool downgrade, + struct range_lock *); extern int do_munmap(struct mm_struct *, unsigned long, size_t, struct list_head *uf); diff --git a/mm/filemap.c b/mm/filemap.c index 959022841bab..71f0d8a18f40 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1388,7 +1388,7 @@ int __lock_page_or_retry(struct page *page, struct mm_struct *mm, if (flags & FAULT_FLAG_RETRY_NOWAIT) return 0; - up_read(>mmap_sem); + mm_read_unlock(mm, mmrange); if (flags & FAULT_FLAG_KILLABLE) wait_on_page_locked_killable(page); else @@ -1400,7 +1400,7 @@ int __lock_page_or_retry(struct page *page, struct mm_struct *mm, ret = __lock_page_killable(page); if (ret) { - up_read(>mmap_sem); + mm_read_unlock(mm, mmrange); return 0; } } else @@ -2317,7 +2317,7 @@ static struct file *maybe_unlock_mmap_for_io(struct vm_fault *vmf, if ((flags & (FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_RETRY_NOWAIT)) == FAULT_FLAG_ALLOW_RETRY) { fpin = 
get_file(vmf->vma->vm_file); - up_read(>vma->vm_mm->mmap_sem); + mm_read_unlock(vmf->vma->vm_mm, vmf->lockrange); } return fpin; } @@ -2357,7 +2357,7 @@ static int lock_page_maybe_drop_mmap(struct vm_fault *vmf, struct page *page, * mmap_sem here and return 0 if we don't have a fpin. */ if (*fpin == NULL) - up_read(>vma->vm_mm->mmap_sem); + mm_read_unlock(vmf->vma->vm_mm, vmf->lockrange); return 0; } } else diff --git a/mm/frame_vector.c b/mm/frame_vector.c index 4e1a577cbb79..ef33d21b3f39 100644 --- a/mm/frame_vector.c +++ b/mm/frame_vector.c @@ -47,7 +47,7 @@ int get_vaddr_frames(unsigned long start, unsigned int
[PATCH 02/14] Introduce range reader/writer lock
This implements a sleepable range rwlock, based on interval tree, serializing conflicting/intersecting/overlapping ranges within the tree. The largest range is given by [0, ~0] (inclusive). Unlike traditional locks, range locking involves dealing with the tree itself and the range to be locked, normally stack allocated and always explicitly prepared/initialized by the user in a [a0, a1] a0 <= a1 sorted manner, before actually taking the lock. Interval-tree based range locking is about controlling tasks' forward progress when adding an arbitrary interval (node) to the tree, depending on any overlapping ranges. A task can only continue (wakeup) if there are no intersecting ranges, thus achieving mutual exclusion. To this end, a reference counter is kept for each intersecting range in the tree (_before_ adding itself to it). To enable shared locking semantics, the reader to-be-locked will not take reference if an intersecting node is also a reader, therefore ignoring the node altogether. Fairness and freedom of starvation are guaranteed by the lack of lock stealing, thus range locks depend directly on interval tree semantics. This is particularly for iterations, where the key for the rbtree is given by the interval's low endpoint, and duplicates are walked as it would an inorder traversal of the tree. The cost of lock and unlock of a range is O((1+R_int)log(R_all)) where R_all is total number of ranges and R_int is the number of ranges intersecting the operated range. How much does it cost: -- The cost of lock and unlock of a range is O((1+R_int)log(R_all)) where R_all is total number of ranges and R_int is the number of ranges intersecting the new range range to be added. Due to its sharable nature, full range locks can be compared with rw-sempahores, which also serves from a mutex standpoint as writer-only situations are pretty similar nowadays. 
The first is the memory footprint, tree locks are smaller than rwsems: 32 vs 40 bytes, but require an additional 72 bytes of stack for the range structure. Secondly, because every range call is serialized by the tree->lock, any lock() fastpath will at least have an interval_tree_insert() and spinlock lock+unlock overhead compared to a single atomic insn in the case of rwsems. Similar scenario obviously for the unlock() case. The torture module was used to measure 1-1 differences in lock acquisition with increasing core counts over a period of 10 minutes. Readers and writers are interleaved, with a slight advantage to writers as its the first kthread that is created. The following shows the avg ops/minute with various thread-setups on boxes with small and large core-counts. ** 4-core AMD Opteron ** (write-only) rwsem-2thr: 4198.5, stddev: 7.77 range-2thr: 4199.1, stddev: 0.73 rwsem-4thr: 6036.8, stddev: 50.91 range-4thr: 6004.9, stddev: 126.57 rwsem-8thr: 6245.6, stddev: 59.39 range-8thr: 6229.3, stddev: 10.60 (read-only) rwsem-2thr: 5930.7, stddev: 21.92 range-2thr: 5917.3, stddev: 25.45 rwsem-4thr: 9881.6, stddev: 0.70 range-4thr: 9540.2, stddev: 98.28 rwsem-8thr: 11633.2, stddev: 7.72 range-8thr: 11314.7, stddev: 62.22 For the read/write-only cases, there is very little difference between the range lock and rwsems, with up to a 3% hit, which could very well be considered in the noise range. 
(read-write)
rwsem-write-1thr: 1744.8, stddev: 11.59
rwsem-read-1thr:  1043.1, stddev: 3.97
range-write-1thr: 1740.2, stddev: 5.99
range-read-1thr:  1022.5, stddev: 6.41

rwsem-write-2thr: 1662.5, stddev: 0.70
rwsem-read-2thr:  1278.0, stddev: 25.45
range-write-2thr: 1321.5, stddev: 51.61
range-read-2thr:  1243.5, stddev: 30.40

rwsem-write-4thr: 1761.0, stddev: 11.31
rwsem-read-4thr:  1426.0, stddev: 7.07
range-write-4thr: 1417.0, stddev: 29.69
range-read-4thr:  1398.0, stddev: 56.56

While single reader and writer threads do not show much difference, increasing core counts shows that in reader/writer workloads, writer threads can take a hit in raw performance of up to ~20%, while reader throughput is quite similar among both locks.

** 240-core (ht) IvyBridge **
(write-only)
rwsem-120thr: 6844.5, stddev: 82.73
range-120thr: 6070.5, stddev: 85.55

rwsem-240thr: 6292.5, stddev: 146.3
range-240thr: 6099.0, stddev: 15.55

rwsem-480thr: 6164.8, stddev: 33.94
range-480thr: 6062.3, stddev: 19.79

(read-only)
rwsem-120thr: 136860.4, stddev: 2539.92
range-120thr: 138052.2, stddev: 327.39

rwsem-240thr: 235297.5, stddev: 2220.50
range-240thr: 232099.1, stddev: 3614.72

rwsem-480thr: 272683.0, stddev: 3924.32
range-480thr: 256539.2, stddev: 9541.69

Similar to the small box, larger machines show that range locks take only a minor (up to ~6% for 480 threads) hit even in completely exclusive or shared scenarios.

(read-write)
rwsem-write-60thr: 4658.1, stddev: 1303.19
rwsem-read-60thr:  1108.7, stddev: 718.42
range-write-60thr: 3203.6, stddev: 139.30
range-read-60thr:  1852.8, stddev: 147.5

rwsem-write-120thr: 3971.3,
[PATCH 12/14] kernel: teach the mm about range locking
Conversion is straightforward, mmap_sem is used within the same function context most of the time. No change in semantics.

Signed-off-by: Davidlohr Bueso
---
 kernel/acct.c               | 5 +++--
 kernel/bpf/stackmap.c       | 7 +--
 kernel/events/core.c        | 5 +++--
 kernel/events/uprobes.c     | 20
 kernel/exit.c               | 9 +
 kernel/fork.c               | 16 ++--
 kernel/futex.c              | 5 +++--
 kernel/sched/fair.c         | 5 +++--
 kernel/sys.c                | 22 +-
 kernel/trace/trace_output.c | 5 +++--
 10 files changed, 60 insertions(+), 39 deletions(-)

diff --git a/kernel/acct.c b/kernel/acct.c
index 81f9831a7859..2bbcecbd78ef 100644
--- a/kernel/acct.c
+++ b/kernel/acct.c
@@ -538,14 +538,15 @@ void acct_collect(long exitcode, int group_dead)
 	if (group_dead && current->mm) {
 		struct vm_area_struct *vma;
+		DEFINE_RANGE_LOCK_FULL(mmrange);

-		down_read(&current->mm->mmap_sem);
+		mm_read_lock(current->mm, &mmrange);
 		vma = current->mm->mmap;
 		while (vma) {
 			vsize += vma->vm_end - vma->vm_start;
 			vma = vma->vm_next;
 		}
-		up_read(&current->mm->mmap_sem);
+		mm_read_unlock(current->mm, &mmrange);
 	}

 	spin_lock_irq(&current->sighand->siglock);
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 950ab2f28922..fdb352bea7e8 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -37,6 +37,7 @@ struct bpf_stack_map {
 struct stack_map_irq_work {
 	struct irq_work irq_work;
 	struct rw_semaphore *sem;
+	struct range_lock *mmrange;
 };

 static void do_up_read(struct irq_work *entry)
@@ -291,6 +292,7 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
 	struct vm_area_struct *vma;
 	bool irq_work_busy = false;
 	struct stack_map_irq_work *work = NULL;
+	DEFINE_RANGE_LOCK_FULL(mmrange);

 	if (in_nmi()) {
 		work = this_cpu_ptr(&up_read_work);
@@ -309,7 +311,7 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
 	 * with build_id.
 	 */
 	if (!user || !current || !current->mm || irq_work_busy ||
-	    down_read_trylock(&current->mm->mmap_sem) == 0) {
+	    mm_read_trylock(current->mm, &mmrange) == 0) {
 		/* cannot access current->mm, fall back to ips */
 		for (i = 0; i < trace_nr; i++) {
 			id_offs[i].status = BPF_STACK_BUILD_ID_IP;
@@ -334,9 +336,10 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
 	}

 	if (!work) {
-		up_read(&current->mm->mmap_sem);
+		mm_read_unlock(current->mm, &mmrange);
 	} else {
 		work->sem = &current->mm->mmap_sem;
+		work->mmrange = &mmrange;
 		irq_work_queue(&work->irq_work);
 		/*
 		 * The irq_work will release the mmap_sem with
diff --git a/kernel/events/core.c b/kernel/events/core.c
index abbd4b3b96c2..3b43cfe63b54 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9079,6 +9079,7 @@ static void perf_event_addr_filters_apply(struct perf_event *event)
 	struct mm_struct *mm = NULL;
 	unsigned int count = 0;
 	unsigned long flags;
+	DEFINE_RANGE_LOCK_FULL(mmrange);

 	/*
 	 * We may observe TASK_TOMBSTONE, which means that the event tear-down
@@ -9092,7 +9093,7 @@ static void perf_event_addr_filters_apply(struct perf_event *event)
 		if (!mm)
 			goto restart;

-		down_read(&mm->mmap_sem);
+		mm_read_lock(mm, &mmrange);
 	}

 	raw_spin_lock_irqsave(&ifh->lock, flags);
@@ -9118,7 +9119,7 @@ static void perf_event_addr_filters_apply(struct perf_event *event)
 	raw_spin_unlock_irqrestore(&ifh->lock, flags);

 	if (ifh->nr_file_filters) {
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm, &mmrange);

 		mmput(mm);
 	}
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 3689eceb8d0c..6779c237799a 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -997,6 +997,7 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
 	bool is_register = !!new;
 	struct map_info *info;
 	int err = 0;
+	DEFINE_RANGE_LOCK_FULL(mmrange);

 	percpu_down_write(&dup_mmap_sem);
 	info = build_map_info(uprobe->inode->i_mapping,
@@ -1013,7 +1014,7 @@ register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
 		if (err && is_register)
 			goto free;

-		down_write(&mm->mmap_sem);
+		mm_write_lock(mm, &mmrange);
 		vma = find_vma(mm, info->vaddr);
 		if (!vma || !valid_vma(vma, is_register) ||
[PATCH 11/14] ipc: teach the mm about range locking
Conversion is straightforward, mmap_sem is used within the same function context most of the time. No change in semantics.

Signed-off-by: Davidlohr Bueso
---
 ipc/shm.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/ipc/shm.c b/ipc/shm.c
index ce1ca9f7c6e9..3666fa71bfc2 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -1418,6 +1418,7 @@ COMPAT_SYSCALL_DEFINE3(old_shmctl, int, shmid, int, cmd, void __user *, uptr)
 long do_shmat(int shmid, char __user *shmaddr, int shmflg,
 	      ulong *raddr, unsigned long shmlba)
 {
+	DEFINE_RANGE_LOCK_FULL(mmrange);
 	struct shmid_kernel *shp;
 	unsigned long addr = (unsigned long)shmaddr;
 	unsigned long size;
@@ -1544,7 +1545,7 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg,
 	if (err)
 		goto out_fput;

-	if (down_write_killable(&current->mm->mmap_sem)) {
+	if (mm_write_lock_killable(current->mm, &mmrange)) {
 		err = -EINTR;
 		goto out_fput;
 	}
@@ -1564,7 +1565,7 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg,
 		if (IS_ERR_VALUE(addr))
 			err = (long)addr;
 invalid:
-	up_write(&current->mm->mmap_sem);
+	mm_write_unlock(current->mm, &mmrange);
 	if (populate)
 		mm_populate(addr, populate);
@@ -1625,6 +1626,7 @@ COMPAT_SYSCALL_DEFINE3(shmat, int, shmid, compat_uptr_t, shmaddr, int, shmflg)
  */
 long ksys_shmdt(char __user *shmaddr)
 {
+	DEFINE_RANGE_LOCK_FULL(mmrange);
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
 	unsigned long addr = (unsigned long)shmaddr;
@@ -1638,7 +1640,7 @@ long ksys_shmdt(char __user *shmaddr)
 	if (addr & ~PAGE_MASK)
 		return retval;

-	if (down_write_killable(&mm->mmap_sem))
+	if (mm_write_lock_killable(mm, &mmrange))
 		return -EINTR;

 	/*
@@ -1726,7 +1728,7 @@ long ksys_shmdt(char __user *shmaddr)
 #endif

-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm, &mmrange);
 	return retval;
 }
--
2.16.4
[PATCH 08/14] arch/x86: teach the mm about range locking
Conversion is straightforward, mmap_sem is used within the same function context most of the time. No change in semantics.

Signed-off-by: Davidlohr Bueso
---
 arch/x86/entry/vdso/vma.c      | 12 +++-
 arch/x86/kernel/vm86_32.c      | 5 +++--
 arch/x86/kvm/paging_tmpl.h     | 9 +
 arch/x86/mm/debug_pagetables.c | 8
 arch/x86/mm/fault.c            | 8
 arch/x86/mm/mpx.c              | 15 +--
 arch/x86/um/vdso/vma.c         | 5 +++--
 7 files changed, 35 insertions(+), 27 deletions(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index babc4e7a519c..f6d8950f37b8 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -145,12 +145,13 @@ static const struct vm_special_mapping vvar_mapping = {
  */
 static int map_vdso(const struct vdso_image *image, unsigned long addr)
 {
+	DEFINE_RANGE_LOCK_FULL(mmrange);
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
 	unsigned long text_start;
 	int ret = 0;

-	if (down_write_killable(&mm->mmap_sem))
+	if (mm_write_lock_killable(mm, &mmrange))
 		return -EINTR;

 	addr = get_unmapped_area(NULL, addr,
@@ -193,7 +194,7 @@ static int map_vdso(const struct vdso_image *image, unsigned long addr)
 	}

 up_fail:
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm, &mmrange);
 	return ret;
 }

@@ -254,8 +255,9 @@ int map_vdso_once(const struct vdso_image *image, unsigned long addr)
 {
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma;
+	DEFINE_RANGE_LOCK_FULL(mmrange);

-	down_write(&mm->mmap_sem);
+	mm_write_lock(mm, &mmrange);
 	/*
 	 * Check if we have already mapped vdso blob - fail to prevent
 	 * abusing from userspace install_speciall_mapping, which may
@@ -266,11 +268,11 @@ int map_vdso_once(const struct vdso_image *image, unsigned long addr)
 	for (vma = mm->mmap; vma; vma = vma->vm_next) {
 		if (vma_is_special_mapping(vma, &vdso_mapping) ||
 		    vma_is_special_mapping(vma, &vvar_mapping)) {
-			up_write(&mm->mmap_sem);
+			mm_write_unlock(mm, &mmrange);
 			return -EEXIST;
 		}
 	}
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm, &mmrange);

 	return map_vdso(image, addr);
 }
diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index 6a38717d179c..39eecee07dcd 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -171,8 +171,9 @@ static void mark_screen_rdonly(struct mm_struct *mm)
 	pmd_t *pmd;
 	pte_t *pte;
 	int i;
+	DEFINE_RANGE_LOCK_FULL(mmrange);

-	down_write(&mm->mmap_sem);
+	mm_write_lock(mm, &mmrange);
 	pgd = pgd_offset(mm, 0xA0000);
 	if (pgd_none_or_clear_bad(pgd))
 		goto out;
@@ -198,7 +199,7 @@ static void mark_screen_rdonly(struct mm_struct *mm)
 	}
 	pte_unmap_unlock(pte, ptl);
 out:
-	up_write(&mm->mmap_sem);
+	mm_write_unlock(mm, &mmrange);
 	flush_tlb_mm_range(mm, 0xA0000, 0xA0000 + 32*PAGE_SIZE, PAGE_SHIFT, false);
 }
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 367a47df4ba0..347d3ba41974 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -152,23 +152,24 @@ static int FNAME(cmpxchg_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 		unsigned long vaddr = (unsigned long)ptep_user & PAGE_MASK;
 		unsigned long pfn;
 		unsigned long paddr;
+		DEFINE_RANGE_LOCK_FULL(mmrange);

-		down_read(&current->mm->mmap_sem);
+		mm_read_lock(current->mm, &mmrange);
 		vma = find_vma_intersection(current->mm, vaddr, vaddr + PAGE_SIZE);
 		if (!vma || !(vma->vm_flags & VM_PFNMAP)) {
-			up_read(&current->mm->mmap_sem);
+			mm_read_unlock(current->mm, &mmrange);
 			return -EFAULT;
 		}
 		pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
 		paddr = pfn << PAGE_SHIFT;
 		table = memremap(paddr, PAGE_SIZE, MEMREMAP_WB);
 		if (!table) {
-			up_read(&current->mm->mmap_sem);
+			mm_read_unlock(current->mm, &mmrange);
 			return -EFAULT;
 		}
 		ret = CMPXCHG(&table[index], orig_pte, new_pte);
 		memunmap(table);
-		up_read(&current->mm->mmap_sem);
+		mm_read_unlock(current->mm, &mmrange);
 	}

 	return (ret != orig_pte);
diff --git a/arch/x86/mm/debug_pagetables.c b/arch/x86/mm/debug_pagetables.c
index cd84f067e41d..0d131edc6a75 100644
--- a/arch/x86/mm/debug_pagetables.c
+++ b/arch/x86/mm/debug_pagetables.c
@@ -15,9 +15,9 @@ DEFINE_SHOW_ATTRIBUTE(ptdump);
 static int ptdump_curknl_show(struct seq_file *m, void *v)
 {
 	if (current->mm->pgd) {
-
[PATCH 09/14] virt: teach the mm about range locking
Conversion is straightforward, mmap_sem is used within the same function context most of the time. No change in semantics.

Signed-off-by: Davidlohr Bueso
---
 virt/kvm/arm/mmu.c  | 17 ++---
 virt/kvm/async_pf.c | 4 ++--
 virt/kvm/kvm_main.c | 11 ++-
 3 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 74b6582eaa3c..85f8b9ccfabe 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -980,9 +980,10 @@ void stage2_unmap_vm(struct kvm *kvm)
 	struct kvm_memslots *slots;
 	struct kvm_memory_slot *memslot;
 	int idx;
+	DEFINE_RANGE_LOCK_FULL(mmrange);

 	idx = srcu_read_lock(&kvm->srcu);
-	down_read(&current->mm->mmap_sem);
+	mm_read_lock(current->mm, &mmrange);
 	spin_lock(&kvm->mmu_lock);

 	slots = kvm_memslots(kvm);
@@ -990,7 +991,7 @@ void stage2_unmap_vm(struct kvm *kvm)
 		stage2_unmap_memslot(kvm, memslot);

 	spin_unlock(&kvm->mmu_lock);
-	up_read(&current->mm->mmap_sem);
+	mm_read_unlock(current->mm, &mmrange);
 	srcu_read_unlock(&kvm->srcu, idx);
 }

@@ -1688,6 +1689,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	kvm_pfn_t pfn;
 	pgprot_t mem_type = PAGE_S2;
 	bool logging_active = memslot_is_logging(memslot);
+	DEFINE_RANGE_LOCK_FULL(mmrange);
 	unsigned long vma_pagesize, flags = 0;

 	write_fault = kvm_is_write_fault(vcpu);
@@ -1700,11 +1702,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	}

 	/* Let's check if we will get back a huge page backed by hugetlbfs */
-	down_read(&current->mm->mmap_sem);
+	mm_read_lock(current->mm, &mmrange);
 	vma = find_vma_intersection(current->mm, hva, hva + 1);
 	if (unlikely(!vma)) {
 		kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
-		up_read(&current->mm->mmap_sem);
+		mm_read_unlock(current->mm, &mmrange);
 		return -EFAULT;
 	}

@@ -1725,7 +1727,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (vma_pagesize == PMD_SIZE ||
 	    (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm)))
 		gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;
-	up_read(&current->mm->mmap_sem);
+	mm_read_unlock(current->mm, &mmrange);

 	/* We need minimum second+third level pages */
 	ret = mmu_topup_memory_cache(memcache, kvm_mmu_cache_min_pages(kvm),
@@ -2280,6 +2282,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 	hva_t reg_end = hva + mem->memory_size;
 	bool writable = !(mem->flags & KVM_MEM_READONLY);
 	int ret = 0;
+	DEFINE_RANGE_LOCK_FULL(mmrange);

 	if (change != KVM_MR_CREATE && change != KVM_MR_MOVE &&
 	    change != KVM_MR_FLAGS_ONLY)
@@ -2293,7 +2296,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 	    (kvm_phys_size(kvm) >> PAGE_SHIFT))
 		return -EFAULT;

-	down_read(&current->mm->mmap_sem);
+	mm_read_lock(current->mm, &mmrange);
 	/*
 	 * A memory region could potentially cover multiple VMAs, and any holes
 	 * between them, so iterate over all of them to find out if we can map
@@ -2361,7 +2364,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 		stage2_flush_memslot(kvm, memslot);
 	spin_unlock(&kvm->mmu_lock);
 out:
-	up_read(&current->mm->mmap_sem);
+	mm_read_unlock(current->mm, &mmrange);
 	return ret;
 }
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index e93cd8515134..03d9f9bc5270 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -87,11 +87,11 @@ static void async_pf_execute(struct work_struct *work)
 	 * mm and might be done in another context, so we must
 	 * access remotely.
 	 */
-	down_read(&mm->mmap_sem);
+	mm_read_lock(mm, &mmrange);
 	get_user_pages_remote(NULL, mm, addr, 1, FOLL_WRITE, NULL, NULL,
 			      &locked, &mmrange);
 	if (locked)
-		up_read(&mm->mmap_sem);
+		mm_read_unlock(mm, &mmrange);

 	kvm_async_page_present_sync(vcpu, apf);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e1484150a3dd..421652e66a03 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1331,6 +1331,7 @@ EXPORT_SYMBOL_GPL(kvm_is_visible_gfn);
 unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t gfn)
 {
 	struct vm_area_struct *vma;
+	DEFINE_RANGE_LOCK_FULL(mmrange);
 	unsigned long addr, size;

 	size = PAGE_SIZE;
@@ -1339,7 +1340,7 @@ unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t gfn)
 	if (kvm_is_error_hva(addr))
 		return PAGE_SIZE;

-	down_read(&current->mm->mmap_sem);
+	mm_read_lock(current->mm, &mmrange);
 	vma = find_vma(current->mm, addr);
 	if (!vma)
 		goto out;
@@ -1347,7 +1348,7 @@ unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t gfn)
 	size =
[PATCH 10/14] net: teach the mm about range locking
Conversion is straightforward, mmap_sem is used within the same function context most of the time. No change in semantics.

Signed-off-by: Davidlohr Bueso
---
 net/ipv4/tcp.c     | 5 +++--
 net/xdp/xdp_umem.c | 5 +++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 53d61ca3ac4b..2be929dcafa8 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1731,6 +1731,7 @@ static int tcp_zerocopy_receive(struct sock *sk,
 	struct tcp_sock *tp;
 	int inq;
 	int ret;
+	DEFINE_RANGE_LOCK_FULL(mmrange);

 	if (address & (PAGE_SIZE - 1) || address != zc->address)
 		return -EINVAL;
@@ -1740,7 +1741,7 @@ static int tcp_zerocopy_receive(struct sock *sk,

 	sock_rps_record_flow(sk);

-	down_read(&current->mm->mmap_sem);
+	mm_read_lock(current->mm, &mmrange);

 	ret = -EINVAL;
 	vma = find_vma(current->mm, address);
@@ -1802,7 +1803,7 @@ static int tcp_zerocopy_receive(struct sock *sk,
 		frags++;
 	}
 out:
-	up_read(&current->mm->mmap_sem);
+	mm_read_unlock(current->mm, &mmrange);
 	if (length) {
 		tp->copied_seq = seq;
 		tcp_rcv_space_adjust(sk);
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 2b18223e7eb8..2bf444fb998d 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -246,16 +246,17 @@ static int xdp_umem_pin_pages(struct xdp_umem *umem)
 	unsigned int gup_flags = FOLL_WRITE;
 	long npgs;
 	int err;
+	DEFINE_RANGE_LOCK_FULL(mmrange);

 	umem->pgs = kcalloc(umem->npgs, sizeof(*umem->pgs),
 			    GFP_KERNEL | __GFP_NOWARN);
 	if (!umem->pgs)
 		return -ENOMEM;

-	down_read(&current->mm->mmap_sem);
+	mm_read_lock(current->mm, &mmrange);
 	npgs = get_user_pages(umem->address, umem->npgs,
 			      gup_flags | FOLL_LONGTERM, &umem->pgs[0], NULL);
-	up_read(&current->mm->mmap_sem);
+	mm_read_unlock(current->mm, &mmrange);

 	if (npgs != umem->npgs) {
 		if (npgs >= 0) {
--
2.16.4
[PATCH 10/12] powerpc/pseries/iommu: Don't use dma_iommu_ops on secure guests
Secure guest memory is inaccessible to devices, so regular DMA isn't possible. In that case set the devices' dma_map_ops to NULL so that the generic DMA code path will use SWIOTLB to bounce buffers for DMA.

Signed-off-by: Thiago Jung Bauermann
---
 arch/powerpc/platforms/pseries/iommu.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 03bbb299320e..7d9550edb700 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -50,6 +50,7 @@
 #include
 #include
 #include
+#include <asm/svm.h>

 #include "pseries.h"

@@ -1332,7 +1333,10 @@ void iommu_init_early_pSeries(void)
 	of_reconfig_notifier_register(&iommu_reconfig_nb);
 	register_memory_notifier(&iommu_mem_nb);

-	set_pci_dma_ops(&dma_iommu_ops);
+	if (is_secure_guest())
+		set_pci_dma_ops(NULL);
+	else
+		set_pci_dma_ops(&dma_iommu_ops);
 }

 static int __init disable_multitce(char *str)
[PATCH 05/14] mm: remove some BUG checks wrt mmap_sem
This patch is a collection of hacks that shamelessly remove mmap_sem state checks in order to not have to teach file_operations about range locking.

For thp and huge pagecache: by dropping the rwsem_is_locked checks in zap_pmd_range() and zap_pud_range() we can avoid having to teach file_operations about mmrange. For example in xfs, iomap_dio_rw() is called by .read_iter file callbacks.

We also avoid the mmap_sem trylock in vm_insert_page(): the rules for this function state that mmap_sem must be acquired by the caller:
- for write if used in f_op->mmap() (by far the most common case)
- for read if used from vma_op->fault() (with VM_MIXEDMAP)

The only exception is:
  mmap_vmcore()
    remap_vmalloc_range_partial()
      mmap_vmcore()
But there is no concurrency here, thus mmap_sem is not held.

After auditing the kernel, the following drivers use the fault path and correctly set VM_MIXEDMAP:
  .fault = etnaviv_gem_fault
  .fault = udl_gem_fault
  tegra_bo_fault()

As such, drop the reader trylock BUG_ON() for the common case. This avoids having file_operations know about mmranges, as mmap_sem is held during mmap(), for example.
Signed-off-by: Davidlohr Bueso
---
 include/linux/huge_mm.h | 2 --
 mm/memory.c             | 2 --
 mm/mmap.c               | 4 ++--
 mm/pagewalk.c           | 3 ---
 4 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 7cd5c150c21d..a4a9cfa78d8f 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -194,7 +194,6 @@ static inline int is_swap_pmd(pmd_t pmd)
 static inline spinlock_t *pmd_trans_huge_lock(pmd_t *pmd,
 		struct vm_area_struct *vma)
 {
-	VM_BUG_ON_VMA(!rwsem_is_locked(&vma->vm_mm->mmap_sem), vma);
 	if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd))
 		return __pmd_trans_huge_lock(pmd, vma);
 	else
@@ -203,7 +202,6 @@ static inline spinlock_t *pud_trans_huge_lock(pud_t *pud,
 		struct vm_area_struct *vma)
 {
-	VM_BUG_ON_VMA(!rwsem_is_locked(&vma->vm_mm->mmap_sem), vma);
 	if (pud_trans_huge(*pud) || pud_devmap(*pud))
 		return __pud_trans_huge_lock(pud, vma);
 	else
diff --git a/mm/memory.c b/mm/memory.c
index 9516c95108a1..73971f859035 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1212,7 +1212,6 @@ static inline unsigned long zap_pud_range(struct mmu_gather *tlb,
 		next = pud_addr_end(addr, end);
 		if (pud_trans_huge(*pud) || pud_devmap(*pud)) {
 			if (next - addr != HPAGE_PUD_SIZE) {
-				VM_BUG_ON_VMA(!rwsem_is_locked(&tlb->mm->mmap_sem), vma);
 				split_huge_pud(vma, pud, addr);
 			} else if (zap_huge_pud(tlb, vma, pud, addr))
 				goto next;
@@ -1519,7 +1518,6 @@ int vm_insert_page(struct vm_area_struct *vma, unsigned long addr,
 	if (!page_count(page))
 		return -EINVAL;
 	if (!(vma->vm_flags & VM_MIXEDMAP)) {
-		BUG_ON(down_read_trylock(&vma->vm_mm->mmap_sem));
 		BUG_ON(vma->vm_flags & VM_PFNMAP);
 		vma->vm_flags |= VM_MIXEDMAP;
 	}
diff --git a/mm/mmap.c b/mm/mmap.c
index af228ae3508d..a03ded49f9eb 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3466,7 +3466,7 @@ static void vm_lock_anon_vma(struct mm_struct *mm, struct anon_vma *anon_vma)
 		 * The LSB of head.next can't change from under us
 		 * because we hold the mm_all_locks_mutex.
 		 */
-		down_write_nest_lock(&anon_vma->root->rwsem, &mm->mmap_sem);
+		down_write(&anon_vma->root->rwsem);
 		/*
 		 * We can safely modify head.next after taking the
 		 * anon_vma->root->rwsem. If some other vma in this mm shares
@@ -3496,7 +3496,7 @@ static void vm_lock_mapping(struct mm_struct *mm, struct address_space *mapping)
 		 */
 		if (test_and_set_bit(AS_MM_ALL_LOCKS, &mapping->flags))
 			BUG();
-		down_write_nest_lock(&mapping->i_mmap_rwsem, &mm->mmap_sem);
+		down_write(&mapping->i_mmap_rwsem);
 	}
 }
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index c3084ff2569d..6246acf17054 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -303,8 +303,6 @@ int walk_page_range(unsigned long start, unsigned long end,
 	if (!walk->mm)
 		return -EINVAL;

-	VM_BUG_ON_MM(!rwsem_is_locked(&walk->mm->mmap_sem), walk->mm);
-
 	vma = find_vma(walk->mm, start);
 	do {
 		if (!vma) { /* after the last vma */
@@ -346,7 +344,6 @@ int walk_page_vma(struct vm_area_struct *vma, struct mm_walk *walk)
 	if (!walk->mm)
 		return -EINVAL;

-	VM_BUG_ON(!rwsem_is_locked(&walk->mm->mmap_sem));
 	VM_BUG_ON(!vma);

 	walk->vma = vma;
 	err = walk_page_test(vma->vm_start, vma->vm_end, walk);
--
2.16.4
[PATCH 03/14] mm: introduce mm locking wrappers
This patch adds the necessary wrappers to encapsulate mmap_sem locking and will enable any future changes to be a lot more confined to here. In addition, future users will incrementally be added in the next patches. The mm_[read/write]_[un]lock() naming is used.

Signed-off-by: Davidlohr Bueso
---
 include/linux/mm.h | 76 ++
 1 file changed, 76 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0e8834ac32b7..780b6097ee47 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -12,6 +12,7 @@
 #include
 #include
 #include
+#include <linux/range_lock.h>
 #include
 #include
 #include
@@ -2880,5 +2881,80 @@ void __init setup_nr_node_ids(void);
 static inline void setup_nr_node_ids(void) {}
 #endif

+/*
+ * Address space locking wrappers.
+ */
+static inline bool mm_is_locked(struct mm_struct *mm,
+				struct range_lock *mmrange)
+{
+	return rwsem_is_locked(&mm->mmap_sem);
+}
+
+/* Reader wrappers */
+static inline int mm_read_trylock(struct mm_struct *mm,
+				  struct range_lock *mmrange)
+{
+	return down_read_trylock(&mm->mmap_sem);
+}
+
+static inline void mm_read_lock(struct mm_struct *mm,
+				struct range_lock *mmrange)
+{
+	down_read(&mm->mmap_sem);
+}
+
+static inline void mm_read_lock_nested(struct mm_struct *mm,
+				       struct range_lock *mmrange, int subclass)
+{
+	down_read_nested(&mm->mmap_sem, subclass);
+}
+
+static inline void mm_read_unlock(struct mm_struct *mm,
+				  struct range_lock *mmrange)
+{
+	up_read(&mm->mmap_sem);
+}
+
+/* Writer wrappers */
+static inline int mm_write_trylock(struct mm_struct *mm,
+				   struct range_lock *mmrange)
+{
+	return down_write_trylock(&mm->mmap_sem);
+}
+
+static inline void mm_write_lock(struct mm_struct *mm,
+				 struct range_lock *mmrange)
+{
+	down_write(&mm->mmap_sem);
+}
+
+static inline int mm_write_lock_killable(struct mm_struct *mm,
+					 struct range_lock *mmrange)
+{
+	return down_write_killable(&mm->mmap_sem);
+}
+
+static inline void mm_downgrade_write(struct mm_struct *mm,
+				      struct range_lock *mmrange)
+{
+	downgrade_write(&mm->mmap_sem);
+}
+
+static inline void mm_write_unlock(struct mm_struct *mm,
+				   struct range_lock *mmrange)
+{
+	up_write(&mm->mmap_sem);
+}
+
+static inline void mm_write_lock_nested(struct mm_struct *mm,
+					struct range_lock *mmrange,
+					int subclass)
+{
+	down_write_nested(&mm->mmap_sem, subclass);
+}
+
+#define mm_write_nest_lock(mm, range, nest_lock)	\
+	down_write_nest_lock(&(mm)->mmap_sem, nest_lock)
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
--
2.16.4
[PATCH 04/14] mm: teach pagefault paths about range locking
When handling a page fault, it happens that the mmap_sem is released during the processing. As moving to a range lock requires remembering the range parameter to do the lock/unlock, this patch adds a pointer to struct vm_fault. As such, we work outwards from arming the vmf from: handle_mm_fault(), __collapse_huge_page_swapin() and hugetlb_no_page().

The idea is to use a local, stack allocated variable (no concurrency) whenever the mmap_sem is originally taken and we end up in pf paths that end up retaking the lock. I.e.:

	DEFINE_RANGE_LOCK_FULL(mmrange);

	down_write(&mm->mmap_sem);
	some_fn(a, b, c, &mmrange);
	    ...
	    handle_mm_fault(vma, addr, flags, mmrange);
	    ...
	up_write(&mm->mmap_sem);

Consequently we also end up updating lock_page_or_retry(), which can drop the mmap_sem. For the gup family, we pass nil for scenarios when the semaphore will remain untouched.

Semantically nothing changes at all, and the 'mmrange' ends up being unused for now. Later patches will use the variable when the mmap_sem wrappers replace the straightforward down/up.

*** For simplicity, this patch breaks when used in ksm and hmm.
*** Signed-off-by: Davidlohr Bueso --- arch/x86/mm/fault.c | 27 -- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 +- drivers/gpu/drm/i915/i915_gem_userptr.c | 2 +- drivers/infiniband/core/umem_odp.c | 2 +- drivers/iommu/amd_iommu_v2.c| 3 +- drivers/iommu/intel-svm.c | 3 +- drivers/vfio/vfio_iommu_type1.c | 2 +- fs/exec.c | 2 +- include/linux/hugetlb.h | 9 +++-- include/linux/mm.h | 24 include/linux/pagemap.h | 6 +-- kernel/events/uprobes.c | 7 ++-- kernel/futex.c | 2 +- mm/filemap.c| 2 +- mm/frame_vector.c | 6 ++- mm/gup.c| 65 - mm/hmm.c| 4 +- mm/hugetlb.c| 14 --- mm/internal.h | 3 +- mm/khugepaged.c | 24 +++- mm/ksm.c| 3 +- mm/memory.c | 14 --- mm/mempolicy.c | 9 +++-- mm/mmap.c | 4 +- mm/mprotect.c | 2 +- mm/process_vm_access.c | 4 +- security/tomoyo/domain.c| 2 +- virt/kvm/async_pf.c | 3 +- virt/kvm/kvm_main.c | 9 +++-- 29 files changed, 159 insertions(+), 100 deletions(-) diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 46df4c6aae46..fb869c292b91 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -938,7 +938,8 @@ bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code, static void __bad_area(struct pt_regs *regs, unsigned long error_code, - unsigned long address, u32 pkey, int si_code) + unsigned long address, u32 pkey, int si_code, + struct range_lock *mmrange) { struct mm_struct *mm = current->mm; /* @@ -951,9 +952,10 @@ __bad_area(struct pt_regs *regs, unsigned long error_code, } static noinline void -bad_area(struct pt_regs *regs, unsigned long error_code, unsigned long address) +bad_area(struct pt_regs *regs, unsigned long error_code, unsigned long address, +struct range_lock *mmrange) { - __bad_area(regs, error_code, address, 0, SEGV_MAPERR); + __bad_area(regs, error_code, address, 0, SEGV_MAPERR, mmrange); } static inline bool bad_area_access_from_pkeys(unsigned long error_code, @@ -975,7 +977,8 @@ static inline bool bad_area_access_from_pkeys(unsigned long error_code, static noinline void bad_area_access_error(struct 
pt_regs *regs, unsigned long error_code, - unsigned long address, struct vm_area_struct *vma) + unsigned long address, struct vm_area_struct *vma, + struct range_lock *mmrange) { /* * This OSPKE check is not strictly necessary at runtime. @@ -1005,9 +1008,9 @@ bad_area_access_error(struct pt_regs *regs, unsigned long error_code, */ u32 pkey = vma_pkey(vma); - __bad_area(regs, error_code, address, pkey, SEGV_PKUERR); + __bad_area(regs, error_code, address, pkey, SEGV_PKUERR, mmrange); } else { - __bad_area(regs, error_code, address, 0, SEGV_ACCERR); + __bad_area(regs, error_code, address, 0, SEGV_ACCERR, mmrange); } } @@ -1306,6 +1309,7 @@ void do_user_addr_fault(struct pt_regs *regs, struct mm_struct *mm; vm_fault_t fault, major = 0; unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE; +
[PATCH 08/12] powerpc/pseries/svm: Export guest SVM status to user space via sysfs
From: Ryan Grimm

User space might want to know it's running in a secure VM. It can't do an mfmsr because mfmsr is a privileged instruction.

The solution here is to create a cpu attribute, /sys/devices/system/cpu/svm, which will read 0 or 1 based on the S bit of the guest's CPU 0.

Signed-off-by: Ryan Grimm
Reviewed-by: Ram Pai
Signed-off-by: Thiago Jung Bauermann
---
 arch/powerpc/kernel/sysfs.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index e8e93c2c7d03..8fdab134e9ae 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -18,6 +18,7 @@
 #include
 #include
 #include
+#include <asm/svm.h>

 #include "cacheinfo.h"
 #include "setup.h"
@@ -714,6 +715,32 @@ static struct device_attribute pa6t_attrs[] = {
 #endif /* HAS_PPC_PMC_PA6T */
 #endif /* HAS_PPC_PMC_CLASSIC */

+#ifdef CONFIG_PPC_SVM
+static void get_svm(void *val)
+{
+	u32 *value = val;
+
+	*value = is_secure_guest();
+}
+
+static ssize_t show_svm(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	u32 val;
+
+	smp_call_function_single(0, get_svm, &val, 1);
+	return sprintf(buf, "%u\n", val);
+}
+static DEVICE_ATTR(svm, 0444, show_svm, NULL);
+
+static void create_svm_file(void)
+{
+	device_create_file(cpu_subsys.dev_root, &dev_attr_svm);
+}
+#else
+static void create_svm_file(void)
+{
+}
+#endif /* CONFIG_PPC_SVM */
+
 static int register_cpu_online(unsigned int cpu)
 {
 	struct cpu *c = &per_cpu(cpu_devices, cpu);
@@ -1057,6 +1084,8 @@ static int __init topology_init(void)
 	sysfs_create_dscr_default();
 #endif /* CONFIG_PPC64 */

+	create_svm_file();
+
 	return 0;
 }
 subsys_initcall(topology_init);
[RFC PATCH 00/14] mmap_sem range locking
Hi,

The following is a summarized repost of the range locking mmap_sem idea[1] and is _not_ intended to be considered upstream, as there are quite a few issues that arise with this approach of tackling mmap_sem contention (keep reading). In fact this patchset is quite incomplete and will break compilation on anything non-x86, and is also _completely broken_ for ksm and hmm. That being said, it does build an enterprise kernel and survives a number of workloads as well as 'runltp -f syscalls'.

The previous series was a complete range locking conversion, which ensured we had all the range locking apis we needed. The changelog also included a number of performance numbers and the overall design. While finding issues with the code itself is always welcome, the idea of this series is to discuss what can be done on top of it, if anything.

From a locking pov, most recently there has been a revival in the interest of the range lock code for dchinner's plans of range locking the i_rwsem. However, it showed that xfs's extent tree significantly outperformed[2] the (full) range lock. The performance differences in 1:1 rwsem comparisons have already been shown in [1]. Considering that both the range lock and the extent tree lock the whole tree, most of these performance penalties are due to the fact that the rbtree's depth is a lot larger than the btree's, so the latter avoids most of the pointer chasing which is a common performance issue. This was a trade-off for not having to allocate memory for the range nodes.
However, on the _positive side_, and which is what we care most about for mmap_sem, when actually using the lock as intended, the range locking did show its purpose:

IOPS read/write (buffered IO)
fio processes      rwsem          rangelock
 1                57k / 57k       64k / 64k
 2                61k / 61k      111k / 111k
 4                61k / 61k      228k / 228k
 8                55k / 55k      195k / 195k
16                15k / 15k       40k / 40k

So it would be nice to apply this concept to our address space and allow mmaps, munmaps and pagefaults to all work concurrently in non-overlapping scenarios -- which is what is provided by userspace mm related syscalls. However, when using the range lock without a full range, a number of issues around the vma immediately pop up as a consequence of this *top-down* approach to solving scalability: Races within a vma: non-overlapping regions can still belong to the same vma, hence wrecking merges and splits. One popular idea is to have a vma->rwsem (taken, for example, after a find_vma()), however, this throws out the window any potential scalability gains for large vmas as we end up just moving the point of contention down. The same problem occurs when refcounting the vma (such as with speculative page faults). There's also the fact that we can end up taking numerous vma locks as the vma list is later traversed once the first vma is found. Alternatively, we could just expand the passed range such that it covers the whole first and last vma(s) endpoints; of course we don't have that information a priori (protected by mmap_sem :), and enlarging the range _after_ acquiring the lock opens a can of worms because now we have to inform userspace and/or deadlock, among others. Similarly, there's the issue of keeping the vma tree correct during modifications as well as regular find_vma()s. Laurent has already pointed out that we have too many ways of getting a vma: the tree, the list and the vmacache, all currently protected by mmap_sem, and this breaks because of the above when not using full ranges.
This also touches a bit on a more *bottom-up* approach to mmap_sem performance, which scales from within, instead of putting a big rangelock tree on top of the address space. Matthew has pointed out the xarray as well as an rcu-based maple tree[3] as replacements for the rbtree, however we already have the vmacache so most of the benefits of a shallower data structure are unnecessary -- in cache-hot situations, naturally. The vma-list is easily removable once we have O(1) next/prev pointers, which for rbtrees can be done via threading the data structure (at the cost of an extra branch for every level down the tree when inserting). Maple trees already give us this. So all in all, if we were going to go down this path of a cache-friendlier tree, we'd end up needing comparisons of the maple tree vs the current vmacache+rbtree combo. Regarding rcu-ifying the vma tree and replacing read locking (which would also play nicer with cachelines): it sounds nice, but it does not seem practical considering that the page tables cannot be rcu-ified. I'm sure I'm missing a lot more, but I'm hoping to kickstart the conversation again. Patches 1-2: adds the range locking machinery. This is rebased on the rbtree optimizations for interval trees such that
[PATCH 01/14] interval-tree: build unconditionally
In preparation for range locking, this patch gets rid of CONFIG_INTERVAL_TREE option as we will unconditionally build it. Signed-off-by: Davidlohr Bueso --- drivers/gpu/drm/Kconfig | 2 -- drivers/gpu/drm/i915/Kconfig | 1 - drivers/iommu/Kconfig| 1 - lib/Kconfig | 14 -- lib/Kconfig.debug| 1 - lib/Makefile | 3 +-- 6 files changed, 1 insertion(+), 21 deletions(-) diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig index e360a4a131e1..3405336175ed 100644 --- a/drivers/gpu/drm/Kconfig +++ b/drivers/gpu/drm/Kconfig @@ -200,7 +200,6 @@ config DRM_RADEON select POWER_SUPPLY select HWMON select BACKLIGHT_CLASS_DEVICE - select INTERVAL_TREE help Choose this option if you have an ATI Radeon graphics card. There are both PCI and AGP versions. You don't need to choose this to @@ -220,7 +219,6 @@ config DRM_AMDGPU select POWER_SUPPLY select HWMON select BACKLIGHT_CLASS_DEVICE - select INTERVAL_TREE select CHASH help Choose this option if you have a recent AMD Radeon graphics card. diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig index 3d5f1cb6a76c..54d4bc8d141f 100644 --- a/drivers/gpu/drm/i915/Kconfig +++ b/drivers/gpu/drm/i915/Kconfig @@ -3,7 +3,6 @@ config DRM_I915 depends on DRM depends on X86 && PCI select INTEL_GTT - select INTERVAL_TREE # we need shmfs for the swappable backing store, and in particular # the shmem_readpage() which depends upon tmpfs select SHMEM diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index a2ed2b51a0f7..d21e6dc2adae 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -477,7 +477,6 @@ config VIRTIO_IOMMU depends on VIRTIO=y depends on ARM64 select IOMMU_API - select INTERVAL_TREE help Para-virtualised IOMMU driver with virtio. diff --git a/lib/Kconfig b/lib/Kconfig index 8d9239a4156c..e089ac40c062 100644 --- a/lib/Kconfig +++ b/lib/Kconfig @@ -409,20 +409,6 @@ config TEXTSEARCH_FSM config BTREE bool -config INTERVAL_TREE - bool - help - Simple, embeddable, interval-tree. 
Can find the start of an - overlapping range in log(n) time and then iterate over all - overlapping nodes. The algorithm is implemented as an - augmented rbtree. - - See: - - Documentation/rbtree.txt - - for more information. - config XARRAY_MULTI bool help diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 4c35e52c5a2e..54bafed8ba70 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1759,7 +1759,6 @@ config RBTREE_TEST config INTERVAL_TREE_TEST tristate "Interval tree test" depends on DEBUG_KERNEL - select INTERVAL_TREE help A benchmark measuring the performance of the interval tree library diff --git a/lib/Makefile b/lib/Makefile index fb7697031a79..39fd34156692 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -50,7 +50,7 @@ obj-y += bcd.o sort.o parser.o debug_locks.o random32.o \ bsearch.o find_bit.o llist.o memweight.o kfifo.o \ percpu-refcount.o rhashtable.o \ once.o refcount.o usercopy.o errseq.o bucket_locks.o \ -generic-radix-tree.o +generic-radix-tree.o interval_tree.o obj-$(CONFIG_STRING_SELFTEST) += test_string.o obj-y += string_helpers.o obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o @@ -115,7 +115,6 @@ obj-y += logic_pio.o obj-$(CONFIG_GENERIC_HWEIGHT) += hweight.o obj-$(CONFIG_BTREE) += btree.o -obj-$(CONFIG_INTERVAL_TREE) += interval_tree.o obj-$(CONFIG_ASSOCIATIVE_ARRAY) += assoc_array.o obj-$(CONFIG_DEBUG_PREEMPT) += smp_processor_id.o obj-$(CONFIG_DEBUG_LIST) += list_debug.o -- 2.16.4
[PATCH 05/12] powerpc/pseries: Add and use LPPACA_SIZE constant
Helps document what the hard-coded number means. Also take the opportunity to fix an #endif comment. Suggested-by: Alexey Kardashevskiy Signed-off-by: Thiago Jung Bauermann --- arch/powerpc/kernel/paca.c | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c index 9cc91d03ab62..854105db5cff 100644 --- a/arch/powerpc/kernel/paca.c +++ b/arch/powerpc/kernel/paca.c @@ -56,6 +56,8 @@ static void *__init alloc_paca_data(unsigned long size, unsigned long align, #ifdef CONFIG_PPC_PSERIES +#define LPPACA_SIZE 0x400 + /* * See asm/lppaca.h for more detail. * @@ -69,7 +71,7 @@ static inline void init_lppaca(struct lppaca *lppaca) *lppaca = (struct lppaca) { .desc = cpu_to_be32(0xd397d781),/* "LpPa" */ - .size = cpu_to_be16(0x400), + .size = cpu_to_be16(LPPACA_SIZE), .fpregs_in_use = 1, .slb_count = cpu_to_be16(64), .vmxregs_in_use = 0, @@ -79,19 +81,18 @@ static inline void init_lppaca(struct lppaca *lppaca) static struct lppaca * __init new_lppaca(int cpu, unsigned long limit) { struct lppaca *lp; - size_t size = 0x400; - BUILD_BUG_ON(size < sizeof(struct lppaca)); + BUILD_BUG_ON(sizeof(struct lppaca) > LPPACA_SIZE); if (early_cpu_has_feature(CPU_FTR_HVMODE)) return NULL; - lp = alloc_paca_data(size, 0x400, limit, cpu); + lp = alloc_paca_data(LPPACA_SIZE, 0x400, limit, cpu); init_lppaca(lp); return lp; } -#endif /* CONFIG_PPC_BOOK3S */ +#endif /* CONFIG_PPC_PSERIES */ #ifdef CONFIG_PPC_BOOK3S_64
[PATCH 04/12] powerpc/pseries/svm: Add helpers for UV_SHARE_PAGE and UV_UNSHARE_PAGE
From: Ram Pai These functions are used when the guest wants to grant the hypervisor access to certain pages. Signed-off-by: Ram Pai Signed-off-by: Thiago Jung Bauermann --- arch/powerpc/include/asm/ultravisor-api.h | 2 ++ arch/powerpc/include/asm/ultravisor.h | 14 ++ 2 files changed, 16 insertions(+) diff --git a/arch/powerpc/include/asm/ultravisor-api.h b/arch/powerpc/include/asm/ultravisor-api.h index 0e8b72081718..ed68b02869fd 100644 --- a/arch/powerpc/include/asm/ultravisor-api.h +++ b/arch/powerpc/include/asm/ultravisor-api.h @@ -20,6 +20,8 @@ /* opcodes */ #define UV_WRITE_PATE 0xF104 #define UV_ESM 0xF110 +#define UV_SHARE_PAGE 0xF130 +#define UV_UNSHARE_PAGE 0xF134 #define UV_RETURN 0xF11C #endif /* _ASM_POWERPC_ULTRAVISOR_API_H */ diff --git a/arch/powerpc/include/asm/ultravisor.h b/arch/powerpc/include/asm/ultravisor.h index 09e0a615d96f..537f7717d21a 100644 --- a/arch/powerpc/include/asm/ultravisor.h +++ b/arch/powerpc/include/asm/ultravisor.h @@ -44,6 +44,20 @@ static inline int uv_register_pate(u64 lpid, u64 dw0, u64 dw1) return ucall(UV_WRITE_PATE, retbuf, lpid, dw0, dw1); } +static inline int uv_share_page(u64 pfn, u64 npages) +{ + unsigned long retbuf[UCALL_BUFSIZE]; + + return ucall(UV_SHARE_PAGE, retbuf, pfn, npages); +} + +static inline int uv_unshare_page(u64 pfn, u64 npages) +{ + unsigned long retbuf[UCALL_BUFSIZE]; + + return ucall(UV_UNSHARE_PAGE, retbuf, pfn, npages); +} + #endif /* !__ASSEMBLY__ */ #endif /* _ASM_POWERPC_ULTRAVISOR_H */
[PATCH 06/12] powerpc/pseries/svm: Use shared memory for LPPACA structures
From: Anshuman Khandual LPPACA structures need to be shared with the host. Hence they need to be in shared memory. Instead of allocating individual chunks of memory for a given structure from memblock, a contiguous chunk of memory is allocated and then converted into shared memory. Subsequent allocation requests will come from the contiguous chunk which will be always shared memory for all structures. While we are able to use a kmem_cache constructor for the Debug Trace Log, LPPACAs are allocated very early in the boot process (before SLUB is available) so we need to use a simpler scheme here. Introduce helper is_svm_platform() which uses the S bit of the MSR to tell whether we're running as a secure guest. Signed-off-by: Anshuman Khandual Signed-off-by: Thiago Jung Bauermann --- arch/powerpc/include/asm/svm.h | 26 arch/powerpc/kernel/paca.c | 43 +- 2 files changed, 68 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/svm.h b/arch/powerpc/include/asm/svm.h new file mode 100644 index ..fef3740f46a6 --- /dev/null +++ b/arch/powerpc/include/asm/svm.h @@ -0,0 +1,26 @@ +/* SPDX-License-Identifier: GPL-2.0+ */ +/* + * SVM helper functions + * + * Copyright 2019 Anshuman Khandual, IBM Corporation. 
+ */ + +#ifndef _ASM_POWERPC_SVM_H +#define _ASM_POWERPC_SVM_H + +#ifdef CONFIG_PPC_SVM + +static inline bool is_secure_guest(void) +{ + return mfmsr() & MSR_S; +} + +#else /* CONFIG_PPC_SVM */ + +static inline bool is_secure_guest(void) +{ + return false; +} + +#endif /* CONFIG_PPC_SVM */ +#endif /* _ASM_POWERPC_SVM_H */ diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c index 854105db5cff..a9622f4b45bb 100644 --- a/arch/powerpc/kernel/paca.c +++ b/arch/powerpc/kernel/paca.c @@ -18,6 +18,8 @@ #include #include #include +#include +#include #include "setup.h" @@ -58,6 +60,41 @@ static void *__init alloc_paca_data(unsigned long size, unsigned long align, #define LPPACA_SIZE 0x400 +static void *__init alloc_shared_lppaca(unsigned long size, unsigned long align, + unsigned long limit, int cpu) +{ + size_t shared_lppaca_total_size = PAGE_ALIGN(nr_cpu_ids * LPPACA_SIZE); + static unsigned long shared_lppaca_size; + static void *shared_lppaca; + void *ptr; + + if (!shared_lppaca) { + memblock_set_bottom_up(true); + + shared_lppaca = + memblock_alloc_try_nid(shared_lppaca_total_size, + PAGE_SIZE, MEMBLOCK_LOW_LIMIT, + limit, NUMA_NO_NODE); + if (!shared_lppaca) + panic("cannot allocate shared data"); + + memblock_set_bottom_up(false); + uv_share_page(PHYS_PFN(__pa(shared_lppaca)), + shared_lppaca_total_size >> PAGE_SHIFT); + } + + ptr = shared_lppaca + shared_lppaca_size; + shared_lppaca_size += size; + + /* +* This is very early in boot, so no harm done if the kernel crashes at +* this point. +*/ + BUG_ON(shared_lppaca_size >= shared_lppaca_total_size); + + return ptr; +} + /* * See asm/lppaca.h for more detail. 
* @@ -87,7 +124,11 @@ static struct lppaca * __init new_lppaca(int cpu, unsigned long limit) if (early_cpu_has_feature(CPU_FTR_HVMODE)) return NULL; - lp = alloc_paca_data(LPPACA_SIZE, 0x400, limit, cpu); + if (is_secure_guest()) + lp = alloc_shared_lppaca(LPPACA_SIZE, 0x400, limit, cpu); + else + lp = alloc_paca_data(LPPACA_SIZE, 0x400, limit, cpu); + init_lppaca(lp); return lp;
[PATCH 00/12] Secure Virtual Machine Enablement
This series enables Secure Virtual Machines (SVMs) on powerpc. SVMs use the Protected Execution Facility (PEF) and request to be migrated to secure memory during prom_init() so by default all of their memory is inaccessible to the hypervisor. There is an Ultravisor call that the VM can use to request certain pages to be made accessible to (or shared with) the hypervisor. The objective of these patches is to have the guest perform this request for buffers that need to be accessed by the hypervisor such as the LPPACAs, the SWIOTLB memory and the Debug Trace Log. The patch set applies on top of Claudio Carvalho's "kvmppc: Paravirtualize KVM to support ultravisor" series: https://lore.kernel.org/linuxppc-dev/20190518142524.28528-1-cclau...@linux.ibm.com/ I only need the following two patches from his series: [RFC PATCH v2 02/10] KVM: PPC: Ultravisor: Introduce the MSR_S bit [RFC PATCH v2 04/10] KVM: PPC: Ultravisor: Add generic ultravisor call handler Patches 2 and 3 are posted as RFC because we are still finalizing the details on how the ESM blob will be passed to the kernel. All other patches are (hopefully) in upstreamable shape. Unfortunately this series still doesn't enable the use of virtio devices in the secure guest. This support depends on a discussion that is currently ongoing with the virtio community: https://lore.kernel.org/linuxppc-dev/87womn8inf.fsf@morokweng.localdomain/ This was the last time I posted this patch set: https://lore.kernel.org/linuxppc-dev/20180824162535.22798-1-bauer...@linux.ibm.com/ At that time, it wasn't possible to launch a real secure guest because the Ultravisor was still in very early development. Now there is a relatively mature Ultravisor and I was able to test it using Claudio's patches in the host kernel, booting normally using an initramfs for the root filesystem. 
This is the command used to start up the guest with QEMU 4.0: qemu-system-ppc64 \ -nodefaults \ -cpu host \ -machine pseries,accel=kvm,kvm-type=HV,cap-htm=off,cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken \ -display none \ -serial mon:stdio \ -smp 1 \ -m 4G \ -kernel /home/bauermann/vmlinux \ -initrd /home/bauermann/fs_small.cpio \ -append 'debug' Changelog since the RFC from August: - Patch "powerpc/pseries: Introduce option to build secure virtual machines" - New patch. - Patch "powerpc: Add support for adding an ESM blob to the zImage wrapper" - Patch from Benjamin Herrenschmidt, first posted here: https://lore.kernel.org/linuxppc-dev/20180531043417.25073-1-b...@kernel.crashing.org/ - Made minor adjustments to some comments. Code is unchanged. - Patch "powerpc/prom_init: Add the ESM call to prom_init" - New patch from Ram Pai and Michael Anderson. - Patch "powerpc/pseries/svm: Add helpers for UV_SHARE_PAGE and UV_UNSHARE_PAGE" - New patch from Ram Pai. - Patch "powerpc/pseries: Add and use LPPACA_SIZE constant" - Moved LPPACA_SIZE macro inside the CONFIG_PPC_PSERIES #ifdef. - Put sizeof() operand left of comparison operator in BUILD_BUG_ON() macro to appease a checkpatch warning. - Patch "powerpc/pseries/svm: Use shared memory for LPPACA structures" - Moved definition of is_secure_guest() helper to this patch. - Changed shared_lppaca and shared_lppaca_size from globals to static variables inside alloc_shared_lppaca(). - Changed shared_lppaca to hold virtual address instead of physical address. - Patch "powerpc/pseries/svm: Use shared memory for Debug Trace Log (DTL)" - Add get_dtl_cache_ctor() macro. Suggested by Ram Pai. - Patch "powerpc/pseries/svm: Export guest SVM status to user space via sysfs" - New patch from Ryan Grimm. - Patch "powerpc/pseries/svm: Disable doorbells in SVM guests" - New patch from Sukadev Bhattiprolu. - Patch "powerpc/pseries/iommu: Don't use dma_iommu_ops on secure guests" - New patch. 
- Patch "powerpc/pseries/svm: Force SWIOTLB for secure guests" - New patch with code that was previously in other patches. - Patch "powerpc/configs: Enable secure guest support in pseries and ppc64 defconfigs" - New patch from Ryan Grimm. - Patch "powerpc/pseries/svm: Detect Secure Virtual Machine (SVM) platform" - Dropped this patch by moving its code to other patches. - Patch "powerpc/svm: Select CONFIG_DMA_DIRECT_OPS and CONFIG_SWIOTLB" - No need to select CONFIG_DMA_DIRECT_OPS anymore. The CONFIG_SWIOTLB change was moved to another patch and this patch was dropped. - Patch "powerpc/pseries/svm: Add memory conversion (shared/secure) helper functions" - Dropped patch since the helper functions were unnecessary wrappers around uv_share_page() and uv_unshare_page(). - Patch "powerpc/svm: Convert SWIOTLB buffers
[RFC PATCH 03/12] powerpc/prom_init: Add the ESM call to prom_init
From: Ram Pai Make the Enter-Secure-Mode (ESM) ultravisor call to switch the VM to secure mode. Add "svm=" command line option to turn off switching to secure mode. Introduce CONFIG_PPC_SVM to control support for secure guests. Signed-off-by: Ram Pai [ Generate an RTAS os-term hcall when the ESM ucall fails. ] Signed-off-by: Michael Anderson [ Cleaned up the code a bit. ] Signed-off-by: Thiago Jung Bauermann --- .../admin-guide/kernel-parameters.txt | 5 + arch/powerpc/include/asm/ultravisor-api.h | 1 + arch/powerpc/kernel/prom_init.c | 124 ++ 3 files changed, 130 insertions(+) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index c45a19d654f3..7237d86b25c6 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -4501,6 +4501,11 @@ /sys/power/pm_test). Only available when CONFIG_PM_DEBUG is set. Default value is 5. + svm=[PPC] + Format: { on | off | y | n | 1 | 0 } + This parameter controls use of the Protected + Execution Facility on pSeries. 
+ swapaccount=[0|1] [KNL] Enable accounting of swap in memory resource controller if no parameter or 1 is given or disable diff --git a/arch/powerpc/include/asm/ultravisor-api.h b/arch/powerpc/include/asm/ultravisor-api.h index 15e6ce77a131..0e8b72081718 100644 --- a/arch/powerpc/include/asm/ultravisor-api.h +++ b/arch/powerpc/include/asm/ultravisor-api.h @@ -19,6 +19,7 @@ /* opcodes */ #define UV_WRITE_PATE 0xF104 +#define UV_ESM 0xF110 #define UV_RETURN 0xF11C #endif /* _ASM_POWERPC_ULTRAVISOR_API_H */ diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c index 523bb99d7676..5d8a3efb54f2 100644 --- a/arch/powerpc/kernel/prom_init.c +++ b/arch/powerpc/kernel/prom_init.c @@ -44,6 +44,7 @@ #include #include #include +#include #include @@ -174,6 +175,10 @@ static unsigned long __prombss prom_tce_alloc_end; static bool __prombss prom_radix_disable; #endif +#ifdef CONFIG_PPC_SVM +static bool __prombss prom_svm_disable; +#endif + struct platform_support { bool hash_mmu; bool radix_mmu; @@ -809,6 +814,17 @@ static void __init early_cmdline_parse(void) if (prom_radix_disable) prom_debug("Radix disabled from cmdline\n"); #endif /* CONFIG_PPC_PSERIES */ + +#ifdef CONFIG_PPC_SVM + opt = prom_strstr(prom_cmd_line, "svm="); + if (opt) { + bool val; + + opt += sizeof("svm=") - 1; + if (!prom_strtobool(opt, &val)) + prom_svm_disable = !val; + } +#endif /* CONFIG_PPC_SVM */ } #ifdef CONFIG_PPC_PSERIES @@ -1707,6 +1723,43 @@ static void __init prom_close_stdin(void) } } +#ifdef CONFIG_PPC_SVM +static int prom_rtas_os_term_hcall(uint64_t args) +{ + register uint64_t arg1 asm("r3") = 0xf000; + register uint64_t arg2 asm("r4") = args; + + asm volatile("sc 1\n" : "=r" (arg1) : + "r" (arg1), + "r" (arg2) :); + return arg1; +} + +static struct rtas_args __prombss os_term_args; + +static void __init prom_rtas_os_term(char *str) +{ + phandle rtas_node; + __be32 val; + u32 token; + + prom_printf("%s: start...\n", __func__); + rtas_node = call_prom("finddevice", 1, 1, 
ADDR("/rtas")); prom_printf("rtas_node: %x\n", rtas_node); + if (!PHANDLE_VALID(rtas_node)) + return; + + val = 0; + prom_getprop(rtas_node, "ibm,os-term", &val, sizeof(val)); + token = be32_to_cpu(val); + prom_printf("ibm,os-term: %x\n", token); + if (token == 0) + prom_panic("Could not get token for ibm,os-term\n"); + os_term_args.token = cpu_to_be32(token); + prom_rtas_os_term_hcall((uint64_t)&os_term_args); +} +#endif /* CONFIG_PPC_SVM */ + /* * Allocate room for and instantiate RTAS */ @@ -3162,6 +3215,74 @@ static void unreloc_toc(void) #endif #endif +#ifdef CONFIG_PPC_SVM +/* + * The ESM blob is a data structure with information needed by the Ultravisor to + * validate the integrity of the secure guest. + */ +static void *get_esm_blob(void) +{ + /* +* FIXME: We are still finalizing the details on how prom_init will grab +* the ESM blob. When that is done, this function will be updated. +*/ + return (void *)0xdeadbeef; +} + +/* + * Perform the Enter Secure Mode ultracall. + */ +static int enter_secure_mode(void *esm_blob, void *retaddr, void *fdt) +{ + register uint64_t func asm("r0") = UV_ESM; + register uint64_t arg1 asm("r3") = (uint64_t)esm_blob; + register uint64_t arg2
Re: linux-next: build failure after merge of the imx-mxs tree
On Tue, May 21, 2019 at 02:16:47AM +, Anson Huang wrote: > Hi, Stephen/Shawn > I realized this issue last week when I updated my Linux-next tree (NOT > sure why I did NOT meet such issue when I did the patch), so I resent the > patch series of adding head file "io.h" to fix this issue, please apply below > V2 patch series instead, sorry for the inconvenience. > > https://patchwork.kernel.org/patch/10944681/ Okay, fixed. Sorry for the breakage, Stephen. Shawn > > -Original Message- > > From: Stephen Rothwell [mailto:s...@canb.auug.org.au] > > Sent: Tuesday, May 21, 2019 6:38 AM > > To: Shawn Guo > > Cc: Linux Next Mailing List ; Linux Kernel > > Mailing > > List ; Anson Huang ; > > Aisheng Dong > > Subject: linux-next: build failure after merge of the imx-mxs tree > > > > Hi Shawn, > > > > After merging the imx-mxs tree, today's linux-next build (arm > > multi_v7_defconfig) failed like this: > > > > drivers/clk/imx/clk.c: In function 'imx_mmdc_mask_handshake': > > drivers/clk/imx/clk.c:20:8: error: implicit declaration of function > > 'readl_relaxed'; did you mean 'xchg_relaxed'? [-Werror=implicit-function- > > declaration] > > reg = readl_relaxed(ccm_base + CCM_CCDR); > > ^ > > xchg_relaxed > > drivers/clk/imx/clk.c:22:2: error: implicit declaration of function > > 'writel_relaxed'; did you mean 'xchg_relaxed'? [-Werror=implicit-function- > > declaration] > > writel_relaxed(reg, ccm_base + CCM_CCDR); > > ^~~~~~ > > xchg_relaxed > > > > Caused by commit > > > > 0dc6b492b6e0 ("clk: imx: Add common API for masking MMDC handshake") > > > > I have used the imx-mxs tree from next-20190520 for today. > > > > -- > > Cheers, > > Stephen Rothwell
Re: [RFC 0/7] introduce memory hinting API for external process
On Mon, May 20, 2019 at 12:46:05PM -0400, Johannes Weiner wrote: > On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote: > > - Approach > > > > The approach we chose was to use a new interface to allow userspace to > > proactively reclaim entire processes by leveraging platform information. > > This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages > > that are known to be cold from userspace and to avoid races with lmkd > > by reclaiming apps as soon as they entered the cached state. Additionally, > > it could provide many chances for platform to use much information to > > optimize memory efficiency. > > > > IMHO we should spell it out that this patchset complements MADV_WONTNEED > > and MADV_FREE by adding non-destructive ways to gain some free memory > > space. MADV_COLD is similar to MADV_WONTNEED in a way that it hints the > > kernel that memory region is not currently needed and should be reclaimed > > immediately; MADV_COOL is similar to MADV_FREE in a way that it hints the > > kernel that memory region is not currently needed and should be reclaimed > > when memory pressure rises. > > I agree with this approach and the semantics. But these names are very > vague and extremely easy to confuse since they're so similar. > > MADV_COLD could be a good name, but for deactivating pages, not > reclaiming them - marking memory "cold" on the LRU for later reclaim. > > For the immediate reclaim one, I think there is a better option too: > In virtual memory speak, putting a page into secondary storage (or > ensuring it's already there), and then freeing its in-memory copy, is > called "paging out". And that's what this flag is supposed to do. So > how about MADV_PAGEOUT? 
>
> With that, we'd have:
>
> MADV_FREE: Mark data invalid, free memory when needed
> MADV_DONTNEED: Mark data invalid, free memory immediately
>
> MADV_COLD: Data is not used for a while, free memory when needed
> MADV_PAGEOUT: Data is not used for a while, free memory immediately
>
> What do you think?

There have been several suggestions so far. Thanks, folks!

For deactivating:
- MADV_COOL
- MADV_RECLAIM_LAZY
- MADV_DEACTIVATE
- MADV_COLD
- MADV_FREE_PRESERVE

For reclaiming:
- MADV_COLD
- MADV_RECLAIM_NOW
- MADV_RECLAIMING
- MADV_PAGEOUT
- MADV_DONTNEED_PRESERVE

It seems nobody likes MADV_COLD for deactivation, so we should go with something else. For consistency with the other existing madvise hints, the -PRESERVE postfix suits well. However, I never liked the FREE vs DONTNEED naming in the first place -- they are easily confused. I prefer PAGEOUT to RECLAIM since it better conveys reclaiming under memory pressure, with the data paged back in if someone needs it later, so it already implies PRESERVE. If there is no strong objection, I want to go with MADV_COLD and MADV_PAGEOUT. Other opinions?
Re: [PATCH] dmaengine: stm32-dma: Fix redundant call to platform_get_irq
On 07-05-19, 09:54, Amelie Delaunay wrote: > Commit c6504be53972 ("dmaengine: stm32-dma: Fix unsigned variable compared > with zero") duplicated the call to platform_get_irq. > So remove the first call to platform_get_irq. Applied, thanks -- ~Vinod
Re: [PATCH 3/4] dmaengine: fsl-edma: support little endian for edma driver
On 06-05-19, 09:03, Peng Ma wrote: > improve edma driver to support little endian. Can you explain a bit more how adding the below lines adds little endian support... > > Signed-off-by: Peng Ma > --- > drivers/dma/fsl-edma-common.c |5 + > 1 files changed, 5 insertions(+), 0 deletions(-) > > diff --git a/drivers/dma/fsl-edma-common.c b/drivers/dma/fsl-edma-common.c > index 680b2a0..6bf238e 100644 > --- a/drivers/dma/fsl-edma-common.c > +++ b/drivers/dma/fsl-edma-common.c > @@ -83,9 +83,14 @@ void fsl_edma_chan_mux(struct fsl_edma_chan *fsl_chan, > u32 ch = fsl_chan->vchan.chan.chan_id; > void __iomem *muxaddr; > unsigned int chans_per_mux, ch_off; > + int endian_diff[4] = {3, 1, -1, -3}; > > chans_per_mux = fsl_chan->edma->n_chans / DMAMUX_NR; > ch_off = fsl_chan->vchan.chan.chan_id % chans_per_mux; > + > + if (!fsl_chan->edma->big_endian) > + ch_off += endian_diff[ch_off % 4]; > + > muxaddr = fsl_chan->edma->muxbase[ch / chans_per_mux]; > slot = EDMAMUX_CHCFG_SOURCE(slot); > > -- > 1.7.1 -- ~Vinod
Re: [V2 2/2] dmaengine: fsl-qdma: Add improvement
On 06-05-19, 10:21, Peng Ma wrote: > When an error occurs we should clean the error register then to return Applied, thanks -- ~Vinod
Re: [V2 1/2] dmaengine: fsl-qdma: fixed the source/destination descriptor format
On 06-05-19, 10:21, Peng Ma wrote: > CMD of Source/Destination descriptor format should be lower of > struct fsl_qdma_engine number data address. > > Signed-off-by: Peng Ma > --- > changed for V2: > - Fix descriptor spelling > > drivers/dma/fsl-qdma.c | 25 + > 1 files changed, 17 insertions(+), 8 deletions(-) > > diff --git a/drivers/dma/fsl-qdma.c b/drivers/dma/fsl-qdma.c > index aa1d0ae..2e8b46b 100644 > --- a/drivers/dma/fsl-qdma.c > +++ b/drivers/dma/fsl-qdma.c > @@ -113,6 +113,7 @@ > /* Field definition for Descriptor offset */ > #define QDMA_CCDF_STATUS 20 > #define QDMA_CCDF_OFFSET 20 > +#define QDMA_SDDF_CMD(x) (((u64)(x)) << 32) > > /* Field definition for safe loop count*/ > #define FSL_QDMA_HALT_COUNT 1500 > @@ -214,6 +215,12 @@ struct fsl_qdma_engine { > > }; > > +static inline void > +qdma_sddf_set_cmd(struct fsl_qdma_format *sddf, u32 val) > +{ > + sddf->data = QDMA_SDDF_CMD(val); > +} Do you really need this helper which calls another macro! > + > static inline u64 > qdma_ccdf_addr_get64(const struct fsl_qdma_format *ccdf) > { > @@ -341,6 +348,7 @@ static void fsl_qdma_free_chan_resources(struct dma_chan > *chan) > static void fsl_qdma_comp_fill_memcpy(struct fsl_qdma_comp *fsl_comp, > dma_addr_t dst, dma_addr_t src, u32 len) > { > + u32 cmd; > struct fsl_qdma_format *sdf, *ddf; > struct fsl_qdma_format *ccdf, *csgf_desc, *csgf_src, *csgf_dest; > > @@ -353,6 +361,7 @@ static void fsl_qdma_comp_fill_memcpy(struct > fsl_qdma_comp *fsl_comp, > > memset(fsl_comp->virt_addr, 0, FSL_QDMA_COMMAND_BUFFER_SIZE); > memset(fsl_comp->desc_virt_addr, 0, FSL_QDMA_DESCRIPTOR_BUFFER_SIZE); > + why did you add a blank line in this 'fix', it does not belong here! > /* Head Command Descriptor(Frame Descriptor) */ > qdma_desc_addr_set64(ccdf, fsl_comp->bus_addr + 16); > qdma_ccdf_set_format(ccdf, qdma_ccdf_get_offset(ccdf)); > @@ -369,14 +378,14 @@ static void fsl_qdma_comp_fill_memcpy(struct > fsl_qdma_comp *fsl_comp, > /* This entry is the last entry. 
*/ > qdma_csgf_set_f(csgf_dest, len); > /* Descriptor Buffer */ > - sdf->data = > - cpu_to_le64(FSL_QDMA_CMD_RWTTYPE << > - FSL_QDMA_CMD_RWTTYPE_OFFSET); > - ddf->data = > - cpu_to_le64(FSL_QDMA_CMD_RWTTYPE << > - FSL_QDMA_CMD_RWTTYPE_OFFSET); > - ddf->data |= > - cpu_to_le64(FSL_QDMA_CMD_LWC << FSL_QDMA_CMD_LWC_OFFSET); > + cmd = cpu_to_le32(FSL_QDMA_CMD_RWTTYPE << > + FSL_QDMA_CMD_RWTTYPE_OFFSET); > + qdma_sddf_set_cmd(sdf, cmd); why not do sddf->data = QDMA_SDDF_CMD(cmd); > + > + cmd = cpu_to_le32(FSL_QDMA_CMD_RWTTYPE << > + FSL_QDMA_CMD_RWTTYPE_OFFSET); > + cmd |= cpu_to_le32(FSL_QDMA_CMD_LWC << FSL_QDMA_CMD_LWC_OFFSET); > + qdma_sddf_set_cmd(ddf, cmd); > } > > /* > -- > 1.7.1 -- ~Vinod
Re: [RFC PATCH] powerpc/mm: Implement STRICT_MODULE_RWX
On Wed, 2019-05-15 at 06:20 +, Christophe Leroy wrote: Confirming this works on hash and radix book3s64. > + > + // only operate on VM areas for now > + area = find_vm_area((void *)addr); > + if (!area || end > (unsigned long)area->addr + area->size || > + !(area->flags & VM_ALLOC)) > + return -EINVAL; https://lore.kernel.org/patchwork/project/lkml/list/?series=391470 With this patch, the above series causes crashes on (at least) Hash, since it adds another user of change_page_rw() and change_page_nx() that for reasons I don't understand yet, we can't handle. I can work around this with: if (area->flags & VM_FLUSH_RESET_PERMS) return 0; so this is broken on at least one platform as of 5.2-rc1. We're going to look into this more to see if there's anything else we have to do as a result of this series before the next merge window, or if just working around it like this is good enough. - Russell
Re: [PATCH v2] vt: Fix a missing-check bug in drivers/tty/vt/vt.c
On Tue, 21 May 2019, Gen Zhang wrote: > On Mon, May 20, 2019 at 11:26:20PM -0400, Nicolas Pitre wrote: > > On Tue, 21 May 2019, Gen Zhang wrote: > > > > > On Mon, May 20, 2019 at 10:55:40PM -0400, Nicolas Pitre wrote: > > > > On Tue, 21 May 2019, Gen Zhang wrote: > > > > > > > > > In function con_init(), the pointer variables vc_cons[currcons].d, vc > > > > > and > > > > > vc->vc_screenbuf are allocated memory via kzalloc(). And they > > > > > are > > > > > used in the following code. > > > > > However, when there is a memory allocation error, kzalloc() can fail. > > > > > Thus a null pointer (vc_cons[currcons].d, vc and vc->vc_screenbuf) > > > > > dereference may happen. And it will cause the kernel to crash. > > > > > Therefore, > > > > > we should check the return value and handle the error. > > > > > Further, the loop condition MIN_NR_CONSOLES is defined as 1 in > > > > > include/uapi/linux/vt.h. So there is no need to unwind the loop. > > > > > > > > But what if someone changes that define? It won't be obvious that some > > > > code did rely on it to be defined to 1. > > > I re-examined the source code. MIN_NR_CONSOLES is only defined once and > > > there are no other changes to it. > > > > Yes, that is true today. But if someone changes that in the future, how > > will that person know that you relied on it to be 1 for not needing to > > unwind the loop? > > > > > > Nicolas > Hi Nicolas, > Thanks for your explanation! I got your point. Is this way > proper? Not quite. > err_vc_screenbuf: > kfree(vc); > for (currcons = 0; currcons < MIN_NR_CONSOLES; currcons++) > vc_cons[currcons].d = NULL; > return -ENOMEM; > err_vc: > console_unlock(); > return -ENOMEM; Now imagine that MIN_NR_CONSOLES is defined to 10 instead of 1. What happens to the allocated memory if the err_vc condition is met on the 5th loop iteration? If the err_vc_screenbuf condition is encountered on the 5th iteration (currcons = 4), what is the value of vc_cons[4].d? Isn't it the same as the vc that you just freed? Nicolas
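The unwind Nicolas is describing can be sketched in plain userspace C. This is an illustration only, not the actual vt.c code: the names (`vc_cons`, `vc_screenbuf`) mirror the driver, `MIN_NR_CONSOLES` is set to 4 just to exercise the loop, and `calloc`/`free` stand in for `kzalloc`/`kfree`:

```c
#include <stdlib.h>

#define MIN_NR_CONSOLES 4   /* illustrative; the real define lives in vt.h */

struct vc_data { char *vc_screenbuf; };
static struct vc_data *vc_cons[MIN_NR_CONSOLES];

/* Allocate every console; on any failure, free everything allocated so far. */
static int alloc_consoles(size_t screenbuf_size)
{
	int i;

	for (i = 0; i < MIN_NR_CONSOLES; i++) {
		struct vc_data *vc = calloc(1, sizeof(*vc));

		if (!vc)
			goto unwind;
		vc->vc_screenbuf = calloc(1, screenbuf_size);
		if (!vc->vc_screenbuf) {
			free(vc);
			goto unwind;
		}
		vc_cons[i] = vc;
	}
	return 0;

unwind:
	/* Walk back over the 0..i-1 entries that did succeed. */
	while (--i >= 0) {
		free(vc_cons[i]->vc_screenbuf);
		free(vc_cons[i]);
		vc_cons[i] = NULL;
	}
	return -1; /* would be -ENOMEM in the kernel */
}
```

The `while (--i >= 0)` walk-back is exactly Nicolas's point: if iteration 5 fails, iterations 0-4 must be freed and their `vc_cons` slots cleared, not just the current `vc`.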
Re: [PATCH] dmaengine: jz4780: Fix transfers being ACKed too soon
On 04-05-19, 23:37, Paul Cercueil wrote: > When a multi-descriptor DMA transfer is in progress, the "IRQ pending" > flag will apparently be set for that channel as soon as the last > descriptor loads, way before the IRQ actually happens. This behaviour > has been observed on the JZ4725B, but maybe other SoCs are affected. > > In the case where another DMA transfer is running into completion on a > separate channel, the IRQ handler would then run the completion handler > for our previous channel even if the transfer didn't actually finish. > > Fix this by checking in the completion handler that we're indeed done; > if not the interrupted DMA transfer will simply be resumed. Applied, thanks -- ~Vinod
Re: [PATCH] dmaengine: jz4780: Use SPDX license notifier
On 04-05-19, 23:34, Paul Cercueil wrote: > Use SPDX license notifier instead of plain text in the header. Applied, thanks -- ~Vinod
Re: [RFC PATCH 07/11] bpf: implement writable buffers in contexts
On Mon, May 20, 2019 at 09:21:34PM -0400, Steven Rostedt wrote: > Hi Kris, > > Note, it's best to thread patches. Otherwise they get spread out in > mail boxes and are hard to manage. That is, every patch should be a reply > to the 00/11 header patch. Thanks for that advice - I will make sure to do that for future postings. > Also, Peter Zijlstra (Cc'd) is the maintainer of perf on the kernel > side. Please include him on Cc for perf changes that are done inside the > kernel. Ah, my apologies for missing Peter in the list of Cc's. Thank you for adding him. I will update my list. Kris > On Mon, 20 May 2019 23:52:24 + (UTC) > Kris Van Hees wrote: > > > Currently, BPF supports writes to packet data in very specific cases. > > The implementation can be of more general use and can be extended to any > > number of writable buffers in a context. The implementation adds two new > > register types: PTR_TO_BUFFER and PTR_TO_BUFFER_END, similar to the types > > PTR_TO_PACKET and PTR_TO_PACKET_END. In addition, a field 'buf_id' is > > added to the reg_state structure as a way to distinguish between different > > buffers in a single context. > > > > Buffers are specified in the context by a pair of members: > > - a pointer to the start of the buffer (type PTR_TO_BUFFER) > > - a pointer to the first byte beyond the buffer (type PTR_TO_BUFFER_END) > > > > A context can contain multiple buffers. Each buffer/buffer_end pair is > > identified by a unique id (buf_id). The start-of-buffer member offset is > > usually a good unique identifier. > > > > The semantics for using a writable buffer are the same as for packet data. > > The BPF program must contain a range test (buf + num > buf_end) to ensure > > that the verifier can verify that offsets are within the allowed range. > > > > Whenever a helper is called that might update the content of the context > > all range information for registers that hold pointers to a buffer is > > cleared, just as it is done for packet pointers. 
> > > > Signed-off-by: Kris Van Hees > > Reviewed-by: Nick Alcock > > --- > > include/linux/bpf.h | 3 + > > include/linux/bpf_verifier.h | 4 +- > > kernel/bpf/verifier.c| 198 --- > > 3 files changed, 145 insertions(+), 60 deletions(-) > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > > index e4bcb79656c4..fc3eda0192fb 100644 > > --- a/include/linux/bpf.h > > +++ b/include/linux/bpf.h > > @@ -275,6 +275,8 @@ enum bpf_reg_type { > > PTR_TO_TCP_SOCK, /* reg points to struct tcp_sock */ > > PTR_TO_TCP_SOCK_OR_NULL, /* reg points to struct tcp_sock or NULL */ > > PTR_TO_TP_BUFFER,/* reg points to a writable raw tp's buffer */ > > + PTR_TO_BUFFER, /* reg points to ctx buffer */ > > + PTR_TO_BUFFER_END, /* reg points to ctx buffer end */ > > }; > > > > /* The information passed from prog-specific *_is_valid_access > > @@ -283,6 +285,7 @@ enum bpf_reg_type { > > struct bpf_insn_access_aux { > > enum bpf_reg_type reg_type; > > int ctx_field_size; > > + u32 buf_id; > > }; > > > > static inline void > > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h > > index 1305ccbd8fe6..3538382184f3 100644 > > --- a/include/linux/bpf_verifier.h > > +++ b/include/linux/bpf_verifier.h > > @@ -45,7 +45,7 @@ struct bpf_reg_state { > > /* Ordering of fields matters. See states_equal() */ > > enum bpf_reg_type type; > > union { > > - /* valid when type == PTR_TO_PACKET */ > > + /* valid when type == PTR_TO_PACKET | PTR_TO_BUFFER */ > > u16 range; > > > > /* valid when type == CONST_PTR_TO_MAP | PTR_TO_MAP_VALUE | > > @@ -132,6 +132,8 @@ struct bpf_reg_state { > > */ > > u32 frameno; > > enum bpf_reg_liveness live; > > + /* For PTR_TO_BUFFER, to identify distinct buffers in a context. 
*/ > > + u32 buf_id; > > }; > > > > enum bpf_stack_slot_type { > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > > index f9e5536fd1af..5fba4e6f5424 100644 > > --- a/kernel/bpf/verifier.c > > +++ b/kernel/bpf/verifier.c > > @@ -406,6 +406,8 @@ static const char * const reg_type_str[] = { > > [PTR_TO_TCP_SOCK] = "tcp_sock", > > [PTR_TO_TCP_SOCK_OR_NULL] = "tcp_sock_or_null", > > [PTR_TO_TP_BUFFER] = "tp_buffer", > > + [PTR_TO_BUFFER] = "buf", > > + [PTR_TO_BUFFER_END] = "buf_end", > > }; > > > > static char slot_type_char[] = { > > @@ -467,6 +469,9 @@ static void print_verifier_state(struct > > bpf_verifier_env *env, > > verbose(env, ",off=%d", reg->off); > > if (type_is_pkt_pointer(t)) > > verbose(env, ",r=%d", reg->range); > > + else if (t == PTR_TO_BUFFER) > > + verbose(env, ",r=%d,bid=%d", reg->range, > > + reg->buf_id);
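The range-test contract in the description above can be mimicked in ordinary C. The struct and helper below are invented for illustration (a real BPF program is checked by the verifier, not called like this); the point is the `(buf + num > buf_end)` guard that must dominate every access:

```c
#include <stdint.h>
#include <stddef.h>

/* A context exposing one writable buffer as a (start, end) pointer pair,
 * mirroring the PTR_TO_BUFFER / PTR_TO_BUFFER_END pairing in the patch. */
struct ctx {
	uint8_t *buf;      /* PTR_TO_BUFFER     */
	uint8_t *buf_end;  /* PTR_TO_BUFFER_END */
};

/* Write `len` bytes only after the range test the verifier insists on.
 * Returns the number of bytes written, 0 if the access is out of range. */
static size_t write_to_buf(struct ctx *c, const uint8_t *src, size_t len)
{
	if (c->buf + len > c->buf_end)  /* the (buf + num > buf_end) check */
		return 0;
	for (size_t i = 0; i < len; i++)
		c->buf[i] = src[i];
	return len;
}
```

Without the guard, the verifier has no range information for the register and rejects the program; with it, accesses up to `len` bytes are provably in bounds.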
Re: [PATCH v3 11/14] dmaengine: imx-sdma: fix ecspi1 rx dma not work on i.mx8mm
On 07-05-19, 09:16, Robin Gong wrote: > Because the number of ecspi1 rx event on i.mx8mm is 0, the condition > check ignore such special case without dma channel enabled, which caused > ecspi1 rx works failed. Actually, no need to check event_id0, checking > event_id1 is enough for DEV_2_DEV case because it's so lucky that event_id1 > never be 0. Well is that by chance or design that event_id1 will be never 0? > > Signed-off-by: Robin Gong > --- > drivers/dma/imx-sdma.c | 12 +--- > 1 file changed, 5 insertions(+), 7 deletions(-) > > diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c > index a495c7f..86594fc 100644 > --- a/drivers/dma/imx-sdma.c > +++ b/drivers/dma/imx-sdma.c > @@ -1370,8 +1370,8 @@ static void sdma_free_chan_resources(struct dma_chan > *chan) > > sdma_channel_synchronize(chan); > > - if (sdmac->event_id0) > - sdma_event_disable(sdmac, sdmac->event_id0); > + sdma_event_disable(sdmac, sdmac->event_id0); > + > if (sdmac->event_id1) > sdma_event_disable(sdmac, sdmac->event_id1); > > @@ -1670,11 +1670,9 @@ static int sdma_config(struct dma_chan *chan, > memcpy(>slave_config, dmaengine_cfg, sizeof(*dmaengine_cfg)); > > /* Set ENBLn earlier to make sure dma request triggered after that */ > - if (sdmac->event_id0) { > - if (sdmac->event_id0 >= sdmac->sdma->drvdata->num_events) > - return -EINVAL; > - sdma_event_enable(sdmac, sdmac->event_id0); > - } > + if (sdmac->event_id0 >= sdmac->sdma->drvdata->num_events) > + return -EINVAL; > + sdma_event_enable(sdmac, sdmac->event_id0); > > if (sdmac->event_id1) { > if (sdmac->event_id1 >= sdmac->sdma->drvdata->num_events) > -- > 2.7.4 > -- ~Vinod
Re: [PATCH v3 04/14] dmaengine: imx-sdma: remove dupilicated sdma_load_context
On 07-05-19, 09:16, Robin Gong wrote: > Since sdma_transfer_init() will do sdma_load_context before any > sdma transfer, no need once more in sdma_config_channel(). Acked-by: Vinod Koul -- ~Vinod
Re: [PATCH v3 09/14] dmaengine: imx-sdma: remove ERR009165 on i.mx6ul
On 07-05-19, 09:16, Robin Gong wrote: > ECSPI issue fixed from i.mx6ul at hardware level, no need > ERR009165 anymore on those chips such as i.mx8mq. Add i.mx6sx > from where i.mx6ul source. Acked-by: Vinod Koul -- ~Vinod
Re: [PATCH v3 05/14] dmaengine: imx-sdma: add mcu_2_ecspi script
On 07-05-19, 09:16, Robin Gong wrote: > Add mcu_2_ecspi script to fix ecspi errata ERR009165. Acked-by: Vinod Koul -- ~Vinod
Re: [PATCH v2] mm, memory-failure: clarify error message
> Some users who install a SIGBUS handler that does longjmp out, > therefore keeping the process alive, are confused by the error > message > "[188988.765862] Memory failure: 0x1840200: Killing >cellsrv:33395 due to hardware memory corruption" > Slightly modify the error message to improve clarity. > > Signed-off-by: Jane Chu > --- > mm/memory-failure.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c > index fc8b517..c4f4bcd 100644 > --- a/mm/memory-failure.c > +++ b/mm/memory-failure.c > @@ -216,7 +216,7 @@ static int kill_proc(struct to_kill *tk, unsigned long > pfn, int flags) > short addr_lsb = tk->size_shift; > int ret; > > -pr_err("Memory failure: %#lx: Killing %s:%d due to hardware memory > corruption\n", > +pr_err("Memory failure: %#lx: Sending SIGBUS to %s:%d due to hardware > memory corruption\n", > pfn, t->comm, t->pid); > > if ((flags & MF_ACTION_REQUIRED) && t->mm == current->mm) { > -- > 1.8.3.1 This error message is helpful. Acked-by: Pankaj Gupta
Re: [PATCH v4 2/3] fdt: add support for rng-seed
On Mon, May 20, 2019 at 7:54 AM Nicolas Boichat wrote: > Alphabetical order. The original headers are not sorted; should I sort them here? > > > I'm a little bit concerned about this, as we really want the rng-seed > value to be wiped, and not kept in memory (even if it's hard to > access). > > IIUC, fdt_delprop splices the device tree, so it'll overwrite the > "rng-seed" property with whatever device tree entries follow it. > However, if rng-seed is the last property (or if the entries that > follow are smaller than rng-seed), the seed will stay in memory (or > part of it). > > fdt_nop_property in v2 would erase it for sure. I don't know if there > is a way to make sure that rng-seed is removed for good while still > deleting the property (maybe modify fdt_splice_ to do a memset(.., 0) > of the moved chunk?). > So maybe we should switch back to fdt_nop_property()?
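The "memset the moved chunk" idea can be shown with a userspace sketch. `remove_secret()` below is not libfdt code; it stands in for an `fdt_splice_()` variant that wipes both the secret bytes and the vacated tail, so no copy of the seed survives the splice:

```c
#include <string.h>
#include <stdint.h>
#include <stddef.h>

/* Wipe-then-splice: zero the secret bytes in place before removing them,
 * then clear the vacated tail, so neither the original seed nor a shifted
 * partial copy of it remains anywhere in the blob. */
static size_t remove_secret(uint8_t *blob, size_t blob_len,
			    size_t off, size_t secret_len)
{
	memset(blob + off, 0, secret_len);              /* wipe first */
	memmove(blob + off, blob + off + secret_len,    /* splice out */
		blob_len - off - secret_len);
	memset(blob + blob_len - secret_len, 0,         /* clear the tail */
	       secret_len);
	return blob_len - secret_len;
}
```

The first `memset` matters for exactly the case raised above: when the data following the property is shorter than the secret, a plain splice would leave part of the seed behind.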
[PATCH] arm64/mm: Move PTE_VALID from SW defined to HW page table entry definitions
PTE_VALID signifies that the last level page table entry is valid and it is MMU recognized while walking the page table. This is not a software defined PTE bit and should not be listed like one. Just move it to appropriate header file. Signed-off-by: Anshuman Khandual Cc: Catalin Marinas Cc: Will Deacon Cc: Steve Capper Cc: Suzuki Poulose Cc: James Morse --- arch/arm64/include/asm/pgtable-hwdef.h | 1 + arch/arm64/include/asm/pgtable-prot.h | 1 - 2 files changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h index a69259c..974f011 100644 --- a/arch/arm64/include/asm/pgtable-hwdef.h +++ b/arch/arm64/include/asm/pgtable-hwdef.h @@ -153,6 +153,7 @@ /* * Level 3 descriptor (PTE). */ +#define PTE_VALID (_AT(pteval_t, 1) << 0) #define PTE_TYPE_MASK (_AT(pteval_t, 3) << 0) #define PTE_TYPE_FAULT (_AT(pteval_t, 0) << 0) #define PTE_TYPE_PAGE (_AT(pteval_t, 3) << 0) diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h index 986e41c..38c7148 100644 --- a/arch/arm64/include/asm/pgtable-prot.h +++ b/arch/arm64/include/asm/pgtable-prot.h @@ -24,7 +24,6 @@ /* * Software defined PTE bits definition. */ -#define PTE_VALID (_AT(pteval_t, 1) << 0) #define PTE_WRITE (PTE_DBM)/* same as DBM (51) */ #define PTE_DIRTY (_AT(pteval_t, 1) << 55) #define PTE_SPECIAL(_AT(pteval_t, 1) << 56) -- 2.7.4
Re: [PATCH v7 01/12] x86/crypto: Adapt assembly for PIE support
On Mon, May 20, 2019 at 04:19:26PM -0700, Thomas Garnier wrote: > diff --git a/arch/x86/crypto/sha256-avx2-asm.S > b/arch/x86/crypto/sha256-avx2-asm.S > index 1420db15dcdd..2ced4b2f6c76 100644 > --- a/arch/x86/crypto/sha256-avx2-asm.S > +++ b/arch/x86/crypto/sha256-avx2-asm.S > @@ -588,37 +588,42 @@ last_block_enter: > mov INP, _INP(%rsp) > > ## schedule 48 input dwords, by doing 3 rounds of 12 each > - xor SRND, SRND > + leaqK256(%rip), SRND > + ## loop1 upper bound > + leaqK256+3*4*32(%rip), INP > > .align 16 > loop1: > - vpaddd K256+0*32(SRND), X0, XFER > + vpaddd 0*32(SRND), X0, XFER > vmovdqa XFER, 0*32+_XFER(%rsp, SRND) > FOUR_ROUNDS_AND_SCHED _XFER + 0*32 > > - vpaddd K256+1*32(SRND), X0, XFER > + vpaddd 1*32(SRND), X0, XFER > vmovdqa XFER, 1*32+_XFER(%rsp, SRND) > FOUR_ROUNDS_AND_SCHED _XFER + 1*32 > > - vpaddd K256+2*32(SRND), X0, XFER > + vpaddd 2*32(SRND), X0, XFER > vmovdqa XFER, 2*32+_XFER(%rsp, SRND) > FOUR_ROUNDS_AND_SCHED _XFER + 2*32 > > - vpaddd K256+3*32(SRND), X0, XFER > + vpaddd 3*32(SRND), X0, XFER > vmovdqa XFER, 3*32+_XFER(%rsp, SRND) > FOUR_ROUNDS_AND_SCHED _XFER + 3*32 > > add $4*32, SRND > - cmp $3*4*32, SRND > + cmp INP, SRND > jb loop1 > > + ## loop2 upper bound > + leaqK256+4*4*32(%rip), INP > + > loop2: > ## Do last 16 rounds with no scheduling > - vpaddd K256+0*32(SRND), X0, XFER > + vpaddd 0*32(SRND), X0, XFER > vmovdqa XFER, 0*32+_XFER(%rsp, SRND) > DO_4ROUNDS _XFER + 0*32 > > - vpaddd K256+1*32(SRND), X1, XFER > + vpaddd 1*32(SRND), X1, XFER > vmovdqa XFER, 1*32+_XFER(%rsp, SRND) > DO_4ROUNDS _XFER + 1*32 > add $2*32, SRND > @@ -626,7 +631,7 @@ loop2: > vmovdqa X2, X0 > vmovdqa X3, X1 > > - cmp $4*4*32, SRND > + cmp INP, SRND > jb loop2 > > mov _CTX(%rsp), CTX There is a crash in sha256-avx2-asm.S with this patch applied. Looks like the %rsi register is being used for two different things at the same time: 'INP' and 'y3'? 
You should be able to reproduce by booting a kernel configured with: CONFIG_CRYPTO_SHA256_SSSE3=y # CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set Crash report: BUG: unable to handle page fault for address: c8ff83b21a80 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#1] SMP CPU: 3 PID: 359 Comm: cryptomgr_test Not tainted 5.2.0-rc1-00109-g9fb4fd100429b #5 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-20181126_142135-anatol 04/01/2014 RIP: 0010:loop1+0x4/0x888 Code: 83 c6 40 48 89 b4 24 08 02 00 00 48 8d 3d 94 d3 d0 00 48 8d 35 0d d5 d0 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 c RSP: 0018:c90001d43880 EFLAGS: 00010286 RAX: 6a09e667 RBX: bb67ae85 RCX: 3c6ef372 RDX: 510e527f RSI: 81dde380 RDI: 81dde200 RBP: c90001d43b10 R08: a54ff53a R09: 9b05688c R10: 1f83d9ab R11: 5be0cd19 R12: R13: 88807cfd4598 R14: 810d0da0 R15: c90001d43cc0 FS: () GS:88807fd8() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: c8ff83b21a80 CR3: 0200f000 CR4: 003406e0 Call Trace: sha256_avx2_finup arch/x86/crypto/sha256_ssse3_glue.c:242 [inline] sha256_avx2_final+0x17/0x20 arch/x86/crypto/sha256_ssse3_glue.c:247 crypto_shash_final+0x13/0x20 crypto/shash.c:166 shash_async_final+0x11/0x20 crypto/shash.c:265 crypto_ahash_op+0x24/0x60 crypto/ahash.c:373 crypto_ahash_final+0x11/0x20 crypto/ahash.c:384 do_ahash_op.constprop.13+0x10/0x40 crypto/testmgr.c:1049 test_hash_vec_cfg+0x5b1/0x610 crypto/testmgr.c:1225 test_hash_vec crypto/testmgr.c:1268 [inline] __alg_test_hash.isra.8+0x115/0x1d0 crypto/testmgr.c:1498 alg_test_hash+0x7b/0x100 crypto/testmgr.c:1546 alg_test.part.12+0xa4/0x360 crypto/testmgr.c:4931 alg_test+0x12/0x30 crypto/testmgr.c:4895 cryptomgr_test+0x26/0x50 crypto/algboss.c:223 kthread+0x124/0x140 kernel/kthread.c:254 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352 Modules linked in: CR2: c8ff83b21a80 ---[ end trace ee8ece604888de3e ]--- - Eric
[PATCH] MIPS: remove a space after -I to cope with header search paths for VDSO
Commit 9cc342f6c4a0 ("treewide: prefix header search paths with $(srctree)/") caused a build error for MIPS VDSO. CC arch/mips/vdso/gettimeofday.o In file included from ../arch/mips/vdso/vdso.h:26, from ../arch/mips/vdso/gettimeofday.c:11: ../arch/mips/include/asm/page.h:12:10: fatal error: spaces.h: No such file or directory #include ^~ The cause of the error is a missing space after the compiler flag -I . Kbuild used to have a global restriction "no space after -I", but commit 48f6e3cf5bc6 ("kbuild: do not drop -I without parameter") got rid of it. Having a space after -I is no longer a big deal as far as Kbuild is concerned. It is still a big deal for MIPS because arch/mips/vdso/Makefile filters the header search paths, like this: ccflags-vdso := \ $(filter -I%,$(KBUILD_CFLAGS)) \ ..., which relies on the assumption that there is no space after -I . Fixes: 9cc342f6c4a0 ("treewide: prefix header search paths with $(srctree)/") Reported-by: kbuild test robot Signed-off-by: Masahiro Yamada --- arch/mips/pnx833x/Platform | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/mips/pnx833x/Platform b/arch/mips/pnx833x/Platform index 6b1a847d593f..287260669551 100644 --- a/arch/mips/pnx833x/Platform +++ b/arch/mips/pnx833x/Platform @@ -1,5 +1,5 @@ # NXP STB225 platform-$(CONFIG_SOC_PNX833X) += pnx833x/ -cflags-$(CONFIG_SOC_PNX833X) += -I $(srctree)/arch/mips/include/asm/mach-pnx833x +cflags-$(CONFIG_SOC_PNX833X) += -I$(srctree)/arch/mips/include/asm/mach-pnx833x load-$(CONFIG_NXP_STB220) += 0x80001000 load-$(CONFIG_NXP_STB225) += 0x80001000 -- 2.17.1
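The `$(filter -I%, ...)` behaviour described above can be mimicked word-by-word in C. This toy analogue of make's word filter (illustrative only) shows the failure mode: with `-I $(srctree)/...`, the path becomes its own word, does not begin with `-I`, and is silently dropped from `ccflags-vdso`:

```c
#include <string.h>

/* Keep only the words that themselves begin with "-I", the way GNU make's
 * $(filter -I%,$(KBUILD_CFLAGS)) keeps only words matching the pattern.
 * Returns how many words survive; survivors are copied into kept[]. */
static int filter_include_words(const char *words[], int n,
				const char *kept[])
{
	int out = 0;

	for (int i = 0; i < n; i++)
		if (strncmp(words[i], "-I", 2) == 0)
			kept[out++] = words[i];
	return out;
}
```

With `-Ipath` as one word the search path survives the filter; with `-I path` only a useless bare `-I` can survive and the path itself is lost, which is why the VDSO build stopped finding `spaces.h`.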
Re: [PATCH v2] vt: Fix a missing-check bug in drivers/tty/vt/vt.c
On Mon, May 20, 2019 at 11:26:20PM -0400, Nicolas Pitre wrote: > On Tue, 21 May 2019, Gen Zhang wrote: > > > On Mon, May 20, 2019 at 10:55:40PM -0400, Nicolas Pitre wrote: > > > On Tue, 21 May 2019, Gen Zhang wrote: > > > > > > > In function con_init(), the pointer variables vc_cons[currcons].d, vc and > > > > vc->vc_screenbuf are allocated memory via kzalloc(). And they are > > > > used in the following code. > > > > However, when there is a memory allocation error, kzalloc() can fail. > > > > Thus a null pointer (vc_cons[currcons].d, vc and vc->vc_screenbuf) > > > > dereference may happen. And it will cause the kernel to crash. > > > > Therefore, > > > > we should check the return value and handle the error. > > > > Further, the loop condition MIN_NR_CONSOLES is defined as 1 in > > > > include/uapi/linux/vt.h. So there is no need to unwind the loop. > > > > > > But what if someone changes that define? It won't be obvious that some > > > code did rely on it to be defined to 1. > > I re-examined the source code. MIN_NR_CONSOLES is only defined once and > > there are no other changes to it. > > Yes, that is true today. But if someone changes that in the future, how > > will that person know that you relied on it to be 1 for not needing to > > unwind the loop? > > > Nicolas Hi Nicolas, Thanks for your explanation! I got your point. Is this way proper? err_vc_screenbuf: kfree(vc); for (currcons = 0; currcons < MIN_NR_CONSOLES; currcons++) vc_cons[currcons].d = NULL; return -ENOMEM; err_vc: console_unlock(); return -ENOMEM; Thanks Gen
Re: [PATCH v2 3/7] drivers/soc: xdma: Add user interface
On Tue, 21 May 2019, at 05:49, Eddie James wrote: > This commit adds a miscdevice to provide a user interface to the XDMA > engine. The interface provides the write operation to start DMA > operations. The DMA parameters are passed as the data to the write call. > The actual data to transfer is NOT passed through write. Note that both > directions of DMA operation are accomplished through the write command; > BMC to host and host to BMC. > > The XDMA engine is restricted to only accessing the reserved memory > space on the AST2500, typically used by the VGA. For this reason, this > commit also adds a simple memory manager for this reserved memory space > which can then be allocated in pages by users calling mmap. The space > allocated by a client will be the space used in the DMA operation. For > an "upstream" (BMC to host) operation, the data in the client's area > will be transferred to the host. For a "downstream" (host to BMC) > operation, the host data will be placed in the client's memory area. Did you explore genalloc as a solution for allocating out of the VGA reserved memory? Wondering if we can avoid implementing a custom allocator (even if it is simple). Andrew
> > Signed-off-by: Eddie James > --- > drivers/soc/aspeed/aspeed-xdma.c | 301 > +++ > 1 file changed, 301 insertions(+) > > diff --git a/drivers/soc/aspeed/aspeed-xdma.c > b/drivers/soc/aspeed/aspeed-xdma.c > index 0992d2a..2162ca0 100644 > --- a/drivers/soc/aspeed/aspeed-xdma.c > +++ b/drivers/soc/aspeed/aspeed-xdma.c > @@ -118,6 +118,12 @@ struct aspeed_xdma_cmd { > u32 resv1; > }; > > +struct aspeed_xdma_vga_blk { > + u32 phys; > + u32 size; > + struct list_head list; > +}; > + > struct aspeed_xdma_client; > > struct aspeed_xdma { > @@ -128,6 +134,8 @@ struct aspeed_xdma { > > unsigned long flags; > unsigned int cmd_idx; > + struct mutex list_lock; > + struct mutex start_lock; > wait_queue_head_t wait; > struct aspeed_xdma_client *current_client; > > @@ -136,6 +144,9 @@ struct aspeed_xdma { > dma_addr_t vga_dma; > void *cmdq; > void *vga_virt; > + struct list_head vga_blks_free; > + > + struct miscdevice misc; > }; > > struct aspeed_xdma_client { > @@ -325,6 +336,260 @@ static irqreturn_t aspeed_xdma_irq(int irq, void *arg) > return IRQ_HANDLED; > } > > +static u32 aspeed_xdma_alloc_vga_blk(struct aspeed_xdma *ctx, u32 req_size) > +{ > + u32 phys = 0; > + u32 size = PAGE_ALIGN(req_size); > + struct aspeed_xdma_vga_blk *free; > + > + mutex_lock(>list_lock); > + > + list_for_each_entry(free, >vga_blks_free, list) { > + if (free->size >= size) { > + phys = free->phys; > + > + if (size == free->size) { > + dev_dbg(ctx->dev, > + "Allocd %08x[%08x r(%08x)], del.\n", > + phys, size, req_size); > + list_del(>list); > + kfree(free); > + } else { > + free->phys += size; > + free->size -= size; > + dev_dbg(ctx->dev, "Allocd %08x[%08x r(%08x)], " > + "shrunk %08x[%08x].\n", phys, size, > + req_size, free->phys, free->size); > + } > + > + break; > + } > + } > + > + mutex_unlock(>list_lock); > + > + return phys; > +} > + > +static void aspeed_xdma_free_vga_blk(struct aspeed_xdma *ctx, u32 phys, > + u32 req_size) > +{ > + u32 min_free = UINT_MAX; > + u32 size = 
PAGE_ALIGN(req_size); > + const u32 end = phys + size; > + struct aspeed_xdma_vga_blk *free; > + > + mutex_lock(>list_lock); > + > + list_for_each_entry(free, >vga_blks_free, list) { > + if (end == free->phys) { > + u32 fend = free->phys + free->size; > + > + dev_dbg(ctx->dev, > + "Freed %08x[%08x r(%08x)], exp %08x[%08x].\n", > + phys, size, req_size, free->phys, free->size); > + > + free->phys = phys; > + free->size = fend - free->phys; > + > + mutex_unlock(>list_lock); > + return; > + } > + > + if (free->phys < min_free) > + min_free = free->phys; > + } > + > + free = kzalloc(sizeof(*free), GFP_KERNEL); > + if (free) { > + free->phys = phys; > + free->size = size; > + > + dev_dbg(ctx->dev, "Freed %08x[%08x r(%08x)], new.\n", phys, > +
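On the genalloc question: in-kernel, `gen_pool_create()` + `gen_pool_add()` and `gen_pool_alloc()`/`gen_pool_free()` would cover this page-granular carving of the reserved VGA region. As a rough illustration of what such a pool does, here is a toy userspace analogue (not the genalloc API): 32 pages tracked in one bitmap word, first-fit over contiguous runs:

```c
#include <stdint.h>

#define POOL_PAGE_SHIFT 12
#define POOL_PAGES      32

struct page_pool {
	uint32_t base;	/* phys base of the reserved (VGA) region */
	uint32_t used;	/* bit i set => page i allocated          */
};

/* First-fit allocation of a contiguous page run; 0 on failure
 * (this sketch assumes the pool base itself is never 0). */
static uint32_t pool_alloc(struct page_pool *p, uint32_t size)
{
	uint32_t npages = (size + (1u << POOL_PAGE_SHIFT) - 1) >> POOL_PAGE_SHIFT;
	uint32_t run = 0;

	for (uint32_t i = 0; i < POOL_PAGES; i++) {
		run = (p->used & (1u << i)) ? 0 : run + 1;
		if (run == npages) {
			uint32_t first = i + 1 - npages;

			for (uint32_t j = first; j <= i; j++)
				p->used |= 1u << j;
			return p->base + (first << POOL_PAGE_SHIFT);
		}
	}
	return 0;
}

static void pool_free(struct page_pool *p, uint32_t phys, uint32_t size)
{
	uint32_t first = (phys - p->base) >> POOL_PAGE_SHIFT;
	uint32_t npages = (size + (1u << POOL_PAGE_SHIFT) - 1) >> POOL_PAGE_SHIFT;

	for (uint32_t j = first; j < first + npages; j++)
		p->used &= ~(1u << j);
}
```

A bitmap makes free coalescing implicit (clearing bits merges adjacent runs), which is the bookkeeping the hand-rolled free-list above has to do explicitly; genalloc provides the same property at scale.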
Re: [PATCH v5 1/3] dt-bindings: i2c: extend existing opencore bindings.
Hi Rob, On Mon, May 20, 2019 at 8:07 PM Rob Herring wrote: > > On Mon, May 20, 2019 at 9:12 AM Sagar Shrikant Kadam > wrote: > > > > Add FU540-C000 specific device tree bindings to the already > > available i2c-ocores file. This device is available on the > > HiFive Unleashed Rev A00 board. Move interrupt and interrupt > > parents under the optional property list as these can be optional. > > > > The FU540-C000 SoC from SiFive has an OpenCores I2C block > > reimplementation. > > > > The DT compatibility string for this IP is present in HDL and available at: > > https://github.com/sifive/sifive-blocks/blob/master/src/main/scala/devices/i2c/I2C.scala#L73 > > > > Signed-off-by: Sagar Shrikant Kadam > > --- > > Documentation/devicetree/bindings/i2c/i2c-ocores.txt | 7 ++- > > 1 file changed, 6 insertions(+), 1 deletion(-) > > > > diff --git a/Documentation/devicetree/bindings/i2c/i2c-ocores.txt > > b/Documentation/devicetree/bindings/i2c/i2c-ocores.txt > > index 17bef9a..b73960e 100644 > > --- a/Documentation/devicetree/bindings/i2c/i2c-ocores.txt > > +++ b/Documentation/devicetree/bindings/i2c/i2c-ocores.txt > > @@ -2,8 +2,11 @@ Device tree configuration for i2c-ocores > > > > Required properties: > > - compatible : "opencores,i2c-ocores" or "aeroflexgaisler,i2cmst" > > +"sifive,fu540-c000-i2c" or "sifive,i2c0". It's not an OR because both are required. Please reformat to 1 valid > combination per line. Yes, will rectify it in V6. > > + for Opencore based I2C IP block reimplemented in > > + FU540-C000 SoC.Please refer > > sifive-blocks-ip-versioning.txt > > + for additional details. > > - reg : bus address start and address range size of device > > -- interrupts : interrupt number > > - clocks : handle to the controller clock; see the note below. 
> > Mutually exclusive with opencores,ip-clock-frequency > > - opencores,ip-clock-frequency: frequency of the controller clock in Hz; > > @@ -12,6 +15,8 @@ Required properties: > > - #size-cells : should be <0> > > > > Optional properties: > > +- interrupt-parent: handle to interrupt controller. > > Drop this. interrupt-parent is implied. > Sure, will exclude it in v6. > > +- interrupts : interrupt number. > > - clock-frequency : frequency of bus clock in Hz; see the note below. > > Defaults to 100 KHz when the property is not specified > > - reg-shift : device register offsets are shifted by this value > > -- > > 1.9.1 Thanks, Sagar
[PATCH] clk: mediatek: Remove MT8183 unused clock
Remove MT8183 sspm clock Signed-off-by: Erin Lo --- This clock should only be set in secure world. --- drivers/clk/mediatek/clk-mt8183.c | 19 --- 1 file changed, 19 deletions(-) diff --git a/drivers/clk/mediatek/clk-mt8183.c b/drivers/clk/mediatek/clk-mt8183.c index 9d8651033ae9..1aa5f4059251 100644 --- a/drivers/clk/mediatek/clk-mt8183.c +++ b/drivers/clk/mediatek/clk-mt8183.c @@ -395,14 +395,6 @@ static const char * const atb_parents[] = { "syspll_d5" }; -static const char * const sspm_parents[] = { - "clk26m", - "univpll_d2_d4", - "syspll_d2_d2", - "univpll_d2_d2", - "syspll_d3" -}; - static const char * const dpi0_parents[] = { "clk26m", "tvdpll_d2", @@ -606,9 +598,6 @@ static const struct mtk_mux top_muxes[] = { MUX_GATE_CLR_SET_UPD(CLK_TOP_MUX_ATB, "atb_sel", atb_parents, 0xa0, 0xa4, 0xa8, 0, 2, 7, 0x004, 24), - MUX_GATE_CLR_SET_UPD(CLK_TOP_MUX_SSPM, "sspm_sel", - sspm_parents, 0xa0, - 0xa4, 0xa8, 8, 3, 15, 0x004, 25), MUX_GATE_CLR_SET_UPD(CLK_TOP_MUX_DPI0, "dpi0_sel", dpi0_parents, 0xa0, 0xa4, 0xa8, 16, 4, 23, 0x004, 26), @@ -947,12 +936,8 @@ static const struct mtk_gate infra_clks[] = { "fufs_sel", 13), GATE_INFRA2(CLK_INFRA_MD32_BCLK, "infra_md32_bclk", "axi_sel", 14), - GATE_INFRA2(CLK_INFRA_SSPM, "infra_sspm", - "sspm_sel", 15), GATE_INFRA2(CLK_INFRA_UNIPRO_MBIST, "infra_unipro_mbist", "axi_sel", 16), - GATE_INFRA2(CLK_INFRA_SSPM_BUS_HCLK, "infra_sspm_bus_hclk", - "axi_sel", 17), GATE_INFRA2(CLK_INFRA_I2C5, "infra_i2c5", "i2c_sel", 18), GATE_INFRA2(CLK_INFRA_I2C5_ARBITER, "infra_i2c5_arbiter", @@ -986,10 +971,6 @@ static const struct mtk_gate infra_clks[] = { "msdc50_0_sel", 1), GATE_INFRA3(CLK_INFRA_MSDC2_SELF, "infra_msdc2_self", "msdc50_0_sel", 2), - GATE_INFRA3(CLK_INFRA_SSPM_26M_SELF, "infra_sspm_26m_self", - "f_f26m_ck", 3), - GATE_INFRA3(CLK_INFRA_SSPM_32K_SELF, "infra_sspm_32k_self", - "f_f26m_ck", 4), GATE_INFRA3(CLK_INFRA_UFS_AXI, "infra_ufs_axi", "axi_sel", 5), GATE_INFRA3(CLK_INFRA_I2C6, "infra_i2c6", -- 2.18.0
Re: [PATCH] kvm: x86: refine kvm_get_arch_capabilities()
Ping. On 4/19/2019 10:16 AM, Xiaoyao Li wrote: 1. Using X86_FEATURE_ARCH_CAPABILITIES to enumerate the existence of MSR_IA32_ARCH_CAPABILITIES to avoid using rdmsrl_safe(). 2. Since kvm_get_arch_capabilities() is only used in this file, making it static. Signed-off-by: Xiaoyao Li --- arch/x86/include/asm/kvm_host.h | 1 - arch/x86/kvm/x86.c | 8 2 files changed, 4 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index a9d03af34030..d4ae67870764 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1526,7 +1526,6 @@ int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ipi_bitmap_low, unsigned long ipi_bitmap_high, u32 min, unsigned long icr, int op_64_bit); -u64 kvm_get_arch_capabilities(void); void kvm_define_shared_msr(unsigned index, u32 msr); int kvm_set_shared_msr(unsigned index, u64 val, u64 mask); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index a0d1fc80ac5a..ba8e269a8cd2 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1205,11 +1205,12 @@ static u32 msr_based_features[] = { static unsigned int num_msr_based_features; -u64 kvm_get_arch_capabilities(void) +static u64 kvm_get_arch_capabilities(void) { - u64 data; + u64 data = 0; - rdmsrl_safe(MSR_IA32_ARCH_CAPABILITIES, ); + if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES)) + rdmsrl(MSR_IA32_ARCH_CAPABILITIES, data); /* * If we're doing cache flushes (either "always" or "cond") @@ -1225,7 +1226,6 @@ u64 kvm_get_arch_capabilities(void) return data; } -EXPORT_SYMBOL_GPL(kvm_get_arch_capabilities); static int kvm_get_msr_feature(struct kvm_msr_entry *msr) {
[PATCH] arm64/hugetlb: Use macros for contiguous huge page sizes
Replace all open encoded contiguous huge page size computations with available macro encodings CONT_PTE_SIZE and CONT_PMD_SIZE. There are other instances where these macros are used in the file and this change makes it consistently use the same mnemonic. Signed-off-by: Anshuman Khandual Cc: Catalin Marinas Cc: Will Deacon Cc: Steve Capper Cc: Mark Rutland --- arch/arm64/mm/hugetlbpage.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c index 6b4a47b..05b5dda 100644 --- a/arch/arm64/mm/hugetlbpage.c +++ b/arch/arm64/mm/hugetlbpage.c @@ -236,7 +236,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, if (sz == PUD_SIZE) { ptep = (pte_t *)pudp; - } else if (sz == (PAGE_SIZE * CONT_PTES)) { + } else if (sz == (CONT_PTE_SIZE)) { pmdp = pmd_alloc(mm, pudp, addr); WARN_ON(addr & (sz - 1)); @@ -254,7 +254,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, ptep = huge_pmd_share(mm, addr, pudp); else ptep = (pte_t *)pmd_alloc(mm, pudp, addr); - } else if (sz == (PMD_SIZE * CONT_PMDS)) { + } else if (sz == (CONT_PMD_SIZE)) { pmdp = pmd_alloc(mm, pudp, addr); WARN_ON(addr & (sz - 1)); return (pte_t *)pmdp; @@ -462,9 +462,9 @@ static int __init hugetlbpage_init(void) #ifdef CONFIG_ARM64_4K_PAGES add_huge_page_size(PUD_SIZE); #endif - add_huge_page_size(PMD_SIZE * CONT_PMDS); + add_huge_page_size(CONT_PMD_SIZE); add_huge_page_size(PMD_SIZE); - add_huge_page_size(PAGE_SIZE * CONT_PTES); + add_huge_page_size(CONT_PTE_SIZE); return 0; } @@ -478,9 +478,9 @@ static __init int setup_hugepagesz(char *opt) #ifdef CONFIG_ARM64_4K_PAGES case PUD_SIZE: #endif - case PMD_SIZE * CONT_PMDS: + case CONT_PMD_SIZE: case PMD_SIZE: - case PAGE_SIZE * CONT_PTES: + case CONT_PTE_SIZE: add_huge_page_size(ps); return 1; } -- 2.7.4
Re: [PATCH v7 11/12] soc: mediatek: cmdq: add cmdq_dev_get_client_reg function
On Tue, 2019-05-21 at 09:11 +0800, Bibby Hsieh wrote: > GCE cannot know the register base address, this function > can help cmdq client to get the cmdq_client_reg structure. > > Signed-off-by: Bibby Hsieh > --- > drivers/soc/mediatek/mtk-cmdq-helper.c | 25 +++++++++++++++++++++++++ > include/linux/soc/mediatek/mtk-cmdq.h | 18 ++++++++++++++++++ > 2 files changed, 43 insertions(+) > > diff --git a/drivers/soc/mediatek/mtk-cmdq-helper.c > b/drivers/soc/mediatek/mtk-cmdq-helper.c > index 70ad4d806fac..815845bb5982 100644 > --- a/drivers/soc/mediatek/mtk-cmdq-helper.c > +++ b/drivers/soc/mediatek/mtk-cmdq-helper.c > @@ -27,6 +27,31 @@ struct cmdq_instruction { > u8 op; > }; > > +struct cmdq_client_reg *cmdq_dev_get_client_reg(struct device *dev, int idx) > +{ > + struct cmdq_client_reg *client_reg; > + struct of_phandle_args spec; > + > + client_reg = devm_kzalloc(dev, sizeof(*client_reg), GFP_KERNEL); > + if (!client_reg) > + return NULL; > + > + if (of_parse_phandle_with_args(dev->of_node, "mediatek,gce-client-reg", > +"#subsys-cells", idx, &spec)) { > + dev_err(dev, "can't parse gce-client-reg property (%d)", idx); I think you should call devm_kfree(client_reg) here because this function may not be called in client driver's probe function. But in another view point, I would like you to move the memory allocation out of this function. When client call cmdq_dev_get_client_reg() to get a pointer, it's easy that client does not free it because you does not provide free API, Some client may embed struct cmdq_client_reg with its client structure together, struct client { struct cmdq_client_reg client_reg; }; Because each client may have different memory allocation strategy, so I would like you to move memory allocation out of this function to let client driver have the flexibility. 
Regards, CK > + > + return NULL; > + } > + > + client_reg->subsys = spec.args[0]; > + client_reg->offset = spec.args[1]; > + client_reg->size = spec.args[2]; > + of_node_put(spec.np); > + > + return client_reg; > +} > +EXPORT_SYMBOL(cmdq_dev_get_client_reg); > + > static void cmdq_client_timeout(struct timer_list *t) > { > struct cmdq_client *client = from_timer(client, t, timer); > diff --git a/include/linux/soc/mediatek/mtk-cmdq.h > b/include/linux/soc/mediatek/mtk-cmdq.h > index a345870a6d10..d0dea3780f7a 100644 > --- a/include/linux/soc/mediatek/mtk-cmdq.h > +++ b/include/linux/soc/mediatek/mtk-cmdq.h > @@ -15,6 +15,12 @@ > > struct cmdq_pkt; > > +struct cmdq_client_reg { > + u8 subsys; > + u16 offset; > + u16 size; > +}; > + > struct cmdq_client { > spinlock_t lock; > u32 pkt_cnt; > @@ -142,4 +148,16 @@ int cmdq_pkt_flush_async(struct cmdq_pkt *pkt, > cmdq_async_flush_cb cb, > */ > int cmdq_pkt_flush(struct cmdq_pkt *pkt); > > +/** > + * cmdq_dev_get_client_reg() - parse cmdq client reg from the device node of > CMDQ client > + * @dev: device of CMDQ mailbox client > + * @idx: the index of desired reg > + * > + * Return: CMDQ client reg pointer > + * > + * Help CMDQ client parsing the cmdq client reg > + * from the device node of CMDQ client. > + */ > +struct cmdq_client_reg *cmdq_dev_get_client_reg(struct device *dev, int > idx); > + > #endif /* __MTK_CMDQ_H__ */
Re: [PATCH v2] vt: Fix a missing-check bug in drivers/tty/vt/vt.c
On Tue, 21 May 2019, Gen Zhang wrote: > On Mon, May 20, 2019 at 10:55:40PM -0400, Nicolas Pitre wrote: > > On Tue, 21 May 2019, Gen Zhang wrote: > > > > > In function con_init(), the pointer variable vc_cons[currcons].d, vc and > > > vc->vc_screenbuf is allocated a memory space via kzalloc(). And they are > > > used in the following codes. > > > However, when there is a memory allocation error, kzalloc() can fail. > > > Thus null pointer (vc_cons[currcons].d, vc and vc->vc_screenbuf) > > > dereference may happen. And it will cause the kernel to crash. Therefore, > > > we should check return value and handle the error. > > > Further,the loop condition MIN_NR_CONSOLES is defined as 1 in > > > include/uapi/linux/vt.h. So there is no need to unwind the loop. > > > > But what if someone changes that define? It won't be obvious that some > > code did rely on it to be defined to 1. > I re-examine the source code. MIN_NR_CONSOLES is only defined once and > no other changes to it. Yes, that is true today. But if someone changes that in the future, how will that person know that you relied on it to be 1 for not needing to unwind the loop? Nicolas
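The unwind ordering Nicolas recommends generalizes to any init path that publishes a pointer while holding a lock: free the allocation and clear the published pointer while the lock is still held, then unlock on the way out. A minimal userspace sketch of the pattern, with hypothetical names (`lock()`/`unlock()` stand in for console_lock()/console_unlock(); this is not the vt.c code):

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins for console_lock()/console_unlock(); a real version would
 * use a mutex.  The point here is only the unwind *ordering*. */
static int lock_held;
static void lock(void)   { lock_held = 1; }
static void unlock(void) { lock_held = 0; }

static char *obj;	/* becomes visible to other threads once the lock drops */
static char *buf;

/* On failure, undo the allocations and clear the published pointer
 * *before* unlocking, so no other thread can observe stale memory. */
static int setup(size_t obj_sz, size_t buf_sz)
{
	lock();
	obj = malloc(obj_sz);
	if (!obj)
		goto err_obj;
	buf = malloc(buf_sz);
	if (!buf)
		goto err_buf;
	unlock();
	return 0;

err_buf:
	free(obj);
	obj = NULL;	/* nothing stale is visible when the lock drops */
err_obj:
	unlock();
	return -1;
}
```

Reversing the label order (unlocking before freeing and clearing `obj`) reintroduces exactly the race described above.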
Re: [PATCH v2 4/7] drivers/soc: xdma: Add PCI device configuration sysfs
On Tue, 21 May 2019, at 05:51, Eddie James wrote: > The AST2500 has two PCI devices embedded. The XDMA engine can use either > device to perform DMA transfers. Users need the capability to choose > which device to use. This commit therefore adds two sysfs files that > toggle the AST2500 and XDMA engine between the two PCI devices. > > Signed-off-by: Eddie James > --- > drivers/soc/aspeed/aspeed-xdma.c | 64 > > 1 file changed, 64 insertions(+) > > diff --git a/drivers/soc/aspeed/aspeed-xdma.c > b/drivers/soc/aspeed/aspeed-xdma.c > index 2162ca0..002b571 100644 > --- a/drivers/soc/aspeed/aspeed-xdma.c > +++ b/drivers/soc/aspeed/aspeed-xdma.c > @@ -667,6 +667,64 @@ static void aspeed_xdma_free_vga_blks(struct > aspeed_xdma *ctx) > } > } > > +static int aspeed_xdma_change_pcie_conf(struct aspeed_xdma *ctx, u32 conf) > +{ > + int rc; > + > + mutex_lock(&ctx->start_lock); > + rc = wait_event_interruptible_timeout(ctx->wait, > + !test_bit(XDMA_IN_PRG, > + &ctx->flags), > + msecs_to_jiffies(1000)); > + if (rc < 0) { > + mutex_unlock(&ctx->start_lock); > + return -EINTR; > + } > + > + /* previous op didn't complete, wake up waiters anyway */ > + if (!rc) > + wake_up_interruptible_all(&ctx->wait); > + > + reset_control_assert(ctx->reset); > + msleep(10); > + > + aspeed_scu_pcie_write(ctx, conf); > + msleep(10); > + > + reset_control_deassert(ctx->reset); > + msleep(10); > + > + aspeed_xdma_init_eng(ctx); > + > + mutex_unlock(&ctx->start_lock); > + > + return 0; > +} > + > +static ssize_t aspeed_xdma_use_bmc(struct device *dev, > +struct device_attribute *attr, > +const char *buf, size_t count) > +{ > + int rc; > + struct aspeed_xdma *ctx = dev_get_drvdata(dev); > + > + rc = aspeed_xdma_change_pcie_conf(ctx, aspeed_xdma_bmc_pcie_conf); > + return rc ?: count; > +} > +static DEVICE_ATTR(use_bmc, 0200, NULL, aspeed_xdma_use_bmc); > + > +static ssize_t aspeed_xdma_use_vga(struct device *dev, > +struct device_attribute *attr, > +const char *buf, size_t count) > +{ > + int rc; > + struct aspeed_xdma *ctx
= dev_get_drvdata(dev); > + > + rc = aspeed_xdma_change_pcie_conf(ctx, aspeed_xdma_vga_pcie_conf); > + return rc ?: count; > +} > +static DEVICE_ATTR(use_vga, 0200, NULL, aspeed_xdma_use_vga); > + > static int aspeed_xdma_probe(struct platform_device *pdev) > { > int irq; > @@ -745,6 +803,9 @@ static int aspeed_xdma_probe(struct platform_device *pdev) > return rc; > } > > + device_create_file(dev, &dev_attr_use_bmc); > + device_create_file(dev, &dev_attr_use_vga); Two attributes is a broken approach IMO. This gives the false representation of 4 states (neither, vga, bmc, both) when really there are only two (vga and bmc). I think we should have one attribute that reacts to "vga" and "bmc" writes. Andrew > + > return 0; > } > > @@ -752,6 +813,9 @@ static int aspeed_xdma_remove(struct platform_device > *pdev) > { > struct aspeed_xdma *ctx = platform_get_drvdata(pdev); > > + device_remove_file(ctx->dev, &dev_attr_use_vga); > + device_remove_file(ctx->dev, &dev_attr_use_bmc); > + > misc_deregister(&ctx->misc); > > aspeed_xdma_free_vga_blks(ctx); > -- > 1.8.3.1 > >
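Andrew's single-attribute suggestion reduces to parsing the written string into one of two states. A sketch of what such a store callback might accept, with hypothetical names (not the driver's code):

```c
#include <string.h>

enum pcie_owner { PCIE_OWNER_VGA, PCIE_OWNER_BMC, PCIE_OWNER_INVALID };

/* Hypothetical parse step for a single sysfs attribute that accepts
 * the strings "vga" or "bmc", with or without a trailing newline
 * (echo without -n appends one). */
static enum pcie_owner parse_pcie_owner(const char *buf, size_t count)
{
	size_t len = count;

	if (len && buf[len - 1] == '\n')
		len--;
	if (len == 3 && !strncmp(buf, "vga", 3))
		return PCIE_OWNER_VGA;
	if (len == 3 && !strncmp(buf, "bmc", 3))
		return PCIE_OWNER_BMC;
	return PCIE_OWNER_INVALID;	/* store callback would return -EINVAL */
}
```

With one attribute, the two valid inputs map one-to-one onto the two hardware states, so the "neither/both" combinations the two-file interface implies simply cannot be expressed.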
Re: [PATCH v3] vt: Fix a missing-check bug in drivers/tty/vt/vt.c
On Mon, May 20, 2019 at 10:55:40PM -0400, Nicolas Pitre wrote: > As soon as you release the lock, another thread could come along and > start using the memory pointed by vc_cons[currcons].d you're about to > free here. This is unlikely for an initcall, but still. > > You should consider this ordering instead: > > err_vc_screenbuf: > kfree(vc); > vc_cons[currcons].d = NULL; > err_vc: > console_unlock(); > return -ENOMEM; In function con_init(), the pointer variable vc_cons[currcons].d, vc and vc->vc_screenbuf is allocated a memory space via kzalloc(). And they are used in the following codes. However, when there is a memory allocation error, kzalloc() can fail. Thus null pointer (vc_cons[currcons].d, vc and vc->vc_screenbuf) dereference may happen. And it will cause the kernel to crash. Therefore, we should check return value and handle the error. Further, the loop condition MIN_NR_CONSOLES is defined as 1 in include/uapi/linux/vt.h and it is not changed. So there is no need to unwind the loop. Signed-off-by: Gen Zhang --- diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c index fdd12f8..ea47eb3 100644 --- a/drivers/tty/vt/vt.c +++ b/drivers/tty/vt/vt.c @@ -3350,10 +3350,14 @@ static int __init con_init(void) for (currcons = 0; currcons < MIN_NR_CONSOLES; currcons++) { vc_cons[currcons].d = vc = kzalloc(sizeof(struct vc_data), GFP_NOWAIT); + if (!vc) + goto err_vc; INIT_WORK(&vc_cons[currcons].SAK_work, vc_SAK); tty_port_init(&vc->port); visual_init(vc, currcons, 1); vc->vc_screenbuf = kzalloc(vc->vc_screenbuf_size, GFP_NOWAIT); + if (!vc->vc_screenbuf) + goto err_vc_screenbuf; vc_init(vc, vc->vc_rows, vc->vc_cols, currcons || !vc->vc_sw->con_save_screen); } @@ -3375,6 +3379,13 @@ static int __init con_init(void) register_console(&vt_console_driver); #endif return 0; +err_vc_screenbuf: + kfree(vc); + vc_cons[currcons].d = NULL; +err_vc: + console_unlock(); + return -ENOMEM; + } console_initcall(con_init); ---
Re: [v2 PATCH] mm: vmscan: correct nr_reclaimed for THP
On 5/20/19 5:43 PM, Yang Shi wrote: On 5/16/19 11:10 PM, Johannes Weiner wrote: On Tue, May 14, 2019 at 01:44:35PM -0700, Yang Shi wrote: On 5/13/19 11:20 PM, Michal Hocko wrote: On Mon 13-05-19 21:36:59, Yang Shi wrote: On Mon, May 13, 2019 at 2:45 PM Michal Hocko wrote: On Mon 13-05-19 14:09:59, Yang Shi wrote: [...] I think we can just account 512 base pages for nr_scanned for isolate_lru_pages() to make the counters sane since PGSCAN_KSWAPD/DIRECT just use it. And, sc->nr_scanned should be accounted as 512 base pages too otherwise we may have nr_scanned < nr_to_reclaim all the time to result in false-negative for priority raise and something else wrong (e.g. wrong vmpressure). Be careful. nr_scanned is used as a pressure indicator to slab shrinking AFAIR. Maybe this is ok but it really begs for much more explaining I don't know why my company mailbox didn't receive this email, so I replied with my personal email. It is not used to double slab pressure any more since commit 9092c71bb724 ("mm: use sc->priority for slab shrink targets"). It uses sc->priority to determine the pressure for slab shrinking now. So, I think we can just remove that "double slab pressure" code. It is not used actually and looks confusing now. Actually, the "double slab pressure" does something opposite. The extra inc to sc->nr_scanned just prevents from raising sc->priority. I have to get in sync with the recent changes. I am aware there were some patches floating around but I didn't get to review them. I was trying to point out that nr_scanned used to have a side effect to be careful about. If it doesn't have anymore then this is getting much more easier of course. Please document everything in the changelog. Thanks for reminding. Yes, I remembered nr_scanned would double slab pressure. But, when I inspected into the code yesterday, it turns out it is not true anymore. I will run some test to make sure it doesn't introduce regression. 
Yeah, sc->nr_scanned is used for three things right now: 1. vmpressure - this looks at the scanned/reclaimed ratio so it won't change semantics as long as scanned & reclaimed are fixed in parallel 2. compaction/reclaim - this is broken. Compaction wants a certain number of physical pages freed up before going back to compacting. Without Yang Shi's fix, we can overreclaim by a factor of 512. 3. kswapd priority raising - this is broken. kswapd raises priority if we scan fewer pages than the reclaim target (which itself is obviously expressed in order-0 pages). As a result, kswapd can falsely raise its aggressiveness even when it's making great progress. Both sc->nr_scanned & sc->nr_reclaimed should be fixed. Yes, v3 patch (sit in my local repo now) did fix both. BTW, I noticed the counter of memory reclaim is not correct with THP swap on vanilla kernel, please see the below: pgsteal_kswapd 21435 pgsteal_direct 26573329 pgscan_kswapd 3514 pgscan_direct 14417775 pgsteal is always greater than pgscan, my patch could fix the problem. Ouch, how is that possible with the current code? I think it happens when isolate_lru_pages() counts 1 nr_scanned for a THP, then shrink_page_list() splits the THP and we reclaim tail pages one by one. This goes all the way back to the initial THP patch! I think so. It does make sense. But, the weird thing is I just see this with synchronous swap device (some THPs got swapped out in a whole, some got split), but I've never seen this with rotate swap device (all THPs got split). I haven't figured out why. isolate_lru_pages() needs to be fixed. Its return value, nr_taken, is correct, but its *nr_scanned parameter is wrong, which causes issues: 1. The trace point, as Yang Shi pointed out, will underreport the number of pages scanned, as it reports it along with nr_to_scan (base pages) and nr_taken (base pages) 2. 
vmstat and memory.stat count 'struct page' operations rather than base pages, which makes zero sense to neither user nor kernel developers (I routinely multiply these counters by 4096 to get a sense of work performed). All of isolate_lru_pages()'s accounting should be in base pages, which includes nr_scanned and PGSCAN_SKIPPED. That should also simplify the code; e.g.: for (total_scan = 0; scan < nr_to_scan && nr_taken < nr_to_scan && !list_empty(src); total_scan++) { scan < nr_to_scan && nr_taken >= nr_to_scan is a weird condition that does not make sense in page reclaim imo. Reclaim cares about physical memory - freeing one THP is as much progress for reclaim as freeing 512 order-0 pages. Yes, I do agree. The v3 patch did this. IMO *all* '++' in vmscan.c are suspicious and should be reviewed: nr_scanned, nr_reclaimed, nr_dirty, nr_unqueued_dirty, nr_congested, nr_immediate, nr_writeback, nr_ref_keep, nr_unmap_fail, pgactivate, total_scan & scan, nr_skipped. Some of them should be fine but I'm not sure the side effect. IMHO, let's fix the most obvious problem first. A
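The skew under discussion is easy to model: if the scan counter is bumped once per struct page while reclaim is counted in base pages, one reclaimed THP contributes 1 to pgscan and 512 to pgsteal, so pgsteal can exceed pgscan. A toy model of the buggy accounting (illustrative only, not kernel code):

```c
#define THP_NR_PAGES 512	/* 2M THP on x86-64 with 4K base pages */

struct scan_stats {
	unsigned long scanned;
	unsigned long reclaimed;
};

/* Toy model of the historical behavior: isolate_lru_pages() bumps
 * nr_scanned once per struct page, even for a THP whose 512 base
 * pages are later reclaimed one by one after splitting. */
static void scan_one(struct scan_stats *s, int is_thp, int reclaimed_ok)
{
	s->scanned += 1;	/* buggy: should be 512 for a THP */
	if (reclaimed_ok)
		s->reclaimed += is_thp ? THP_NR_PAGES : 1;
}
```

The fix being discussed amounts to making the `scanned` increment `is_thp ? THP_NR_PAGES : 1` as well, so both counters speak in base pages.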
Re: [RFC PATCH v5 16/16] dcache: Add CONFIG_DCACHE_SMO
On Tue, May 21, 2019 at 02:05:38AM +, Roman Gushchin wrote: > On Tue, May 21, 2019 at 11:31:18AM +1000, Tobin C. Harding wrote: > > On Tue, May 21, 2019 at 12:57:47AM +, Roman Gushchin wrote: > > > On Mon, May 20, 2019 at 03:40:17PM +1000, Tobin C. Harding wrote: > > > > In an attempt to make the SMO patchset as non-invasive as possible add a > > > > config option CONFIG_DCACHE_SMO (under "Memory Management options") for > > > > enabling SMO for the DCACHE. Whithout this option dcache constructor is > > > > used but no other code is built in, with this option enabled slab > > > > mobility is enabled and the isolate/migrate functions are built in. > > > > > > > > Add CONFIG_DCACHE_SMO to guard the partial shrinking of the dcache via > > > > Slab Movable Objects infrastructure. > > > > > > Hm, isn't it better to make it a static branch? Or basically anything > > > that allows switching on the fly? > > > > If that is wanted, turning SMO on and off per cache, we can probably do > > this in the SMO code in SLUB. > > Not necessarily per cache, but without recompiling the kernel. > > > > > It seems that the cost of just building it in shouldn't be that high. > > > And the question if the defragmentation worth the trouble is so much > > > easier to answer if it's possible to turn it on and off without rebooting. > > > > If the question is 'is defragmentation worth the trouble for the > > dcache', I'm not sure having SMO turned off helps answer that question. > > If one doesn't shrink the dentry cache there should be very little > > overhead in having SMO enabled. So if one wants to explore this > > question then they can turn on the config option. Please correct me if > > I'm wrong. > > The problem with a config option is that it's hard to switch over. > > So just to test your changes in production a new kernel should be built, > tested and rolled out to a representative set of machines (which can be > measured in thousands of machines). 
Then if results are questionable, > it should be rolled back. > > What you're actually guarding is the kmem_cache_setup_mobility() call, > which can be perfectly avoided using a boot option, for example. Turning > it on and off completely dynamic isn't that hard too. > > Of course, it's up to you, it's just probably easier to find new users > of a new feature, when it's easy to test it. Ok, cool - I like it. Will add for next version. thanks, Tobin.
Re: [PATCH v7 03/12] x86: Add macro to get symbol address for PIE support
On May 20, 2019 4:19:28 PM PDT, Thomas Garnier wrote: >From: Thomas Garnier > >Add a new _ASM_MOVABS macro to fetch a symbol address. It will be used >to replace "_ASM_MOV $<symbol>, %dst" code construct that are not >compatible with PIE. > >Signed-off-by: Thomas Garnier >--- > arch/x86/include/asm/asm.h | 1 + > 1 file changed, 1 insertion(+) > >diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h >index 3ff577c0b102..3a686057e882 100644 >--- a/arch/x86/include/asm/asm.h >+++ b/arch/x86/include/asm/asm.h >@@ -30,6 +30,7 @@ > #define _ASM_ALIGN __ASM_SEL(.balign 4, .balign 8) > > #define _ASM_MOV __ASM_SIZE(mov) >+#define _ASM_MOVABS __ASM_SEL(movl, movabsq) > #define _ASM_INC __ASM_SIZE(inc) > #define _ASM_DEC __ASM_SIZE(dec) > #define _ASM_ADD __ASM_SIZE(add) This is just about *always* wrong on x86-64. We should be using leaq sym(%rip),%reg. If it isn't reachable by leaq, then it is a non-PIE symbol like percpu. You do have to keep those distinct! -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: [PATCH v2] vt: Fix a missing-check bug in drivers/tty/vt/vt.c
On Mon, May 20, 2019 at 10:55:40PM -0400, Nicolas Pitre wrote: > On Tue, 21 May 2019, Gen Zhang wrote: > > > In function con_init(), the pointer variable vc_cons[currcons].d, vc and > > vc->vc_screenbuf is allocated a memory space via kzalloc(). And they are > > used in the following codes. > > However, when there is a memory allocation error, kzalloc() can fail. > > Thus null pointer (vc_cons[currcons].d, vc and vc->vc_screenbuf) > > dereference may happen. And it will cause the kernel to crash. Therefore, > > we should check return value and handle the error. > > Further,the loop condition MIN_NR_CONSOLES is defined as 1 in > > include/uapi/linux/vt.h. So there is no need to unwind the loop. > > But what if someone changes that define? It won't be obvious that some > code did rely on it to be defined to 1. I re-examine the source code. MIN_NR_CONSOLES is only defined once and no other changes to it. > > > Signed-off-by: Gen Zhang > > > > --- > > diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c > > index fdd12f8..b756609 100644 > > --- a/drivers/tty/vt/vt.c > > +++ b/drivers/tty/vt/vt.c > > @@ -3350,10 +3350,14 @@ static int __init con_init(void) > > > > for (currcons = 0; currcons < MIN_NR_CONSOLES; currcons++) { > > vc_cons[currcons].d = vc = kzalloc(sizeof(struct vc_data), > > GFP_NOWAIT); > > + if (!vc_cons[currcons].d || !vc) > > Both vc_cons[currcons].d and vc are assigned the same value on the > previous line. You don't have to test them both. Thanks for this comment! 
> > + goto err_vc; > > INIT_WORK(&vc_cons[currcons].SAK_work, vc_SAK); > > tty_port_init(&vc->port); > > visual_init(vc, currcons, 1); > > vc->vc_screenbuf = kzalloc(vc->vc_screenbuf_size, GFP_NOWAIT); > > + if (!vc->vc_screenbuf) > > + goto err_vc_screenbuf; > > vc_init(vc, vc->vc_rows, vc->vc_cols, > > currcons || !vc->vc_sw->con_save_screen); > > } > > @@ -3375,6 +3379,14 @@ static int __init con_init(void) > > register_console(&vt_console_driver); > > #endif > > return 0; > > +err_vc: > > + console_unlock(); > > + return -ENOMEM; > > +err_vc_screenbuf: > > + console_unlock(); > > + kfree(vc); > > + vc_cons[currcons].d = NULL; > > + return -ENOMEM; > As soon as you release the lock, another thread could come along and > start using the memory pointed by vc_cons[currcons].d you're about to > free here. This is unlikely for an initcall, but still. > > You should consider this ordering instead: > > err_vc_screenbuf: > kfree(vc); > vc_cons[currcons].d = NULL; > err_vc: > console_unlock(); > return -ENOMEM; > > Thanks for your patient reply, Nicolas! I will work on this patch and resubmit it. Thanks Gen > > } > > console_initcall(con_init); > > > > --- > >
Re: [PATCH 1/8] net: qualcomm: rmnet: fix struct rmnet_map_header
On Mon 20 May 19:30 PDT 2019, Alex Elder wrote: > On 5/20/19 8:32 PM, Subash Abhinov Kasiviswanathan wrote: > >> > >> If you are telling me that the command/data flag resides at bit > >> 7 of the first byte, I will update the field masks in a later > >> patch in this series to reflect that. > >> > > > > Higher order bit is Command / Data. > > So what this means is that to get the command/data bit we use: > > first_byte & 0x80 > > If that is correct I will remove this patch from the series and > will update the subsequent patches so bit 7 is the command bit, > bit 6 is reserved, and bits 0-5 are the pad length. > > I will post a v2 of the series with these changes, and will > incorporate Bjorn's "Reviewed-by". > But didn't you say that your testing show that the current bit order is wrong? I still like the cleanup, if nothing else just to clarify and clearly document the actual content of this header. Regards, Bjorn
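For reference, the layout Subash describes (command/data in the high-order bit, bit 6 reserved, bits 0-5 carrying the pad length) corresponds to masks like the following. These helper names are hypothetical, not the driver's actual accessors:

```c
#include <stdint.h>

/* Hypothetical accessors for the first byte of the rmnet MAP header,
 * assuming: bit 7 = command/data flag, bit 6 = reserved,
 * bits 0-5 = pad length. */
static inline int map_is_command(uint8_t first_byte)
{
	return (first_byte & 0x80) != 0;	/* high-order bit */
}

static inline uint8_t map_pad_len(uint8_t first_byte)
{
	return first_byte & 0x3f;		/* low six bits */
}
```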
Re: [PATCH v2] vt: Fix a missing-check bug in drivers/tty/vt/vt.c
On Tue, 21 May 2019, Gen Zhang wrote: > In function con_init(), the pointer variable vc_cons[currcons].d, vc and > vc->vc_screenbuf is allocated a memory space via kzalloc(). And they are > used in the following codes. > However, when there is a memory allocation error, kzalloc() can fail. > Thus null pointer (vc_cons[currcons].d, vc and vc->vc_screenbuf) > dereference may happen. And it will cause the kernel to crash. Therefore, > we should check return value and handle the error. > Further,the loop condition MIN_NR_CONSOLES is defined as 1 in > include/uapi/linux/vt.h. So there is no need to unwind the loop. But what if someone changes that define? It won't be obvious that some code did rely on it to be defined to 1. > Signed-off-by: Gen Zhang > > --- > diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c > index fdd12f8..b756609 100644 > --- a/drivers/tty/vt/vt.c > +++ b/drivers/tty/vt/vt.c > @@ -3350,10 +3350,14 @@ static int __init con_init(void) > > for (currcons = 0; currcons < MIN_NR_CONSOLES; currcons++) { > vc_cons[currcons].d = vc = kzalloc(sizeof(struct vc_data), > GFP_NOWAIT); > + if (!vc_cons[currcons].d || !vc) Both vc_cons[currcons].d and vc are assigned the same value on the previous line. You don't have to test them both. 
> + goto err_vc; > INIT_WORK(&vc_cons[currcons].SAK_work, vc_SAK); > tty_port_init(&vc->port); > visual_init(vc, currcons, 1); > vc->vc_screenbuf = kzalloc(vc->vc_screenbuf_size, GFP_NOWAIT); > + if (!vc->vc_screenbuf) > + goto err_vc_screenbuf; > vc_init(vc, vc->vc_rows, vc->vc_cols, > currcons || !vc->vc_sw->con_save_screen); > } > @@ -3375,6 +3379,14 @@ static int __init con_init(void) > register_console(&vt_console_driver); > #endif > return 0; > +err_vc: > + console_unlock(); > + return -ENOMEM; > +err_vc_screenbuf: > + console_unlock(); > + kfree(vc); > + vc_cons[currcons].d = NULL; > + return -ENOMEM; As soon as you release the lock, another thread could come along and start using the memory pointed by vc_cons[currcons].d you're about to free here. This is unlikely for an initcall, but still. You should consider this ordering instead: err_vc_screenbuf: kfree(vc); vc_cons[currcons].d = NULL; err_vc: console_unlock(); return -ENOMEM; > } > console_initcall(con_init); > > --- >
[PATCH] perf arm64: Fix mksyscalltbl when system kernel headers are ahead of the kernel
When a host system has kernel headers that are newer than a compiling kernel, mksyscalltbl fails with errors such as: <stdin>: In function 'main': <stdin>:271:44: error: '__NR_kexec_file_load' undeclared (first use in this function) <stdin>:271:44: note: each undeclared identifier is reported only once for each function it appears in <stdin>:272:46: error: '__NR_pidfd_send_signal' undeclared (first use in this function) <stdin>:273:43: error: '__NR_io_uring_setup' undeclared (first use in this function) <stdin>:274:43: error: '__NR_io_uring_enter' undeclared (first use in this function) <stdin>:275:46: error: '__NR_io_uring_register' undeclared (first use in this function) tools/perf/arch/arm64/entry/syscalls//mksyscalltbl: line 48: /tmp/create-table-xvUQdD: Permission denied mksyscalltbl is compiled with default host includes, but run with compiling kernel tree includes, causing some syscall numbers being undeclared. Signed-off-by: Vitaly Chikunov Cc: Alexander Shishkin Cc: Arnaldo Carvalho de Melo Cc: Hendrik Brueckner Cc: Ingo Molnar Cc: Jiri Olsa Cc: Kim Phillips Cc: Namhyung Kim Cc: Peter Zijlstra Cc: Ravi Bangoria --- tools/perf/arch/arm64/entry/syscalls/mksyscalltbl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/perf/arch/arm64/entry/syscalls/mksyscalltbl b/tools/perf/arch/arm64/entry/syscalls/mksyscalltbl index c88fd32563eb..459469b7222c 100755 --- a/tools/perf/arch/arm64/entry/syscalls/mksyscalltbl +++ b/tools/perf/arch/arm64/entry/syscalls/mksyscalltbl @@ -56,7 +56,7 @@ create_table() echo "};" } -$gcc -E -dM -x c $input \ +$gcc -E -dM -x c -I $incpath/include/uapi $input \ |sed -ne 's/^#define __NR_//p' \ |sort -t' ' -k2 -nu\ |create_table -- 2.11.0
Re: [PATCH v2] efi_64: Fix a missing-check bug in arch/x86/platform/efi/efi_64.c
On Fri, May 17, 2019 at 11:24:27AM +0200, Ard Biesheuvel wrote: > On Fri, 17 May 2019 at 11:06, Gen Zhang wrote: > > > > On Fri, May 17, 2019 at 10:41:28AM +0200, Ard Biesheuvel wrote: > > > Returning an error here is not going to make much difference, given > > > that the caller of efi_call_phys_prolog() does not bother to check it, > > > and passes the result straight into efi_call_phys_epilog(), which > > > happily attempts to dereference it. > > > > > > So if you want to fix this properly, please fix it at the call site as > > > well. I'd prefer to avoid ERR_PTR() and just return NULL for a failed > > > allocation though. > > Hi Ard, > > Thanks for your timely reply! > > I think returning NULL in efi_call_phys_prolog() and checking in > > efi_call_phys_epilog() is much better. But I am confused what to return > > in efi_call_phys_epilog() if save_pgd is NULL. Definitely not return > > -ENOMEM, because efi_call_phys_epilog() returns unsigned long. Could > > please light on me to fix this problem? > > > If efi_call_phys_prolog() returns NULL, the calling function should > abort and never call efi_call_phys_epilog(). In efi_call_phys_prolog(), save_pgd is allocated by kmalloc_array(). And it is dereferenced in the following codes. However, memory allocation functions such as kmalloc_array() may fail. Dereferencing this save_pgd null pointer may cause the kernel go wrong. Thus we should check this allocation. Further, if efi_call_phys_prolog() returns NULL, we should abort the process in phys_efi_set_virtual_address_map(), and return EFI_ABORTED. 
Signed-off-by: Gen Zhang --- diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c index e1cb01a..a7189a3 100644 --- a/arch/x86/platform/efi/efi.c +++ b/arch/x86/platform/efi/efi.c @@ -85,6 +85,8 @@ static efi_status_t __init phys_efi_set_virtual_address_map( pgd_t *save_pgd; save_pgd = efi_call_phys_prolog(); + if (!save_pgd) + return EFI_ABORTED; /* Disable interrupts around EFI calls: */ local_irq_save(flags); diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c index cf0347f..828460a 100644 --- a/arch/x86/platform/efi/efi_64.c +++ b/arch/x86/platform/efi/efi_64.c @@ -91,6 +91,8 @@ pgd_t * __init efi_call_phys_prolog(void) n_pgds = DIV_ROUND_UP((max_pfn << PAGE_SHIFT), PGDIR_SIZE); save_pgd = kmalloc_array(n_pgds, sizeof(*save_pgd), GFP_KERNEL); + if (!save_pgd) + return NULL; /* * Build 1:1 identity mapping for efi=old_map usage. Note that ---
Re: [RFC 0/7] introduce memory hinting API for external process
On Mon, May 20, 2019 at 04:42:00PM +0200, Oleksandr Natalenko wrote: > Hi. > > On Mon, May 20, 2019 at 12:52:47PM +0900, Minchan Kim wrote: > > - Background > > > > The Android terminology used for forking a new process and starting an app > > from scratch is a cold start, while resuming an existing app is a hot start. > > While we continually try to improve the performance of cold starts, hot > > starts will always be significantly less power hungry as well as faster so > > we are trying to make hot start more likely than cold start. > > > > To increase hot start, Android userspace manages the order that apps should > > be killed in a process called ActivityManagerService. ActivityManagerService > > tracks every Android app or service that the user could be interacting with > > at any time and translates that into a ranked list for lmkd(low memory > > killer daemon). They are likely to be killed by lmkd if the system has to > > reclaim memory. In that sense they are similar to entries in any other > > cache. > > Those apps are kept alive for opportunistic performance improvements but > > those performance improvements will vary based on the memory requirements of > > individual workloads. > > > > - Problem > > > > Naturally, cached apps were dominant consumers of memory on the system. > > However, they were not significant consumers of swap even though they are > > good candidate for swap. Under investigation, swapping out only begins > > once the low zone watermark is hit and kswapd wakes up, but the overall > > allocation rate in the system might trip lmkd thresholds and cause a cached > > process to be killed(we measured performance swapping out vs. zapping the > > memory by killing a process. Unsurprisingly, zapping is 10x times faster > > even though we use zram which is much faster than real storage) so kill > > from lmkd will often satisfy the high zone watermark, resulting in very > > few pages actually being moved to swap. 
> > > > - Approach > > > > The approach we chose was to use a new interface to allow userspace to > > proactively reclaim entire processes by leveraging platform information. > > This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages > > that are known to be cold from userspace and to avoid races with lmkd > > by reclaiming apps as soon as they entered the cached state. Additionally, > > it could provide many chances for platform to use much information to > > optimize memory efficiency. > > > > IMHO we should spell it out that this patchset complements MADV_WONTNEED > > and MADV_FREE by adding non-destructive ways to gain some free memory > > space. MADV_COLD is similar to MADV_WONTNEED in a way that it hints the > > kernel that memory region is not currently needed and should be reclaimed > > immediately; MADV_COOL is similar to MADV_FREE in a way that it hints the > > kernel that memory region is not currently needed and should be reclaimed > > when memory pressure rises. > > > > To achieve the goal, the patchset introduce two new options for madvise. > > One is MADV_COOL which will deactive activated pages and the other is > > MADV_COLD which will reclaim private pages instantly. These new options > > complement MADV_DONTNEED and MADV_FREE by adding non-destructive ways to > > gain some free memory space. MADV_COLD is similar to MADV_DONTNEED in a way > > that it hints the kernel that memory region is not currently needed and > > should be reclaimed immediately; MADV_COOL is similar to MADV_FREE in a way > > that it hints the kernel that memory region is not currently needed and > > should be reclaimed when memory pressure rises. > > > > This approach is similar in spirit to madvise(MADV_WONTNEED), but the > > information required to make the reclaim decision is not known to the app. > > Instead, it is known to a centralized userspace daemon, and that daemon > > must be able to initiate reclaim on its own without any app involvement. 
> > To solve the concern, this patch introduces a new syscall -
> >
> >	struct pr_madvise_param {
> >		int size;
> >		const struct iovec *vec;
> >	}
> >
> >	int process_madvise(int pidfd, ssize_t nr_elem, int *behavior,
> >				struct pr_madvise_param *results,
> >				struct pr_madvise_param *ranges,
> >				unsigned long flags);
> >
> > The syscall gets a pidfd to give hints to an external process and
> > provides a pair of result/ranges vector arguments so that it could give
> > several hints, one per address range, all at once.
> >
> > I guess others have different ideas about the naming of the syscall and
> > options, so feel free to suggest better naming.
> >
> > - Experiment
> >
> > We did a bunch of testing with several hundred real users on Android,
> > not an artificial benchmark. We saw about a 17% cold start decrease
> > without any significant battery/app startup latency issues. And with
> > artificial
> >
Re: [RFC 7/7] mm: madvise support MADV_ANONYMOUS_FILTER and MADV_FILE_FILTER
On Mon, May 20, 2019 at 11:28:01AM +0200, Michal Hocko wrote:
> [cc linux-api]
>
> On Mon 20-05-19 12:52:54, Minchan Kim wrote:
> > System could have a much faster swap device like zRAM. In that case,
> > swapping is much cheaper than file IO on the low-end storage.
> > In this configuration, userspace could handle a different strategy for
> > each kind of vma. IOW, they want to reclaim anonymous pages by MADV_COLD
> > while keeping file-backed pages in the inactive LRU by MADV_COOL, because
> > file IO is more expensive in this case, so they want to keep those pages
> > in memory until memory pressure happens.
> >
> > To support such a strategy more easily, this patch introduces
> > MADV_ANONYMOUS_FILTER and MADV_FILE_FILTER options in madvise(2), similar
> > to the same filters that /proc/<pid>/clear_refs already supports.
> > They are filters that can be ORed with other existing hints using the top
> > two bits of (int behavior).
>
> madvise operates on top of ranges and it is quite trivial to do the
> filtering from the userspace so why do we need any additional filtering?
>
> > Once either of them is set, the hint could affect only the interested
> > vmas, either anonymous or file-backed.
> >
> > With that, user could call the process_madvise syscall simply with an
> > entire range (0x0 - 0x) but either of MADV_ANONYMOUS_FILTER and
> > MADV_FILE_FILTER, so there is no need to call the syscall range by range.
>
> OK, so here is the reason you want that. The immediate question is why
> cannot the monitor do the filtering from the userspace. Slightly more
> work, all right, but less of an API to expose and that itself is a
> strong argument against.

What I should do if we don't have such a filter option is to enumerate all
of the vmas via /proc/<pid>/maps and then parse every range and inode from
strings, which would be painful for 2000+ vmas.
> >
> > * from v1r2
> >   * use consistent check with clear_refs to identify anon/file vma - surenb
> >
> > * from v1r1
> >   * use naming "filter" for new madvise option - dancol
> >
> > Signed-off-by: Minchan Kim
> > ---
> >  include/uapi/asm-generic/mman-common.h |  5 +++++
> >  mm/madvise.c                           | 14 ++++++++++++++
> >  2 files changed, 19 insertions(+)
> >
> > diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
> > index b8e230de84a6..be59a1b90284 100644
> > --- a/include/uapi/asm-generic/mman-common.h
> > +++ b/include/uapi/asm-generic/mman-common.h
> > @@ -66,6 +66,11 @@
> >  #define MADV_WIPEONFORK 18	/* Zero memory on fork, child only */
> >  #define MADV_KEEPONFORK 19	/* Undo MADV_WIPEONFORK */
> >
> > +#define MADV_BEHAVIOR_MASK (~(MADV_ANONYMOUS_FILTER|MADV_FILE_FILTER))
> > +
> > +#define MADV_ANONYMOUS_FILTER	(1<<31)	/* works for only anonymous vma */
> > +#define MADV_FILE_FILTER	(1<<30)	/* works for only file-backed vma */
> > +
> >  /* compatibility flags */
> >  #define MAP_FILE	0
> >
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index f4f569dac2bd..116131243540 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -1002,7 +1002,15 @@ static int madvise_core(struct task_struct *tsk, unsigned long start,
> >  	int write;
> >  	size_t len;
> >  	struct blk_plug plug;
> > +	bool anon_only, file_only;
> >
> > +	anon_only = behavior & MADV_ANONYMOUS_FILTER;
> > +	file_only = behavior & MADV_FILE_FILTER;
> > +
> > +	if (anon_only && file_only)
> > +		return error;
> > +
> > +	behavior = behavior & MADV_BEHAVIOR_MASK;
> >  	if (!madvise_behavior_valid(behavior))
> >  		return error;
> >
> > @@ -1067,12 +1075,18 @@ static int madvise_core(struct task_struct *tsk, unsigned long start,
> >  		if (end < tmp)
> >  			tmp = end;
> >
> > +		if (anon_only && vma->vm_file)
> > +			goto next;
> > +		if (file_only && !vma->vm_file)
> > +			goto next;
> > +
> >  		/* Here vma->vm_start <= start < tmp <= (end|vma->vm_end).
> >  		 */
> >  		error = madvise_vma(tsk, vma, &prev, start, tmp,
> >  					behavior, &pages);
> >  		if (error)
> >  			goto out;
> >  		*nr_pages += pages;
> > +next:
> >  		start = tmp;
> >  		if (prev && start < prev->vm_end)
> >  			start = prev->vm_end;
> > --
> > 2.21.0.1020.gf2820cf01a-goog
> >
>
> --
> Michal Hocko
> SUSE Labs