Re: problems with memory allocation and the alignment check

2021-02-22 Thread Andrew Pinski
On Mon, Feb 22, 2021 at 1:37 AM Michael J. Baars wrote: > > On Mon, 2021-02-22 at 01:29 -0800, Andrew Pinski wrote: > > On Mon, Feb 22, 2021 at 1:17 AM Michael J. Baars > > wrote: > > > Hi, > > > > > > I just wrote this little program to demo

Re: problems with memory allocation and the alignment check

2021-02-22 Thread Andrew Pinski
On Mon, Feb 22, 2021 at 1:17 AM Michael J. Baars wrote: > > Hi, > > I just wrote this little program to demonstrate a possible flaw in both > malloc and calloc. > > If I allocate the simplest memory region from main(), one out of three > optimization flags fails. > If I allocate the same

Re: Supporting core-specific instruction sets (e.g. big.LITTLE) with restartable sequences

2018-11-02 Thread Andrew Pinski
On Fri, Nov 2, 2018 at 8:12 AM Mathieu Desnoyers wrote: > > Hi Richard, > > I stumbled on these articles: > > - > https://medium.com/@jadr2ddude/a-big-little-problem-a-tale-of-big-little-gone-wrong-e7778ce744bb > - https://www.mono-project.com/news/2016/09/12/arm64-icache/ > > and discussed them

Re: framebuffer corruption due to overlapping stp instructions on arm64

2018-08-03 Thread Andrew Pinski
On Fri, Aug 3, 2018 at 5:58 PM Mikulas Patocka wrote: > > > > On Fri, 3 Aug 2018, Richard Earnshaw (lists) wrote: > > > Whoa, hold on. > > > > Memcpy should never be used on device memory. Period. Memcpy doesn't > > know anything about what size of access is needed for accessing a device. > > >

Re: framebuffer corruption due to overlapping stp instructions on arm64

2018-08-03 Thread Andrew Pinski
is undefined behavior to use device memory with memcpy. Thanks, Andrew Pinski > > I tried AMD card (HD 6350) and NVidia (NVS 285) and both exhibit the same > corruption. OpenGL doesn't work (it results in artifacts on the AMD card > and lock-up on the NVidia card), but it's quite expected if even simple > writing to the framebuffer doesn't work. > > Mikulas

[PATCHv2 1/2] arm64:vdso: Rewrite gettimeofday into C.

2017-05-30 Thread Andrew Pinski
of the pointer; this would work for most cases but could fail in a few. Changes from v1: * Fixed bug in __kernel_clock_getres for checking the pointer argument. * Fix comments to refer to functions in arm64. Signed-off-by: Andrew Pinski <apin...@cavium.com> --- arch/arm64/kernel/vdso/Makefile

[PATCHv2 1/2] arm64:vdso: Rewrite gettimeofday into C.

2017-05-30 Thread Andrew Pinski
of the pointer; this would work for most cases but could fail in a few. Changes from v1: * Fixed bug in __kernel_clock_getres for checking the pointer argument. * Fix comments to refer to functions in arm64. Signed-off-by: Andrew Pinski --- arch/arm64/kernel/vdso/Makefile | 13 +- arch/arm64

[PATCH 2/2] arm64:vdso: Remove ISB from gettimeofday.

2017-05-30 Thread Andrew Pinski
ISB is normally required before mrs CNTVCT if we want the mrs to complete after the loads. In this case it is not. As we are taking the difference and if that difference was going to be negative, we just use the last counter value instead. Signed-off-by: Andrew Pinski <apin...@cavium.

[PATCH 2/2] arm64:vdso: Remove ISB from gettimeofday.

2017-05-30 Thread Andrew Pinski
ISB is normally required before mrs CNTVCT if we want the mrs to complete after the loads. In this case it is not. As we are taking the difference and if that difference was going to be negative, we just use the last counter value instead. Signed-off-by: Andrew Pinski --- arch/arm64/kernel
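
The clamping idea in this patch description can be sketched in C. This is only an illustration under stated assumptions: the helper name and parameter names are hypothetical and are not taken from the patch, whose code is not shown in the snippet. If the counter read appears to be older than the last recorded value, the last value is used instead, so the elapsed cycle count never goes negative.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): cycles elapsed since the
 * last recorded counter value. If the counter read was taken early and
 * looks older than cycle_last, fall back to cycle_last, which yields a
 * delta of zero rather than a huge wrapped-around value.
 */
static inline uint64_t cycles_since_last(uint64_t cycle_now, uint64_t cycle_last)
{
    if (cycle_now < cycle_last)
        cycle_now = cycle_last;
    return cycle_now - cycle_last;
}
```

With a clamp like this, an early counter read costs at most a slightly stale timestamp, which is the trade-off the description appears to accept in exchange for dropping the barrier.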

Re: [PATCH 1/2] arm64:vdso: Rewrite gettimeofday into C.

2017-04-24 Thread Andrew Pinski
On 4/24/2017 8:21 AM, Catalin Marinas wrote: On Sun, Apr 23, 2017 at 04:47:00PM -0700, Andrew Pinski wrote: This allows the compiler to optimize the divide by 1000. And remove the other divide. On ThunderX, gettimeofday improves by 32%. On ThunderX 2, gettimeofday improves by 18

[PATCH 1/2] arm64:vdso: Rewrite gettimeofday into C.

2017-04-23 Thread Andrew Pinski
This allows the compiler to optimize the divide by 1000. And remove the other divide. On ThunderX, gettimeofday improves by 32%. On ThunderX 2, gettimeofday improves by 18%. Signed-off-by: Andrew Pinski <apin...@cavium.com> --- arch/arm64/kernel/vdso/Makefile | 13 +- arch/arm64/

[PATCH 1/2] arm64:vdso: Rewrite gettimeofday into C.

2017-04-23 Thread Andrew Pinski
This allows the compiler to optimize the divide by 1000. And remove the other divide. On ThunderX, gettimeofday improves by 32%. On ThunderX 2, gettimeofday improves by 18%. Signed-off-by: Andrew Pinski --- arch/arm64/kernel/vdso/Makefile | 13 +- arch/arm64/kernel/vdso/gettimeofday.S

[PATCH 2/2] arm64:vdso: Remove ISB from gettimeofday.

2017-04-23 Thread Andrew Pinski
ISB is normally required before mrs CNTVCT if we want the mrs to complete after the loads. In this case it is not. As we are taking the difference and if that difference was going to be negative, we just use the last counter value instead. Signed-off-by: Andrew Pinski <apin...@cavium.

[PATCH 2/2] arm64:vdso: Remove ISB from gettimeofday.

2017-04-23 Thread Andrew Pinski
ISB is normally required before mrs CNTVCT if we want the mrs to complete after the loads. In this case it is not. As we are taking the difference and if that difference was going to be negative, we just use the last counter value instead. Signed-off-by: Andrew Pinski --- arch/arm64/kernel

Re: [PATCH v7 00/20] ILP32 for ARM64

2017-02-12 Thread Andrew Pinski
r-0.19% 458.sjeng 0.22% 462.libquantum 0.00% 464.h264ref 11.19% 471.omnetpp 11.80% 473.astar -0.29% 483.xalancbmk 8.87% Score 8.12% Thanks, Andrew Pinski > > Changes: >

Re: [PATCH v7 00/20] ILP32 for ARM64

2017-02-12 Thread Andrew Pinski
.19% 458.sjeng 0.22% 462.libquantum 0.00% 464.h264ref 11.19% 471.omnetpp 11.80% 473.astar -0.29% 483.xalancbmk 8.87% Score 8.12% Thanks, Andrew Pinski > > Changes: > v3: https://lkml.org/lkml/2014/9/3/704

[PATCH] arm64: lib: patch in prfm for copy_template if requested

2017-01-10 Thread Andrew Pinski
On ThunderX T88 pass 1 and pass 2, there is no hardware prefetching so we need to patch in explicit software prefetching instructions. This speeds up copy_to_user and copy_from_user for large sizes. The main use of large sizes is I/O reads/writes. Signed-off-by: Andrew Pinski <apin...@cavium.

[PATCH] arm64: lib: patch in prfm for copy_template if requested

2017-01-10 Thread Andrew Pinski
On ThunderX T88 pass 1 and pass 2, there is no hardware prefetching so we need to patch in explicit software prefetching instructions. This speeds up copy_to_user and copy_from_user for large sizes. The main use of large sizes is I/O reads/writes. Signed-off-by: Andrew Pinski --- arch/arm64/lib

[PATCH] patch in prfm for copy_template if requested

2017-01-10 Thread Andrew Pinski
For user space, we will be using the SIMD registers which allows for not using any callee saved registers and get better performance. So basically this is my old patch which just patches in the prfm to copy_template updated for the new name of the define and for the nop not needed to be there any mor

Re: [PATCH 01/27] [AARCH64] Fix utmp struct for compatibility reasons.

2016-06-25 Thread Andrew Pinski
On Fri, Jun 24, 2016 at 4:38 AM, Florian Weimer <fwei...@redhat.com> wrote: > On 06/23/2016 09:56 AM, Andreas Schwab wrote: >> >> Andrew Pinski <pins...@gmail.com> writes: >> >>> So the question becomes do we care enough about the incompatibles >>

Re: [PATCH 01/27] [AARCH64] Fix utmp struct for compatibility reasons.

2016-06-25 Thread Andrew Pinski
On Fri, Jun 24, 2016 at 4:38 AM, Florian Weimer wrote: > On 06/23/2016 09:56 AM, Andreas Schwab wrote: >> >> Andrew Pinski writes: >> >>> So the question becomes do we care enough about the incompatibles >>> between AARCH32 and AARCH64 to fix this and

Re: [PATCH 01/27] [AARCH64] Fix utmp struct for compatibility reasons.

2016-06-23 Thread Andrew Pinski
On Thu, Jun 23, 2016 at 12:36 AM, Yury Norov <yno...@caviumnetworks.com> wrote: > On Thu, Jun 23, 2016 at 09:32:46AM +0200, Andreas Schwab wrote: >> Andrew Pinski <pins...@gmail.com> writes: >> >> > So if you want aarch64 to be compatible w

Re: [PATCH 01/27] [AARCH64] Fix utmp struct for compatibility reasons.

2016-06-23 Thread Andrew Pinski
On Thu, Jun 23, 2016 at 12:36 AM, Yury Norov wrote: > On Thu, Jun 23, 2016 at 09:32:46AM +0200, Andreas Schwab wrote: >> Andrew Pinski writes: >> >> > So if you want aarch64 to be compatible with aarch32, you need to >> > define __WORDSIZE_TIME64_CO

Re: [PATCH 01/27] [AARCH64] Fix utmp struct for compatibility reasons.

2016-06-23 Thread Andrew Pinski
On Thu, Jun 23, 2016 at 12:32 AM, Andreas Schwab <sch...@suse.de> wrote: > Andrew Pinski <pins...@gmail.com> writes: > >> So if you want aarch64 to be compatible with aarch32, you need to >> define __WORDSIZE_TIME64_COMPAT32. If we don't want aarch64 and >

Re: [PATCH 01/27] [AARCH64] Fix utmp struct for compatibility reasons.

2016-06-23 Thread Andrew Pinski
On Thu, Jun 23, 2016 at 12:32 AM, Andreas Schwab wrote: > Andrew Pinski writes: > >> So if you want aarch64 to be compatible with aarch32, you need to >> define __WORDSIZE_TIME64_COMPAT32. If we don't want aarch64 and >> aarch32 to be compatible at all, then we can d

Re: [PATCH 01/27] [AARCH64] Fix utmp struct for compatibility reasons.

2016-06-22 Thread Andrew Pinski
On Wed, Jun 22, 2016 at 9:35 PM, Yury Norov <yno...@caviumnetworks.com> wrote: > On Tue, Jun 21, 2016 at 11:14:54AM +0100, Szabolcs Nagy wrote: >> On 21/06/16 06:06, Yury Norov wrote: >> > From: Andrew Pinski <apin...@cavium.com> >> > >> > NOTE Thi

Re: [PATCH 01/27] [AARCH64] Fix utmp struct for compatibility reasons.

2016-06-22 Thread Andrew Pinski
On Wed, Jun 22, 2016 at 9:35 PM, Yury Norov wrote: > On Tue, Jun 21, 2016 at 11:14:54AM +0100, Szabolcs Nagy wrote: >> On 21/06/16 06:06, Yury Norov wrote: >> > From: Andrew Pinski >> > >> > NOTE This is an ABI change for AARCH64. >> > If y

Re: [PATCH 23/27] [AARCH64] delouse input arguments in system functions

2016-06-21 Thread Andrew Pinski
On Tue, Jun 21, 2016 at 8:42 AM, Arnd Bergmann wrote: > On Tuesday, June 21, 2016 10:36:53 AM CEST Joseph Myers wrote: >> On Tue, 21 Jun 2016, Yury Norov wrote: >> >> > Signed-off-by: Yury Norov >> >> You're missing a patch description. What does

Re: [PATCH 23/27] [AARCH64] delouse input arguments in system functions

2016-06-21 Thread Andrew Pinski
On Tue, Jun 21, 2016 at 8:42 AM, Arnd Bergmann wrote: > On Tuesday, June 21, 2016 10:36:53 AM CEST Joseph Myers wrote: >> On Tue, 21 Jun 2016, Yury Norov wrote: >> >> > Signed-off-by: Yury Norov >> >> You're missing a patch description. What does "delouse" even mean? What >> is the ABI

[PATCH] perf annotate: ARM64 support

2016-06-11 Thread Andrew Pinski
function, thereby properly annotating them. * allows perf to identify function calls, allowing called functions to be followed in the annotated view. Signed-off-by: Andrew Pinski <apin...@cavium.com> --- tools/perf/util/annotate.c | 64 +++--- 1 file c

[PATCH] perf annotate: ARM64 support

2016-06-11 Thread Andrew Pinski
function, thereby properly annotating them. * allows perf to identify function calls, allowing called functions to be followed in the annotated view. Signed-off-by: Andrew Pinski --- tools/perf/util/annotate.c | 64 +++--- 1 file changed, 61 insertions(+), 3

Re: [PATCH 24/25] arm64:ilp32: add vdso-ilp32 and use for signal return

2016-05-05 Thread Andrew Pinski
On Wed, May 4, 2016 at 7:24 PM, Zhangjian (Bamvor) <bamvor.zhangj...@huawei.com> wrote: > Hi, > > > On 2016/5/5 7:23, Andrew Pinski wrote: >> >> On Wed, May 4, 2016 at 2:49 PM, Yury Norov <yno...@caviumnetworks.com> >> wrote: >>> >>>

Re: [PATCH 24/25] arm64:ilp32: add vdso-ilp32 and use for signal return

2016-05-05 Thread Andrew Pinski
On Wed, May 4, 2016 at 7:24 PM, Zhangjian (Bamvor) wrote: > Hi, > > > On 2016/5/5 7:23, Andrew Pinski wrote: >> >> On Wed, May 4, 2016 at 2:49 PM, Yury Norov >> wrote: >>> >>> On Tue, May 03, 2016 at 08:41:25PM +0800, Zhangjian (Bamvor) wrote:

Re: [PATCH 24/25] arm64:ilp32: add vdso-ilp32 and use for signal return

2016-05-04 Thread Andrew Pinski
c ZERO(0) ZERO(1) ... ENTRY(__kernel_clock_gettime) .cfi_startproc ZERO(1) ... ENTRY(__kernel_clock_getres) .cfi_startproc ZERO(1) Thanks, Andrew Pinski > > Yury.

Re: [PATCH 24/25] arm64:ilp32: add vdso-ilp32 and use for signal return

2016-05-04 Thread Andrew Pinski
) ... ENTRY(__kernel_clock_gettime) .cfi_startproc ZERO(1) ... ENTRY(__kernel_clock_getres) .cfi_startproc ZERO(1) Thanks, Andrew Pinski > > Yury.

Re: [RFC6 PATCH v6 00/21] ILP32 for ARM64 - LTP results

2016-04-27 Thread Andrew Pinski
On Wed, Apr 27, 2016 at 12:30 AM, Andrew Pinski <pins...@gmail.com> wrote: > On Fri, Apr 22, 2016 at 8:37 PM, Zhangjian (Bamvor) > <bamvor.zhangj...@huawei.com> wrote: >> Hi, Yury >> >> >> On 2016/4/6 6:44, Yury Norov wrote: >>> >>

Re: [RFC6 PATCH v6 00/21] ILP32 for ARM64 - LTP results

2016-04-27 Thread Andrew Pinski
On Wed, Apr 27, 2016 at 12:30 AM, Andrew Pinski wrote: > On Fri, Apr 22, 2016 at 8:37 PM, Zhangjian (Bamvor) > wrote: >> Hi, Yury >> >> >> On 2016/4/6 6:44, Yury Norov wrote: >>> >>> There are about 20 failing tests of 782 in lite scenario.

Re: [RFC6 PATCH v6 00/21] ILP32 for ARM64 - LTP results

2016-04-27 Thread Andrew Pinski
On Fri, Apr 22, 2016 at 8:37 PM, Zhangjian (Bamvor) wrote: > Hi, Yury > > > On 2016/4/6 6:44, Yury Norov wrote: >> >> There are about 20 failing tests of 782 in lite scenario. >> float_bessel >> float_exp_log >> float_iperb >> float_power >> float_trigo >> pipeio_1 >>

Re: [RFC6 PATCH v6 00/21] ILP32 for ARM64 - LTP results

2016-04-27 Thread Andrew Pinski
On Fri, Apr 22, 2016 at 8:37 PM, Zhangjian (Bamvor) wrote: > Hi, Yury > > > On 2016/4/6 6:44, Yury Norov wrote: >> >> There are about 20 failing tests of 782 in lite scenario. >> float_bessel >> float_exp_log >> float_iperb >> float_power >> float_trigo >> pipeio_1 >> pipeio_3 >> pipeio_5 >>

Re: [RFC6 PATCH v6 00/21] ILP32 for ARM64

2016-04-07 Thread Andrew Pinski
On Thu, Apr 7, 2016 at 5:18 AM, Adam Borowski wrote: > On Wed, 6 Apr 2016, Geert Uytterhoeven wrote: >> On Wed, Apr 6, 2016 at 12:08 AM, Yury Norov >> wrote: >>> v6: >>> - time_t, __kernel_off_t and other types turned to be 32-bit >>>for

Re: [RFC6 PATCH v6 00/21] ILP32 for ARM64

2016-04-07 Thread Andrew Pinski
On Thu, Apr 7, 2016 at 5:18 AM, Adam Borowski wrote: > On Wed, 6 Apr 2016, Geert Uytterhoeven wrote: >> On Wed, Apr 6, 2016 at 12:08 AM, Yury Norov >> wrote: >>> v6: >>> - time_t, __kernel_off_t and other types turned to be 32-bit >>>for compatibility reasons (after v5 discussion); > >

Re: [PATCH] Revert "arm64: Increase the max granular size"

2016-03-19 Thread Andrew Pinski
On 3/17/2016 7:27 AM, Catalin Marinas wrote: On Wed, Mar 16, 2016 at 10:26:08AM -0500, Timur Tabi wrote: Catalin Marinas wrote: Why do you need your own defconfig? If it's just on the short term until all your code is upstream, that's fine, but this goes against the single Image aim. I would

[PATCH 2/2] ARM64:VDSO: Improve __do_get_tspec, don't use udiv

2016-03-13 Thread Andrew Pinski
. On ThunderX, this speeds up gettimeofday by 16.6%. Signed-off-by: Andrew Pinski <apin...@cavium.com> --- arch/arm64/kernel/vdso/gettimeofday.S | 27 +++ 1 files changed, 19 insertions(+), 8 deletions(-) diff --git a/arch/arm64/kernel/vdso/gettimeofday.S b/arch/arm64/kerne

[PATCH 2/2] ARM64:VDSO: Improve __do_get_tspec, don't use udiv

2016-03-13 Thread Andrew Pinski
. On ThunderX, this speeds up gettimeofday by 16.6%. Signed-off-by: Andrew Pinski --- arch/arm64/kernel/vdso/gettimeofday.S | 27 +++ 1 files changed, 19 insertions(+), 8 deletions(-) diff --git a/arch/arm64/kernel/vdso/gettimeofday.S b/arch/arm64/kernel/vdso/gettimeofday.S index

[PATCH 1/2] ARM64:VDSO: Improve gettimeofday, don't use udiv

2016-03-13 Thread Andrew Pinski
On many cores, udiv with a large value is slow; instead, expand the division out to what GCC would have generated for the divide by 1000. On ThunderX, this speeds up gettimeofday by 5%. Signed-off-by: Andrew Pinski <apin...@cavium.com> --- arch/arm64/kernel/vdso/gettimeofday.S

[PATCH 0/2] *** SUBJECT HERE ***

2016-03-13 Thread Andrew Pinski
*** BLURB HERE *** Andrew Pinski (2): ARM64:VDSO: Improve gettimeofday, don't use udiv ARM64:VDSO: Improve __do_get_tspec, don't use udiv arch/arm64/kernel/vdso/gettimeofday.S | 47 1 files changed, 35 insertions(+), 12 deletions(-) -- 1.7.2.5

[PATCH 1/2] ARM64:VDSO: Improve gettimeofday, don't use udiv

2016-03-13 Thread Andrew Pinski
On many cores, udiv with a large value is slow; instead, expand the division out to what GCC would have generated for the divide by 1000. On ThunderX, this speeds up gettimeofday by 5%. Signed-off-by: Andrew Pinski --- arch/arm64/kernel/vdso/gettimeofday.S | 20 1 files
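
For readers unfamiliar with the trick this description refers to, a divide by a constant can be replaced with a multiply and a shift. The following is a minimal C sketch for 32-bit inputs, not the assembly from the patch (which the snippet does not show). The function name is hypothetical, and the 32-bit width is an assumption chosen to keep the example portable. The constant 0x10624DD3 is ceil(2^38 / 1000), and the intermediate product fits in 64 bits.

```c
#include <stdint.h>

/*
 * Hypothetical illustration: x / 1000 for any 32-bit x without a
 * hardware divide. Multiply by ceil(2^38 / 1000) = 0x10624DD3 and
 * shift right by 38; the rounding error stays below 1/1000 for all
 * 32-bit inputs, so the truncated result equals x / 1000.
 */
static inline uint32_t div_by_1000(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0x10624DD3u) >> 38);
}
```

A compiler normally performs this rewrite itself when the divisor is a compile-time constant. Hand-written assembly has to spell it out, which seems to be the situation the patch addresses.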

Re: [PATCH] ARM64: Improve copy_page for 128 cache line sizes.

2015-12-22 Thread Andrew Pinski
On Tue, Dec 21, 2015 at 5:43 AM, Arnd Bergmann wrote: > > On Monday 21 December 2015, Will Deacon wrote: >> On Sat, Dec 19, 2015 at 04:11:18PM -0800, Andrew Pinski wrote: >> > Adding a check for the cache line size is not much overhead. >> > Special

Re: [PATCH] ARM64: Improve copy_page for 128 cache line sizes.

2015-12-22 Thread Andrew Pinski
On Tue, Dec 21, 2015 at 5:43 AM, Arnd Bergmann <a...@arndb.de> wrote: > > On Monday 21 December 2015, Will Deacon wrote: >> On Sat, Dec 19, 2015 at 04:11:18PM -0800, Andrew Pinski wrote: >> > Adding a check for the cache line size is not much overhead. >> > S

[PATCH] ARM64: Improve copy_page for 128 cache line sizes.

2015-12-19 Thread Andrew Pinski
Adding a check for the cache line size is not much overhead. Special case 128 byte cache line size. This improves copy_page by 85% on ThunderX compared to the original implementation. For LMBench, it improves between 4-10%. Signed-off-by: Andrew Pinski --- arch/arm64/lib/copy_page.S | 39
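
As a rough illustration of what a cache line size check can look like, the sketch below decodes the data cache line size from a CTR_EL0 register value. It is an assumption that the patch derives the size this way, since the snippet does not show the code. The function names are hypothetical, and the register value is passed in as a parameter so the example runs on any host. The DminLine field (bits 19:16) holds log2 of the line size in 4-byte words.

```c
#include <stdint.h>

/*
 * Hypothetical helper: smallest data cache line size in bytes, decoded
 * from a CTR_EL0 value. DminLine (bits [19:16]) is log2 of the number
 * of 4-byte words per line.
 */
static inline unsigned int dcache_line_bytes(uint64_t ctr_el0)
{
    unsigned int dminline = (unsigned int)((ctr_el0 >> 16) & 0xf);
    return 4u << dminline;
}

/* Hypothetical predicate: take the special-cased 128-byte path? */
static inline int use_128_byte_copy_path(uint64_t ctr_el0)
{
    return dcache_line_bytes(ctr_el0) == 128;
}
```

A DminLine value of 5 decodes to 128 bytes, and a value of 4 decodes to 64 bytes.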

[PATCH] ARM64: Improve copy_page for 128 cache line sizes.

2015-12-19 Thread Andrew Pinski
Adding a check for the cache line size is not much overhead. Special case 128 byte cache line size. This improves copy_page by 85% on ThunderX compared to the original implementation. For LMBench, it improves between 4-10%. Signed-off-by: Andrew Pinski <apin...@cavium.com> --- arch/arm

[PATCH] ARM64: Fix compiling with GCC 6 and Atomics enabled

2015-12-18 Thread Andrew Pinski
The problem here is that GCC 6 and above now emit .arch for each function, so the global .arch_extension has no effect. This fixes the problem by putting .arch_extension inside ARM64_LSE_ATOMIC_INSN so it is enabled for each place where LSE is used. Signed-off-by: Andrew Pinski --- arch

[PATCH] ARM64: Fix compiling with GCC 6 and Atomics enabled

2015-12-18 Thread Andrew Pinski
The problem here is that GCC 6 and above now emit .arch for each function, so the global .arch_extension has no effect. This fixes the problem by putting .arch_extension inside ARM64_LSE_ATOMIC_INSN so it is enabled for each place where LSE is used. Signed-off-by: Andrew Pinski <a

Re: [PATCH v6 12/20] arm64:ilp32: add sys_ilp32.c and a separate table (in entry.S) to use it

2015-12-17 Thread Andrew Pinski
On Thu, Dec 17, 2015 at 12:10 PM, Arnd Bergmann wrote: > On Thursday 17 December 2015 18:27:53 Catalin Marinas wrote: >> On Wed, Dec 16, 2015 at 12:42:38AM +0300, Yury Norov wrote: > >> > +#define compat_sys_lookup_dcookie sys_lookup_dcookie >> > +#define compat_sys_pread64

Re: [PATCH 1/2] arm64: Improve error reporting on set_pte_at() checks

2015-12-15 Thread Andrew Pinski
ror=implicit-function-declaration] VM_WARN_ONCE(!pte_young(pte), ^ Thanks, Andrew Pinski > > Signed-off-by: Catalin Marinas > Cc: Will Deacon > --- > arch/arm64/include/asm/pgtable.h | 11 +++ > 1 file changed, 7 insertions(+), 4 deletions(-) > > diff --gi

Re: [PATCH 1/2] arm64: Improve error reporting on set_pte_at() checks

2015-12-15 Thread Andrew Pinski
ration of function ‘BUILD_BUG_ON_INVALID’ [-Werror=implicit-function-declaration] VM_WARN_ONCE(!pte_young(pte), ^ Thanks, Andrew Pinski > > Signed-off-by: Catalin Marinas <catalin.mari...@arm.com> > Cc: Will Deacon <will.dea...@arm.com> > --- > arch/arm64/include/asm/pgtable.h | 11

Re: Commit 81a43adae3b9 (locking/mutex: Use acquire/release semantics) causing failures on arm64 (ThunderX)

2015-12-11 Thread Andrew Pinski
ows > the lockup, which really translates to c55a6ffa62. Yes, as mutex_optimistic_spin calls into osq_lock/osq_unlock. And 81a43adae3b9 changed mutex.c, which David thought was where the issue was located, rather than what mutex_optimistic_spin called. Thanks, Andrew Pinski > > Th

Re: FW: Commit 81a43adae3b9 (locking/mutex: Use acquire/release semantics) causing failures on arm64 (ThunderX)

2015-12-10 Thread Andrew Pinski
On Thu, Dec 10, 2015 at 7:29 PM, Andrew Pinski wrote: > On Thu, Dec 10, 2015 at 11:44 AM, David Danny wrote: >> >> Hi, >> >> We are getting soft lockup OOPs on Cavium CN88XX (A.K.A. ThunderX), which is >> an arm64 implementation. > > I get

Re: FW: Commit 81a43adae3b9 (locking/mutex: Use acquire/release semantics) causing failures on arm64 (ThunderX)

2015-12-10 Thread Andrew Pinski
On Thu, Dec 10, 2015 at 11:44 AM, David Danny wrote: > > Hi, > > We are getting soft lockup OOPs on Cavium CN88XX (A.K.A. ThunderX), which is > an arm64 implementation. I get a slightly different OOPs and, after reverting c55a6ffa6285e29f874ed403979472631ec70bff, I was able to boot. What I saw with

Re: FW: Commit 81a43adae3b9 (locking/mutex: Use acquire/release semantics) causing failures on arm64 (ThunderX)

2015-12-10 Thread Andrew Pinski
On Thu, Dec 10, 2015 at 7:29 PM, Andrew Pinski <pins...@gmail.com> wrote: > On Thu, Dec 10, 2015 at 11:44 AM, David Danny wrote: >> >> Hi, >> >> We are getting soft lockup OOPs on Cavium CN88XX (A.K.A. ThunderX), which is >> an arm64 implementati

Re: [PATCH v6 14/19] arm64:ilp32: add sys_ilp32.c and a separate table (in entry.S) to use it

2015-11-30 Thread Andrew Pinski
ember 2015 00:16:54 Yury Norov wrote: >> > > > From: Andrew Pinski >> > > > >> > > > Add a separate syscall-table for ILP32, which dispatches either to >> > > > native >> > > > LP64 system call implementation or

Re: [PATCH v6 14/19] arm64:ilp32: add sys_ilp32.c and a separate table (in entry.S) to use it

2015-11-30 Thread Andrew Pinski
On Wednesday 18 November 2015 00:16:54 Yury Norov wrote: >> > > > From: Andrew Pinski <apin...@cavium.com> >> > > > >> > > > Add a separate syscall-table for ILP32, which dispatches either to >> > > > native >> > >

Re: [PATCH v6 13/17] arm64:ilp32: add sys_ilp32.c and a separate table (in entry.S) to use it

2015-11-13 Thread Andrew Pinski
On Fri, Nov 13, 2015 at 7:34 AM, Arnd Bergmann wrote: > On Thursday 12 November 2015 14:47:18 Andreas Schwab wrote: >> Arnd Bergmann writes: >> >> > On Thursday 12 November 2015 10:44:55 Andreas Schwab wrote: >> >> Arnd Bergmann writes: >> >> >> >> > What do you mean with 32-bit off_t? >> >> >>

Re: [PATCH v6 13/17] arm64:ilp32: add sys_ilp32.c and a separate table (in entry.S) to use it

2015-11-13 Thread Andrew Pinski
On Fri, Nov 13, 2015 at 7:34 AM, Arnd Bergmann wrote: > On Thursday 12 November 2015 14:47:18 Andreas Schwab wrote: >> Arnd Bergmann writes: >> >> > On Thursday 12 November 2015 10:44:55 Andreas Schwab wrote: >> >> Arnd Bergmann writes: >> >> >> >> >

Re: [RFC PATCH v6 00/17] ILP32 for ARM64

2015-11-05 Thread Andrew Pinski
On Thu, Nov 5, 2015 at 7:36 PM, Andreas Schwab wrote: > Yury Norov writes: > >> v6: >> - time_t, __kernel_off_t and other types turned to be 32-bit >>for compatibility reasons (after v5 discussion); > > Are the updated glibc patches available somewhere? Not in a useful form right now but i

Re: [RFC PATCH v6 00/17] ILP32 for ARM64

2015-11-05 Thread Andrew Pinski
On Thu, Nov 5, 2015 at 7:36 PM, Andreas Schwab wrote: > Yury Norov writes: > >> v6: >> - time_t, __kernel_off_t and other types turned to be 32-bit >>for compatibility reasons (after v5 discussion); > > Are the updated glibc patches available

Re: [PATCHv2] ARM64: Add AT_ARM64_MIDR to the aux vector

2015-09-02 Thread Andrew Pinski
> pins...@gmail.com wrote: >> > >> >>> On Sep 2, 2015, at 1:30 AM, Mark Rutland wrote: >> >>> >> >>> [...] >> >>> >> >>>>>>> On Sat, Aug 29, 2015 at 07:46:22PM +0100, Andrew Pinski wrote: >>

Re: [PATCHv2] ARM64: Add AT_ARM64_MIDR to the aux vector

2015-09-02 Thread Andrew Pinski
siarhei.siamas...@gmail.com> wrote: >> > >> > On Wed, 2 Sep 2015 01:58:56 +0800 >> > pins...@gmail.com wrote: >> > >> >>> On Sep 2, 2015, at 1:30 AM, Mark Rutland <mark.rutl...@arm.com> wrote: >> >>> >> >>>

[PATCHv2] ARM64: Add AT_ARM64_MIDR to the aux vector

2015-08-29 Thread Andrew Pinski
/358995.html. It allows for faster access to midr_el1 than going through a trap and does not exist if the set of cores are not the same. Changes from v1: Forgot to include the auxvec.h part. Signed-off-by: Andrew Pinski --- arch/arm64/include/asm/cpu.h |1 + arch/arm64/include/asm/elf.h

[PATCH] ARM64: Add AT_ARM64_MIDR to the aux vector

2015-08-29 Thread Andrew Pinski
/358995.html. It allows for faster access to midr_el1 than going through a trap and does not exist if the set of cores are not the same. Signed-off-by: Andrew Pinski --- arch/arm64/include/asm/cpu.h |1 + arch/arm64/include/asm/elf.h |6 ++ arch/arm64/kernel/cpuinfo.c | 22

[PATCH] ARM64: Add AT_ARM64_MIDR to the aux vector

2015-08-29 Thread Andrew Pinski
/358995.html. It allows for faster access to midr_el1 than going through a trap and does not exist if the set of cores are not the same. Signed-off-by: Andrew Pinski apin...@cavium.com --- arch/arm64/include/asm/cpu.h |1 + arch/arm64/include/asm/elf.h |6 ++ arch/arm64/kernel/cpuinfo.c

[PATCHv2] ARM64: Add AT_ARM64_MIDR to the aux vector

2015-08-29 Thread Andrew Pinski
/358995.html. It allows for faster access to midr_el1 than going through a trap and does not exist if the set of cores are not the same. Changes from v1: Forgot to include the auxvec.h part. Signed-off-by: Andrew Pinski apin...@cavium.com --- arch/arm64/include/asm/cpu.h |1 + arch/arm64

Re: [PATCH resend] MIPS: Allow FPU emulator to use non-stack area.

2014-10-06 Thread Andrew Pinski
On Mon, Oct 6, 2014 at 5:21 PM, Rich Felker wrote: > On Mon, Oct 06, 2014 at 05:11:38PM -0700, Andrew Pinski wrote: >> On Mon, Oct 6, 2014 at 5:05 PM, Rich Felker wrote: >> > On Mon, Oct 06, 2014 at 04:48:52PM -0700, David Daney wrote: >> >> On 10/06/2014 0

Re: [PATCH resend] MIPS: Allow FPU emulator to use non-stack area.

2014-10-06 Thread Andrew Pinski
On Mon, Oct 6, 2014 at 5:05 PM, Rich Felker wrote: > On Mon, Oct 06, 2014 at 04:48:52PM -0700, David Daney wrote: >> On 10/06/2014 04:38 PM, Andy Lutomirski wrote: >> >On 10/06/2014 02:58 PM, Rich Felker wrote: >> >>On Mon, Oct 06, 2014 at 02:45:29PM -0700, David Daney wrote: >> [...] >> >>This

Re: [PATCH resend] MIPS: Allow FPU emulator to use non-stack area.

2014-10-06 Thread Andrew Pinski
On Mon, Oct 6, 2014 at 5:05 PM, Rich Felker dal...@libc.org wrote: On Mon, Oct 06, 2014 at 04:48:52PM -0700, David Daney wrote: On 10/06/2014 04:38 PM, Andy Lutomirski wrote: On 10/06/2014 02:58 PM, Rich Felker wrote: On Mon, Oct 06, 2014 at 02:45:29PM -0700, David Daney wrote: [...] This is

Re: [PATCH resend] MIPS: Allow FPU emulator to use non-stack area.

2014-10-06 Thread Andrew Pinski
On Mon, Oct 6, 2014 at 5:21 PM, Rich Felker dal...@libc.org wrote: On Mon, Oct 06, 2014 at 05:11:38PM -0700, Andrew Pinski wrote: On Mon, Oct 6, 2014 at 5:05 PM, Rich Felker dal...@libc.org wrote: On Mon, Oct 06, 2014 at 04:48:52PM -0700, David Daney wrote: On 10/06/2014 04:38 PM, Andy

[PATCH 17/24] ARM64: Add loading of ILP32 binaries

2014-09-03 Thread Andrew Pinski
Signed-off-by: Andrew Pinski --- arch/arm64/include/asm/elf.h | 59 +++-- 1 files changed, 50 insertions(+), 9 deletions(-) diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h index 795dc9f..52083cd 100644 --- a/arch/arm64/include/asm

[PATCH 12/24] ARM64:ILP32: COMPAT_USE_64BIT_TIME is true for ILP32 tasks

2014-09-03 Thread Andrew Pinski
due to AARCH32 requiring 4k pages). Signed-off-by: Andrew Pinski --- arch/arm64/include/asm/compat.h |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/arch/arm64/include/asm/compat.h b/arch/arm64/include/asm/compat.h index 9082b27..eca6eec 100644 --- a/arch/arm64/include

[PATCH 10/24] ARM64: Introduce is_a32_task/is_a32_thread and TIF_AARCH32 and use them in the correct locations

2014-09-03 Thread Andrew Pinski
This patch introduces is_a32_compat_task and is_a32_thread so that it is easier to tell whether this is an a32-specific thread or a generic compat thread/task. Signed-off-by: Andrew Pinski --- arch/arm64/include/asm/compat.h | 21 + arch/arm64/include/asm/elf.h | 12

[PATCH 03/24] ARM64: Change some CONFIG_COMPAT over to use CONFIG_AARCH32_EL0 instead

2014-09-03 Thread Andrew Pinski
This patch changes the CONFIG_COMPAT checks inside arm64 that are AARCH32-specific. Signed-off-by: Andrew Pinski --- arch/arm64/include/asm/arch_timer.h |2 +- arch/arm64/include/asm/elf.h| 20 +--- arch/arm64/include/asm/fpsimd.h |2 +- arch/arm64/include

[PATCH 21/24] ARM64:ILP32: Use a seperate syscall table as a few syscalls need to be using the compat syscalls

2014-09-03 Thread Andrew Pinski
Some syscalls still need to use the compat versions, so we need a separate syscall table for ILP32. This patch adds them, including documentation on why we need to use each one. This list is based on the list from https://lkml.org/lkml/2013/9/11/478. Signed-off-by: Andrew Pinski

[PATCH 15/24] compat_binfmt_elf: coredump: Allow some core dump macros be overridden for compat.

2014-09-03 Thread Andrew Pinski
On some targets (x86 [32bit and x32] and arm64 [aarch32 and ilp32]), there are two compat ELF ABIs. This adds a few more "#define * COMPAT_*" for compat targets to define if needed. Signed-off-by: Andrew Pinski --- fs/compat_binfmt_elf.c | 17 + 1 files changed, 17

[PATCH 16/24] ARM64:ILP32: Support core dump for ILP32

2014-09-03 Thread Andrew Pinski
This patch supports core dumping on ILP32. We need a few extra macros (COMPAT_PR_REG_SIZE and COMPAT_PRSTATUS_SIZE) due to size differences of the register sets. Signed-off-by: Andrew Pinski --- arch/arm64/include/asm/elf.h | 23 +-- arch/arm64/kernel/ptrace.c | 12

[PATCH 20/24] ARM64:ILP32: The native siginfo is used instead of the compat siginfo

2014-09-03 Thread Andrew Pinski
Set COMPAT_USE_NATIVE_SIGINFO to be true for non-AARCH32 tasks. Signed-off-by: Andrew Pinski --- arch/arm64/include/asm/compat.h |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/arch/arm64/include/asm/compat.h b/arch/arm64/include/asm/compat.h index 2f84d2c..87cb50d

[PATCH 18/24] ARM64: Add vdso for ILP32 and use it for the signal return

2014-09-03 Thread Andrew Pinski
This patch adds the VDSO for ILP32. We need to use a different VDSO than LP64 since ILP32 uses ELF32 while LP64 uses ELF64. After this patch, signal handling mostly works: signals go through their action and then return correctly. Signed-off-by: Andrew Pinski --- arch/arm64/include

[PATCH 14/24] ARM64:ILP32 use the standard start_thread for ILP32 so the processor state is not AARCH32

2014-09-03 Thread Andrew Pinski
If we have both ILP32 and AARCH32 compiled in, we need to use the non-compat start thread for ILP32. Signed-off-by: Andrew Pinski --- arch/arm64/include/asm/processor.h | 11 +++ 1 files changed, 11 insertions(+), 0 deletions(-) diff --git a/arch/arm64/include/asm/processor.h b/arch

[PATCH 24/24] Add documentation about ARM64 ILP32 ABI

2014-09-03 Thread Andrew Pinski
This adds documentation about the ILP32 ABI and how it differs from the normal generic 32-bit ABI. Signed-off-by: Andrew Pinski --- Documentation/arm64/ilp32.txt | 57 + 1 files changed, 57 insertions(+), 0 deletions(-) create mode

[PATCH 19/24] ptrace: Allow compat to use the native siginfo

2014-09-03 Thread Andrew Pinski
With the ARM64 ILP32 ABI, we want to use the non-compat siginfo because we want to simplify signal handling for this new ABI. This patch just adds a new define, COMPAT_USE_NATIVE_SIGINFO; if it is true, reads and writes in the compat case are handled as if it were the non-compat case. Signed-off-by: Andrew Pinski
