On Thu, 7 Sep 2023 06:04:09 -0500
Segher Boessenkool wrote:
> On Thu, Sep 07, 2023 at 12:48:25PM +0300, Dan Carpenter via Gcc-patches wrote:
> > I started to hunt
> > down all the Makefile which add a -Werror but there are a lot and
> > eventually I got bored and gave up.
>
> I have a patch
[ This is FYI only. Documenting what I found with gcc 4.5.1 (but is
fixed in 4.5.4). ]
Part of my test suite is to build the kernel with a compiler before asm
goto was supported (to test jump labels without it).
Recently I noticed that the kernel started to hang when building with
it. For a
On Tue, 13 Aug 2013 07:46:46 -0700
H. Peter Anvin h...@linux.intel.com wrote:
On Mon, Aug 12, 2013 at 10:47:37AM -0700, H. Peter Anvin wrote:
Since we really doesn't want to...
Ow. Can't believe I wrote that.
All your base are belong to us!
-- Steve
On Wed, 2013-08-07 at 07:06 +0200, Ondřej Bílka wrote:
Add short_counter, long_counter and increment the counter before each
jump. That way we will know how many short/long jumps were taken.
That's not trivial at all. The jump is a single location (in an asm
goto() statement) that happens
On Wed, 2013-08-07 at 12:03 -0400, Mathieu Desnoyers wrote:
You might want to try creating a global array of counters (accessible
both from C for printout and assembly for update).
Index the array from assembly using: (2f - 1f)
1:
jmp ...;
2:
And put an atomic increment of
On Mon, 2013-08-05 at 11:49 -0700, Linus Torvalds wrote:
Ugh. Why the crazy update_jump_label script stuff?
After playing with the patches again, I now understand why I did that.
It wasn't just for optimization.
Currently the way jump labels work is that we use asm goto() and place a
5 byte
On Mon, 2013-08-05 at 14:43 -0700, H. Peter Anvin wrote:
For unconditional jmp that should be pretty safe barring any fundamental
changes to the instruction set, in which case we can enable it as
needed, but for extra robustness it probably should skip prefix bytes.
Would the assembler add
On Tue, 2013-08-06 at 09:19 -0700, H. Peter Anvin wrote:
On 08/06/2013 09:15 AM, Steven Rostedt wrote:
On Mon, 2013-08-05 at 14:43 -0700, H. Peter Anvin wrote:
For unconditional jmp that should be pretty safe barring any fundamental
changes to the instruction set, in which case we can
On Tue, 2013-08-06 at 10:48 -0700, Linus Torvalds wrote:
So I wonder if this is a "ok, let's not bother, it's not worth the
pain" issue. 128 bytes of offset is very small, so there probably
aren't all that many cases that would use it.
OK, I'll forward port the original patches for the hell of
On Tue, 2013-08-06 at 16:33 -0400, Mathieu Desnoyers wrote:
Steve, perhaps you could add a mode to your binary rewriting program
that counts the number of 2-byte vs 5-byte jumps found, and if possible
get a breakdown of those per subsystem ?
I actually started doing that, as I was curious to
On Tue, 2013-08-06 at 16:43 -0400, Steven Rostedt wrote:
On Tue, 2013-08-06 at 16:33 -0400, Mathieu Desnoyers wrote:
Steve, perhaps you could add a mode to your binary rewriting program
that counts the number of 2-byte vs 5-byte jumps found, and if possible
get a breakdown of those per
On Tue, 2013-08-06 at 20:45 -0400, Steven Rostedt wrote:
[3.387362] short jumps: 106
[3.390277] long jumps: 330
Thus, approximately 25%. Not bad.
Also, where these happen to be is probably even more important than how
many. If all the short jumps happen in slow paths, it's rather
[ sent to both Linux kernel mailing list and to gcc list ]
I was looking at some of the old code I still have marked in my TODO
list, that I never pushed to get mainlined. One of them is to move trace
point logic out of the fast path to get rid of the stress that it
imposes on the icache.
Almost
On Mon, 2013-08-05 at 10:02 -0700, H. Peter Anvin wrote:
if (x) __attribute__((section(".foo"))) {
/* do something */
}
One concern I have is how this kind of code would work when embedded
inside a function which already has a section attribute. This could
easily cause really
On Mon, 2013-08-05 at 10:12 -0700, Linus Torvalds wrote:
On Mon, Aug 5, 2013 at 9:55 AM, Steven Rostedt rost...@goodmis.org wrote:
First off, we have very few things that are *so* unlikely that they
never get executed. Putting things in a separate section would
actually be really bad.
My
On Mon, 2013-08-05 at 13:55 -0400, Steven Rostedt wrote:
The difference between this and the
section hack I suggested, is that this would use a call/ret when
enabled instead of a jmp/jmp.
I wonder if this is what Kris Kross meant in their song?
/me goes back to work...
-- Steve
On Mon, 2013-08-05 at 11:17 -0700, H. Peter Anvin wrote:
On 08/05/2013 10:55 AM, Steven Rostedt wrote:
Well, as tracepoints are being added quite a bit in Linux, my concern is
with the inlined functions that they bring. With jump labels they are
disabled in a very unlikely way
On Mon, 2013-08-05 at 11:20 -0700, Linus Torvalds wrote:
Of course, it would be good to optimize static_key_false() itself -
right now those static key jumps are always five bytes, and while they
get nopped out, it would still be nice if there was some way to have
just a two-byte nop (turning
On Mon, 2013-08-05 at 11:29 -0700, H. Peter Anvin wrote:
Traps nest, that's why there is a stack. (OK, so you don't want to take
the same trap inside the trap handler, but that code should be very
limited.) The trap instruction just becomes very short, but rather
slow, call-return.
On Mon, 2013-08-05 at 11:34 -0700, Linus Torvalds wrote:
On Mon, Aug 5, 2013 at 11:24 AM, Linus Torvalds
torva...@linux-foundation.org wrote:
Ugh. I can see the attraction of your section thing for that case, I
just get the feeling that we should be able to do better somehow.
Hmm..
On Mon, 2013-08-05 at 11:51 -0700, H. Peter Anvin wrote:
On 08/05/2013 11:49 AM, Steven Rostedt wrote:
On Mon, 2013-08-05 at 11:29 -0700, H. Peter Anvin wrote:
Traps nest, that's why there is a stack. (OK, so you don't want to take
the same trap inside the trap handler, but that code
On Mon, 2013-08-05 at 12:04 -0700, Andi Kleen wrote:
Steven Rostedt rost...@goodmis.org writes:
Can't you just use -freorder-blocks-and-partition?
Yeah, I'm familiar with this option.
This should already partition unlikely blocks into a
different section. Just a single one of course
On Mon, 2013-08-05 at 11:49 -0700, Linus Torvalds wrote:
On Mon, Aug 5, 2013 at 11:39 AM, Steven Rostedt rost...@goodmis.org wrote:
I had patches that did exactly this:
https://lkml.org/lkml/2012/3/8/461
But it got dropped for some reason. I don't remember why. Maybe because
On Mon, 2013-08-05 at 12:57 -0700, Linus Torvalds wrote:
On Mon, Aug 5, 2013 at 12:54 PM, Mathieu Desnoyers
mathieu.desnoy...@efficios.com wrote:
I remember that choosing between 2 and 5 bytes nop in the asm goto was
tricky: it had something to do with the fact that gcc doesn't know the
On Mon, 2013-08-05 at 17:28 -0400, Mathieu Desnoyers wrote:
Another thing that bothers me with Steven's approach is that decoding
jumps generated by the compiler seems fragile IMHO.
The encodings won't change. If they do, then old kernels will not run on
new hardware.
Now if it adds a third
On Mon, 2013-08-05 at 22:26 -0400, Jason Baron wrote:
I think if the 'cold' attribute on the default disabled static_key
branch moved the text completely out-of-line, it would satisfy your
requirement here?
If you like this approach, perhaps we can make something like this work
within
On Tue, 2009-11-24 at 17:12 +, Andrew Haley wrote:
H. Peter Anvin wrote:
If we're changing gcc anyway, then let's add the option of intercepting
the function at the point where the machine state is well-defined by
ABI, which is before the function stack frame is set up.
Hmm. On the
On Fri, 2009-11-20 at 10:57 +0100, Andi Kleen wrote:
Steven Rostedt rost...@goodmis.org writes:
And frame pointers do add a little overhead as well. Too bad the mcount
ABI wasn't something like this:
function:
call mcount
[...]
This way
Ingo, Thomas and Linus,
I know Thomas did a patch to force the -mtune=generic, but just in case
gcc decides to do something crazy again, this patch will catch it.
Should we try to get this in now?
-- Steve
On Fri, 2009-11-20 at 00:23 -0500, Steven Rostedt wrote:
commit
On Thu, 2009-11-19 at 15:44 +, Andrew Haley wrote:
Thomas Gleixner wrote:
We're aligning the stack properly, as per the ABI requirements. Can't
you just fix the tracer?
And how do we do that? The hooks that are in place have no idea of what
happened before they were called?
-- Steve
On Thu, 2009-11-19 at 15:44 +, Andrew Haley wrote:
We're aligning the stack properly, as per the ABI requirements. Can't
you just fix the tracer?
Unfortunately, this is the only fix we have:
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index b416512..cd39064 100644
---
On Thu, 2009-11-19 at 09:39 -0800, Linus Torvalds wrote:
This modification leads to a hard to solve problem in the kernel
function graph tracer which assumes that the stack looks like:
return address
saved ebp
Umm. But it still does, doesn't it? That
pushl
On Thu, 2009-11-19 at 18:20 +, Andrew Haley wrote:
OK, I found it. There is a struct defined as
struct entry {
...
} __attribute__((__aligned__((1 << 4))));
and then in timer_stats_update_stats you have a local variable of type
struct entry:
void timer_stats_update_stats()
{
On Thu, 2009-11-19 at 19:47 +0100, Ingo Molnar wrote:
* Linus Torvalds torva...@linux-foundation.org wrote:
Admittedly, anybody who compiles with -pg probably doesn't care deeply
about smaller and more efficient code, since the mcount call overhead
tends to make the thing moot anyway,
On Thu, 2009-11-19 at 20:46 +0100, Frederic Weisbecker wrote:
On Thu, Nov 19, 2009 at 02:28:06PM -0500, Steven Rostedt wrote:
function:
call __fentry__
[...]
-- Steve
I would really like this. So that we can forget about other possible
further
On Thu, 2009-11-19 at 11:50 -0800, H. Peter Anvin wrote:
Perhaps we could create another profiler? Instead of calling mcount,
call a new function: __fentry__ or something. Have it activated with
another switch. This could make the performance of the function tracer
even better without all
On Thu, 2009-11-19 at 15:05 -0500, Steven Rostedt wrote:
Well, other archs use a register to store the return address. But it
would also be easy to do (pseudo arch assembly):
function:
mov lr, (%sp)
add 8, %sp
blr __fentry__
Should be bl
On Thu, 2009-11-19 at 12:36 -0800, Linus Torvalds wrote:
On Thu, 19 Nov 2009, Frederic Weisbecker wrote:
That way the lr would have the current function, and the parent would
still be at 8(%sp)
Yeah right, we need at least such very tiny prologue for
archs that store return
On Thu, 2009-11-19 at 14:25 -0700, Jeff Law wrote:
Having said all that, I don't expect to personally be looking at the
problem, given the list of other codegen issues that need to be looked
at (reload in particular), profiling/stack interactions would be around
87 millionth on my list.
be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
tip/tracing/urgent-2
Steven Rostedt (1):
tracing/x86: Add check to detect GCC messing with mcount prologue
 kernel/trace/Kconfig   |  1 -
 scripts/Makefile.build | 25 +++-
scripts
This touches the Makefile scripts. I forgot to CC kbuild and Sam.
-- Steve
On Fri, 2009-11-20 at 00:23 -0500, Steven Rostedt wrote:
Ingo,
Not sure if this is too much for this late in the -rc game, but it finds
the gcc bug at build time, and we don't need to disable function graph
tracer