On Tue, 13 Aug 2013 07:46:46 -0700
"H. Peter Anvin" wrote:
> > On Mon, Aug 12, 2013 at 10:47:37AM -0700, H. Peter Anvin wrote:
> >> Since we really doesn't want to...
>
> Ow. Can't believe I wrote that.
>
All your base are belong to us!
-- Steve
> On Mon, Aug 12, 2013 at 10:47:37AM -0700, H. Peter Anvin wrote:
>> Since we really doesn't want to...
Ow. Can't believe I wrote that.
-hpa
On Mon, Aug 12, 2013 at 10:47:37AM -0700, H. Peter Anvin wrote:
> On 08/12/2013 09:09 AM, Peter Zijlstra wrote:
> >>
> >> On the majority of architectures, including x86, you cannot simply copy
> >> a piece of code elsewhere and have it still work.
> >
> > I thought we used -fPIC which would allow
On 08/12/2013 09:09 AM, Peter Zijlstra wrote:
>>
>> On the majority of architectures, including x86, you cannot simply copy
>> a piece of code elsewhere and have it still work.
>
> I thought we used -fPIC which would allow just that.
>
Doubly wrong. The kernel is not compiled with -fPIC, nor do
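A short illustration of why relative addressing alone does not make code copyable. An x86 near jmp (opcode 0xE9) stores its target as a 32-bit displacement measured from the end of the 5-byte instruction, so copying those bytes verbatim to a new address silently moves the target. The helper names below are hypothetical. The sketch handles only this one instruction form and ignores every other position-dependent construct (RIP-relative data references, absolute relocations).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers modelling a 5-byte x86 "jmp rel32":
 * opcode 0xE9 followed by a little-endian signed 32-bit displacement
 * relative to the address just past the instruction. */

/* Target of a jmp rel32 whose first byte sits at address 'at'. */
static uint64_t jmp32_target(const uint8_t insn[5], uint64_t at)
{
    int32_t disp;
    memcpy(&disp, insn + 1, sizeof(disp)); /* assumes a little-endian host */
    return at + 5 + (int64_t)disp;
}

/* Re-encode the displacement so that a copy placed at 'new_at' still
 * reaches 'target'. Returns 0 on success, -1 if out of rel32 range. */
static int jmp32_relocate(uint8_t insn[5], uint64_t new_at, uint64_t target)
{
    int64_t disp = (int64_t)(target - (new_at + 5));
    if (disp < INT32_MIN || disp > INT32_MAX)
        return -1;
    int32_t d32 = (int32_t)disp;
    memcpy(insn + 1, &d32, sizeof(d32));
    return 0;
}
```

Copying `E9 10 00 00 00` from 0x1000 to 0x2000 without fixing it up changes the target from 0x1015 to 0x2015; only after `jmp32_relocate()` does the copy reach 0x1015 again.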
On Mon, Aug 12, 2013 at 09:02:02AM -0700, Andi Kleen wrote:
> "H. Peter Anvin" writes:
>
> > However, I would really like to
> > understand what the value is.
>
> Probably very little. When I last looked at it, the main overhead in
> perf currently seems to be backtraces and the ring buffer, not
On Mon, Aug 12, 2013 at 07:56:10AM -0700, H. Peter Anvin wrote:
> On 08/12/2013 02:17 AM, Peter Zijlstra wrote:
> >
> > I've been wanting to 'abuse' static_key/asm-goto to sort-of JIT
> > if-forest functions like perf_prepare_sample() and perf_output_sample().
> >
> > They are of the form:
> >
>
"H. Peter Anvin" writes:
> However, I would really like to
> understand what the value is.
Probably very little. When I last looked at it, the main overhead in
perf currently seems to be backtraces and the ring buffer, not this
code.
-Andi
--
a...@linux.intel.com -- Speaking for myself only
On 08/12/2013 02:17 AM, Peter Zijlstra wrote:
>
> I've been wanting to 'abuse' static_key/asm-goto to sort-of JIT
> if-forest functions like perf_prepare_sample() and perf_output_sample().
>
> They are of the form:
>
> void func(obj, args..)
> {
> unsigned long f = ...;
>
> if (f &
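The if-forest shape Peter describes can be sketched as follows. The flag names and the `struct sample` layout are hypothetical stand-ins, not perf's real sample_type bits. Each test is cheap, but every call re-evaluates the same chain of branches even though, presumably, `f` is fixed for a given event. That per-event invariance is what the static_key/asm-goto "JIT" idea would exploit.

```c
#include <stddef.h>

/* Hypothetical flag bits, loosely modelled on a per-event sample type. */
#define SAMPLE_IP   (1UL << 0)
#define SAMPLE_TID  (1UL << 1)
#define SAMPLE_TIME (1UL << 2)

struct sample {
    unsigned long ip, tid, time;
    size_t size;                /* bytes that would be emitted */
};

/* An "if-forest": one independent test per flag bit, evaluated on
 * every call even when 'f' never changes for this event. */
static void prepare_sample(struct sample *s, unsigned long f,
                           unsigned long ip, unsigned long tid,
                           unsigned long time)
{
    s->size = 0;
    if (f & SAMPLE_IP) {
        s->ip = ip;
        s->size += sizeof(s->ip);
    }
    if (f & SAMPLE_TID) {
        s->tid = tid;
        s->size += sizeof(s->tid);
    }
    if (f & SAMPLE_TIME) {
        s->time = time;
        s->size += sizeof(s->time);
    }
}
```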
On Mon, Aug 05, 2013 at 12:55:15PM -0400, Steven Rostedt wrote:
> [ sent to both Linux kernel mailing list and to gcc list ]
>
Let me hijack this thread for something related...
I've been wanting to 'abuse' static_key/asm-goto to sort-of JIT
if-forest functions like perf_prepare_sample() and per
On Wed, 2013-08-07 at 12:03 -0400, Mathieu Desnoyers wrote:
> You might want to try creating a global array of counters (accessible
> both from C for printout and assembly for update).
>
> Index the array from assembly using: (2f - 1f)
>
> 1:
> jmp ...;
> 2:
>
> And put an atomic incr
* Steven Rostedt (rost...@goodmis.org) wrote:
> On Wed, 2013-08-07 at 07:06 +0200, Ondřej Bílka wrote:
>
> > Add short_counter,long_counter and before increment counter before each
> > jump. That way we will know how many short/long jumps were taken.
>
> That's not trivial at all. The jump is a
On Wed, 2013-08-07 at 07:06 +0200, Ondřej Bílka wrote:
> Add short_counter,long_counter and before increment counter before each
> jump. That way we will know how many short/long jumps were taken.
That's not trivial at all. The jump is a single location (in an asm
goto() statement) that happens
On Tue, Aug 06, 2013 at 08:56:00PM -0400, Steven Rostedt wrote:
> On Tue, 2013-08-06 at 20:45 -0400, Steven Rostedt wrote:
>
> > [3.387362] short jumps: 106
> > [3.390277] long jumps: 330
> >
> > Thus, approximately 25%. Not bad.
>
> Also, where these happen to be is probably even more
On Tue, 2013-08-06 at 20:45 -0400, Steven Rostedt wrote:
> [3.387362] short jumps: 106
> [3.390277] long jumps: 330
>
> Thus, approximately 25%. Not bad.
Also, where these happen to be is probably even more important than how
many. If all the short jumps happen in slow paths, it's rathe
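For reference, the "approximately 25%" is the share of short jumps among all jump sites counted in that boot log:

```latex
\frac{\text{short}}{\text{short} + \text{long}}
  = \frac{106}{106 + 330}
  = \frac{106}{436}
  \approx 0.243
```

So roughly a quarter of the counted sites could use the 2-byte form.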
On Tue, 2013-08-06 at 16:43 -0400, Steven Rostedt wrote:
> On Tue, 2013-08-06 at 16:33 -0400, Mathieu Desnoyers wrote:
>
> > Steve, perhaps you could add a mode to your binary rewriting program
> > that counts the number of 2-byte vs 5-byte jumps found, and if possible
> > get a breakdown of those
On Tue, 2013-08-06 at 16:33 -0400, Mathieu Desnoyers wrote:
> Steve, perhaps you could add a mode to your binary rewriting program
> that counts the number of 2-byte vs 5-byte jumps found, and if possible
> get a breakdown of those per subsystem ?
I actually started doing that, as I was curious t
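A minimal sketch of the counting mode Mathieu asks for. It assumes the tool has already located each jump site and read its opcode byte; how it finds those sites is not shown. On x86, 0xEB is the 2-byte `jmp rel8` and 0xE9 the 5-byte `jmp rel32`. The struct and function names are hypothetical, and a per-subsystem breakdown would simply keep one `struct jump_stats` per subsystem.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counters for a jump-classifying pass. */
struct jump_stats {
    unsigned long short_jumps;  /* 0xEB: jmp rel8,  2 bytes */
    unsigned long long_jumps;   /* 0xE9: jmp rel32, 5 bytes */
    unsigned long unknown;      /* anything else: leave untouched */
};

/* Classify the opcode byte found at each jump site. */
static void count_jumps(const uint8_t *opcodes, size_t n,
                        struct jump_stats *st)
{
    st->short_jumps = st->long_jumps = st->unknown = 0;
    for (size_t i = 0; i < n; i++) {
        switch (opcodes[i]) {
        case 0xEB: st->short_jumps++; break;
        case 0xE9: st->long_jumps++;  break;
        default:   st->unknown++;     break;
        }
    }
}
```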
* Steven Rostedt (rost...@goodmis.org) wrote:
> On Tue, 2013-08-06 at 10:48 -0700, Linus Torvalds wrote:
>
> > So I wonder if this is a "ok, let's not bother, it's not worth the
> > pain" issue. 128 bytes of offset is very small, so there probably
> > aren't all that many cases that would use it.
On Tue, 2013-08-06 at 10:48 -0700, Linus Torvalds wrote:
> So I wonder if this is a "ok, let's not bother, it's not worth the
> pain" issue. 128 bytes of offset is very small, so there probably
> aren't all that many cases that would use it.
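To make "128 bytes of offset" concrete: a 2-byte `jmp rel8` stores a signed 8-bit displacement measured from the end of the instruction, so it reaches only targets between 128 bytes before and 127 bytes after that point. Anything farther needs the 5-byte `jmp rel32`. The helper below is a hypothetical sketch that treats addresses as plain integers and assumes the target is within rel32 range.

```c
#include <stdint.h>

/* Size in bytes of the shortest x86 jmp that reaches 'target' from an
 * instruction starting at 'insn_addr' (2 for rel8, else 5 for rel32). */
static int jmp_size_needed(uint64_t insn_addr, uint64_t target)
{
    /* rel8 displacement is relative to the end of the 2-byte insn */
    int64_t disp8 = (int64_t)(target - (insn_addr + 2));
    if (disp8 >= INT8_MIN && disp8 <= INT8_MAX)
        return 2;
    return 5;
}
```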
OK, I'll forward port the original patches for the hell
On Tue, Aug 6, 2013 at 7:19 AM, Steven Rostedt wrote:
>
> After playing with the patches again, I now understand why I did that.
> It wasn't just for optimization.
[explanation snipped]
> Anyway, if you feel that update_jump_label is too complex, I can go the
> "update at early boot" route and s
On 08/06/2013 09:26 AM, Steven Rostedt wrote:
>>
>> No, but if we ever end up doing MPX in the kernel, for example, we would
>> have to put an MPX prefix on the jmp.
>
> Well then we just have to update the rest of the jump label code :-)
>
For MPX in the kernel, this would be a small part of th
On Tue, 2013-08-06 at 09:19 -0700, H. Peter Anvin wrote:
> On 08/06/2013 09:15 AM, Steven Rostedt wrote:
> > On Mon, 2013-08-05 at 14:43 -0700, H. Peter Anvin wrote:
> >
> >> For unconditional jmp that should be pretty safe barring any fundamental
> >> changes to the instruction set, in which case
On 08/06/2013 09:15 AM, Steven Rostedt wrote:
> On Mon, 2013-08-05 at 14:43 -0700, H. Peter Anvin wrote:
>
>> For unconditional jmp that should be pretty safe barring any fundamental
>> changes to the instruction set, in which case we can enable it as
>> needed, but for extra robustness it probabl
On Mon, 2013-08-05 at 14:43 -0700, H. Peter Anvin wrote:
> For unconditional jmp that should be pretty safe barring any fundamental
> changes to the instruction set, in which case we can enable it as
> needed, but for extra robustness it probably should skip prefix bytes.
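A sketch of hpa's "skip prefix bytes" suggestion: step over x86 legacy prefixes before inspecting the opcode, so that, for example, an MPX BND prefix (0xF2) in front of a jmp does not confuse the decoder. The function names are hypothetical. REX prefixes (0x40-0x4F in 64-bit mode) and everything beyond the legacy set are deliberately not handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if b is one of the x86 legacy prefix bytes. */
static int is_legacy_prefix(uint8_t b)
{
    switch (b) {
    case 0xF0:                          /* LOCK */
    case 0xF2: case 0xF3:               /* REPNE/BND, REP */
    case 0x2E: case 0x36: case 0x3E:    /* CS, SS, DS overrides */
    case 0x26: case 0x64: case 0x65:    /* ES, FS, GS overrides */
    case 0x66: case 0x67:               /* operand/address size */
        return 1;
    default:
        return 0;
    }
}

/* Offset of the first non-prefix byte in insn[0..len), or len if the
 * buffer contains only prefixes. REX is not handled. */
static size_t skip_prefixes(const uint8_t *insn, size_t len)
{
    size_t i = 0;
    while (i < len && is_legacy_prefix(insn[i]))
        i++;
    return i;
}
```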
Would the assembler add
On Mon, 2013-08-05 at 11:49 -0700, Linus Torvalds wrote:
> Ugh. Why the crazy update_jump_label script stuff?
After playing with the patches again, I now understand why I did that.
It wasn't just for optimization.
Currently the way jump labels work is that we use asm goto() and place a
5 byte no
On 08/05/2013 09:14 PM, Mathieu Desnoyers wrote:
>>
>> For unconditional jmp that should be pretty safe barring any fundamental
>> changes to the instruction set, in which case we can enable it as
>> needed, but for extra robustness it probably should skip prefix bytes.
>
> On x86-32, some prefixe
* H. Peter Anvin (h...@linux.intel.com) wrote:
> On 08/05/2013 02:28 PM, Mathieu Desnoyers wrote:
> > * Linus Torvalds (torva...@linux-foundation.org) wrote:
> >> On Mon, Aug 5, 2013 at 12:54 PM, Mathieu Desnoyers
> >> wrote:
> >>>
> >>> I remember that choosing between 2 and 5 bytes nop in the as
On Mon, 2013-08-05 at 22:26 -0400, Jason Baron wrote:
> I think if the 'cold' attribute on the default disabled static_key
> branch moved the text completely out-of-line, it would satisfy your
> requirement here?
>
> If you like this approach, perhaps we can make something like this work
> wit
On 08/05/2013 04:35 PM, Richard Henderson wrote:
> On 08/05/2013 09:57 AM, Jason Baron wrote:
>> On 08/05/2013 03:40 PM, Marek Polacek wrote:
>>> On Mon, Aug 05, 2013 at 11:34:55AM -0700, Linus Torvalds wrote:
>>>> On Mon, Aug 5, 2013 at 11:24 AM, Linus Torvalds
>>>> wrote:
>>>>> Ugh. I can see the attraction of you
* Steven Rostedt (rost...@goodmis.org) wrote:
> On Mon, 2013-08-05 at 17:28 -0400, Mathieu Desnoyers wrote:
>
[...]
> > My thought is that the code above does not cover all jump encodings that
> > can be generated by past, current and future x86 assemblers.
> >
> > Another way around this issue mi
On Mon, 2013-08-05 at 17:28 -0400, Mathieu Desnoyers wrote:
> Another thing that bothers me with Steven's approach is that decoding
> jumps generated by the compiler seems fragile IMHO.
The encodings won't change. If they do, then old kernels will not run on
new hardware.
Now if it adds a third o
On 08/05/2013 02:28 PM, Mathieu Desnoyers wrote:
> * Linus Torvalds (torva...@linux-foundation.org) wrote:
>> On Mon, Aug 5, 2013 at 12:54 PM, Mathieu Desnoyers
>> wrote:
>>>
>>> I remember that choosing between 2 and 5 bytes nop in the asm goto was
>>> tricky: it had something to do with the fact
* Linus Torvalds (torva...@linux-foundation.org) wrote:
> On Mon, Aug 5, 2013 at 12:54 PM, Mathieu Desnoyers
> wrote:
> >
> > I remember that choosing between 2 and 5 bytes nop in the asm goto was
> > tricky: it had something to do with the fact that gcc doesn't know the
> > exact size of each ins
On 08/05/2013 09:57 AM, Jason Baron wrote:
> On 08/05/2013 03:40 PM, Marek Polacek wrote:
>> On Mon, Aug 05, 2013 at 11:34:55AM -0700, Linus Torvalds wrote:
>>> On Mon, Aug 5, 2013 at 11:24 AM, Linus Torvalds
>>> wrote:
>>>> Ugh. I can see the attraction of your section thing for that case, I
On 08/05/2013 02:39 PM, Steven Rostedt wrote:
> On Mon, 2013-08-05 at 11:20 -0700, Linus Torvalds wrote:
>> Of course, it would be good to optimize static_key_false() itself -
>> right now those static key jumps are always five bytes, and while they
>> get nopped out, it would still be nice if there was s
On Mon, 2013-08-05 at 12:57 -0700, Linus Torvalds wrote:
> On Mon, Aug 5, 2013 at 12:54 PM, Mathieu Desnoyers
> wrote:
> >
> > I remember that choosing between 2 and 5 bytes nop in the asm goto was
> > tricky: it had something to do with the fact that gcc doesn't know the
> > exact size of each in
On Mon, Aug 5, 2013 at 12:54 PM, Mathieu Desnoyers
wrote:
>
> I remember that choosing between 2 and 5 bytes nop in the asm goto was
> tricky: it had something to do with the fact that gcc doesn't know the
> exact size of each instructions until further down within compilation
Oh, you can't do it
On 08/05/2013 03:40 PM, Marek Polacek wrote:
> On Mon, Aug 05, 2013 at 11:34:55AM -0700, Linus Torvalds wrote:
>> On Mon, Aug 5, 2013 at 11:24 AM, Linus Torvalds
>> wrote:
>>> Ugh. I can see the attraction of your section thing for that case, I
>>> just get the feeling that we should be able to do better some
On Mon, Aug 5, 2013 at 12:40 PM, Marek Polacek wrote:
>
> FWIW, we also support hot/cold attributes for labels, thus e.g.
>
> if (bar ())
> goto A;
> /* ... */
> A: __attribute__((cold))
> /* ... */
>
> I don't know whether that might be useful for what you want or not though...
Steve?
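Marek's label-attribute suggestion in a complete function, as a sketch assuming GCC, which documents hot/cold on labels and notes that a semicolon must follow the attribute. The function itself is hypothetical. The attribute is a probability/placement hint to the compiler, not an explicit section assignment like the one Steven proposed.

```c
#include <stdio.h>

/* Returns 0 for non-negative input, -1 otherwise. The cold label tells
 * GCC the error path is unlikely, so it may lay it out away from the
 * hot path. */
static int check_value(int v)
{
    if (v < 0)
        goto error;
    return 0;

error:
    __attribute__((cold));      /* GCC requires the semicolon here */
    fprintf(stderr, "negative value: %d\n", v);
    return -1;
}
```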
* Linus Torvalds (torva...@linux-foundation.org) wrote:
[...]
> With two-byte jumps, you'd still get the I$ fragmentation (the
> argument generation and the call and the branch back would all be in
> the same code segment as the hot code), but that would be offset by
> the fact that at least the ho
On Mon, Aug 05, 2013 at 11:34:55AM -0700, Linus Torvalds wrote:
> On Mon, Aug 5, 2013 at 11:24 AM, Linus Torvalds
> wrote:
> >
> > Ugh. I can see the attraction of your section thing for that case, I
> > just get the feeling that we should be able to do better somehow.
>
> Hmm.. Quite frankly, St
On Mon, 2013-08-05 at 11:49 -0700, Linus Torvalds wrote:
> On Mon, Aug 5, 2013 at 11:39 AM, Steven Rostedt wrote:
> >
> > I had patches that did exactly this:
> >
> > https://lkml.org/lkml/2012/3/8/461
> >
> > But it got dropped for some reason. I don't remember why. Maybe because
> > of the comp
On Mon, Aug 5, 2013 at 12:16 PM, Steven Rostedt wrote:
> On Mon, 2013-08-05 at 12:04 -0700, Andi Kleen wrote:
>> Steven Rostedt writes:
>>
>> Can't you just use -freorder-blocks-and-partition?
>
> Yeah, I'm familiar with this option.
>
This option works best with FDO. FDOed linux kernel rocks
On Mon, Aug 5, 2013 at 12:04 PM, Andi Kleen wrote:
> Steven Rostedt writes:
>
> Can't you just use -freorder-blocks-and-partition?
>
> This should already partition unlikely blocks into a
> different section. Just a single one of course.
That's horrible. Not because of dwarf problems, but exactl
On Mon, 2013-08-05 at 12:04 -0700, Andi Kleen wrote:
> Steven Rostedt writes:
>
> Can't you just use -freorder-blocks-and-partition?
Yeah, I'm familiar with this option.
>
> This should already partition unlikely blocks into a
> different section. Just a single one of course.
>
> FWIW the dis
On Mon, 2013-08-05 at 11:51 -0700, H. Peter Anvin wrote:
> On 08/05/2013 11:49 AM, Steven Rostedt wrote:
> > On Mon, 2013-08-05 at 11:29 -0700, H. Peter Anvin wrote:
> >
> >> Traps nest, that's why there is a stack. (OK, so you don't want to take
> >> the same trap inside the trap handler, but th
Steven Rostedt writes:
Can't you just use -freorder-blocks-and-partition?
This should already partition unlikely blocks into a
different section. Just a single one of course.
FWIW the disadvantage is that multiple code sections tends
to break various older dwarf unwinders, as it needs
dwarf3 la
On Mon, 2013-08-05 at 11:34 -0700, Linus Torvalds wrote:
> On Mon, Aug 5, 2013 at 11:24 AM, Linus Torvalds
> wrote:
> >
> > Ugh. I can see the attraction of your section thing for that case, I
> > just get the feeling that we should be able to do better somehow.
>
> Hmm.. Quite frankly, Steven, f
On Mon, Aug 5, 2013 at 11:51 AM, H. Peter Anvin wrote:
>>
>> Also, how would you pass the parameters? Every tracepoint has its own
>> parameters to pass to it. How would a trap know what where to get "prev"
>> and "next"?
>
> How do you do that now?
>
> You have to do an IP lookup to find out what
On 08/05/2013 11:49 AM, Steven Rostedt wrote:
> On Mon, 2013-08-05 at 11:29 -0700, H. Peter Anvin wrote:
>
>> Traps nest, that's why there is a stack. (OK, so you don't want to take
>> the same trap inside the trap handler, but that code should be very
>> limited.) The trap instruction just beco
On Mon, Aug 5, 2013 at 11:39 AM, Steven Rostedt wrote:
>
> I had patches that did exactly this:
>
> https://lkml.org/lkml/2012/3/8/461
>
> But it got dropped for some reason. I don't remember why. Maybe because
> of the complexity?
Ugh. Why the crazy update_jump_label script stuff? I'd go "Eww"
On Mon, 2013-08-05 at 11:29 -0700, H. Peter Anvin wrote:
> Traps nest, that's why there is a stack. (OK, so you don't want to take
> the same trap inside the trap handler, but that code should be very
> limited.) The trap instruction just becomes very short, but rather
> slow, call-return.
>
>
On Mon, 2013-08-05 at 11:20 -0700, Linus Torvalds wrote:
> Of course, it would be good to optimize static_key_false() itself -
> right now those static key jumps are always five bytes, and while they
> get nopped out, it would still be nice if there was some way to have
> just a two-byte nop (turn
On 08/05/2013 11:34 AM, Linus Torvalds wrote:
> On Mon, Aug 5, 2013 at 11:24 AM, Linus Torvalds
> wrote:
>>
>> Ugh. I can see the attraction of your section thing for that case, I
>> just get the feeling that we should be able to do better somehow.
>
> Hmm.. Quite frankly, Steven, for your use ca
On Mon, Aug 5, 2013 at 11:24 AM, Linus Torvalds
wrote:
>
> Ugh. I can see the attraction of your section thing for that case, I
> just get the feeling that we should be able to do better somehow.
Hmm.. Quite frankly, Steven, for your use case I think you actually
want the C goto *labels* associat
On 08/05/2013 11:20 AM, Linus Torvalds wrote:
>
> Of course, it would be good to optimize static_key_false() itself -
> right now those static key jumps are always five bytes, and while they
> get nopped out, it would still be nice if there was some way to have
> just a two-byte nop (turning into
On 08/05/2013 11:23 AM, Steven Rostedt wrote:
> On Mon, 2013-08-05 at 11:17 -0700, H. Peter Anvin wrote:
>> On 08/05/2013 10:55 AM, Steven Rostedt wrote:
>>>
>>> Well, as tracepoints are being added quite a bit in Linux, my concern is
>>> with the inlined functions that they bring. With jump labels
On Mon, Aug 5, 2013 at 11:20 AM, Linus Torvalds
wrote:
>
> The static_key_false() approach with minimal inlining sounds like a
> much better approach overall.
Sorry, I misunderstood your thing. That's actually what you want that
section thing for, because right now you cannot generate the argumen
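A rough sketch of the "static_key_false() approach with minimal inlining": the inlined part is only a test and a rarely taken call, and the body lives in a separate `noinline, cold` function. All names are hypothetical, and a plain `bool` plus `__builtin_expect` stands in for the runtime-patched static key. As noted elsewhere in the thread, the call and its argument setup still sit in the caller's code, which is the part the section idea was meant to move out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a tracepoint enable flag; the kernel would
 * use a static key patched at runtime rather than a memory load. */
static bool trace_foo_enabled;
static unsigned long trace_foo_calls;
static int last_prev, last_next;

/* Out-of-line slow path: the actual tracing work lives here. */
static __attribute__((noinline, cold)) void trace_foo_slow(int prev, int next)
{
    last_prev = prev;
    last_next = next;
    trace_foo_calls++;
}

/* Inlined fast path: just a predicted-not-taken test and a call. */
static inline void trace_foo(int prev, int next)
{
    if (__builtin_expect(trace_foo_enabled, 0))
        trace_foo_slow(prev, next);
}
```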
On Mon, 2013-08-05 at 11:17 -0700, H. Peter Anvin wrote:
> On 08/05/2013 10:55 AM, Steven Rostedt wrote:
> >
> > Well, as tracepoints are being added quite a bit in Linux, my concern is
> > with the inlined functions that they bring. With jump labels they are
> > disabled in a very unlikely way (t
On Mon, Aug 5, 2013 at 10:55 AM, Steven Rostedt wrote:
>
> My main concern is with tracepoints. Which on 90% (or more) of systems
> running Linux, is completely off, and basically just dead code, until
> someone wants to see what's happening and enables them.
The static_key_false() approach with
On 08/05/2013 10:55 AM, Steven Rostedt wrote:
>
> Well, as tracepoints are being added quite a bit in Linux, my concern is
> with the inlined functions that they bring. With jump labels they are
> disabled in a very unlikely way (the static_key_false() is a nop to skip
> the code, and is dynamical
On Mon, 2013-08-05 at 13:55 -0400, Steven Rostedt wrote:
> The difference between this and the
> "section" hack I suggested, is that this would use a "call"/"ret" when
> enabled instead of a "jmp"/"jmp".
I wonder if this is what Kris Kross meant in their song?
/me goes back to work...
-- Steve
On Mon, 2013-08-05 at 10:12 -0700, Linus Torvalds wrote:
> On Mon, Aug 5, 2013 at 9:55 AM, Steven Rostedt wrote:
> First off, we have very few things that are *so* unlikely that they
> never get executed. Putting things in a separate section would
> actually be really bad.
My main concern is wit
On Mon, 2013-08-05 at 10:02 -0700, H. Peter Anvin wrote:
> > if (x) __attribute__((section(".foo"))) {
> > /* do something */
> > }
> >
>
> One concern I have is how this kind of code would work when embedded
> inside a function which already has a section attribute. This could
> easily caus
On Mon, Aug 5, 2013 at 10:12 AM, Linus Torvalds
wrote:
>
> Secondly, you don't want a separate section anyway for any normal
> kernel code, since you want short jumps if possible
Just to clarify: the short jump is important regardless of how
unlikely the code you're jumping is, since even if you'
On Mon, Aug 5, 2013 at 9:55 AM, Steven Rostedt wrote:
>
> Almost a full year ago, Mathieu suggested something like:
>
> if (unlikely(x)) __attribute__((section(".unlikely"))) {
> ...
> } else __attribute__((section(".likely"))) {
> ...
> }
It's almost certainly a horrible idea.
F
On 08/05/2013 09:55 AM, Steven Rostedt wrote:
>
> Almost a full year ago, Mathieu suggested something like:
>
> if (unlikely(x)) __attribute__((section(".unlikely"))) {
> ...
> } else __attribute__((section(".likely"))) {
> ...
> }
>
> https://lkml.org/lkml/2012/8/9/658
>
> Whic
[ sent to both Linux kernel mailing list and to gcc list ]
I was looking at some of the old code I still have marked in my TODO
list, that I never pushed to get mainlined. One of them is to move trace
point logic out of the fast path to get rid of the stress that it
imposes on the icache.
Almost