Andi Kleen writes:
>> So create two events, one for the PT stuff and one to track the
>> side-band stuff. We have a NOP event for just this purpose.
>
> Ok I guess that could work.
>
> Essentially replace the magic mmap offset with a second fd.
>
> Alex, what do you think?
Yes, that's what I sug
restoring the list.. I really should drop all emails you send off list
into /dev/null.
On Wed, Jan 08, 2014 at 09:28:40AM +0100, Peter Zijlstra wrote:
> On Tue, Jan 07, 2014 at 10:23:22PM +0100, Andi Kleen wrote:
> > > Yes we very much rely on the FREEZE bits for LBR. PT and LBR being
> > > mutual
On Tue, Jan 07, 2014 at 09:51:45PM +0100, Peter Zijlstra wrote:
> On Tue, Jan 07, 2014 at 04:42:55PM +0100, Andi Kleen wrote:
> > > Yes; go read this:
> > >
> > > lkml.kernel.org/r/20131219125205.gt3...@twins.programming.kicks-ass.net
> >
> > Hmm, but AFAIK we're not using freeze counters on PMI
On Tue, Jan 07, 2014 at 04:42:55PM +0100, Andi Kleen wrote:
> > Yes; go read this:
> >
> > lkml.kernel.org/r/20131219125205.gt3...@twins.programming.kicks-ass.net
>
> Hmm, but AFAIK we're not using freeze counters on PMI today.
> We just rely on the explicit disabling in the counters through the
> So create two events, one for the PT stuff and one to track the
> side-band stuff. We have a NOP event for just this purpose.
Ok I guess that could work.
Essentially replace the magic mmap offset with a second fd.
Alex, what do you think?
-Andi
--
a...@linux.intel.com -- Speaking for myself
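For illustration, here is a minimal C sketch of the side-band half of the "two events" idea. It assumes the NOP event meant here is the dummy software event (`PERF_COUNT_SW_DUMMY`) from `<linux/perf_event.h>`. The helper names `init_sideband_attr` and `open_sideband_event` are hypothetical. The PT event (the other fd) is omitted because this thread does not give its PMU type.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: describe a dummy ("NOP") software event that counts
 * nothing and exists only to carry side-band records (mmap, comm, task). */
void init_sideband_attr(struct perf_event_attr *attr)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_SOFTWARE;
    attr->config = PERF_COUNT_SW_DUMMY;
    attr->disabled = 1;       /* enable together with the trace event later */
    attr->mmap = 1;           /* PERF_RECORD_MMAP for executable mappings */
    attr->comm = 1;           /* PERF_RECORD_COMM on exec / task rename */
    attr->task = 1;           /* PERF_RECORD_FORK / PERF_RECORD_EXIT */
    attr->sample_id_all = 1;  /* attach the sample_type ids below to them */
    attr->sample_type = PERF_SAMPLE_TID | PERF_SAMPLE_TIME;
}

/* Hypothetical helper: glibc has no perf_event_open() wrapper, so invoke
 * the syscall directly. Returns the side-band fd, or -1 with errno set. */
int open_sideband_event(pid_t pid, int cpu)
{
    struct perf_event_attr attr;
    init_sideband_attr(&attr);
    return (int)syscall(SYS_perf_event_open, &attr, pid, cpu, -1, 0UL);
}
```

A tool would then map this fd next to the PT event's fd. It would read trace data from one buffer and mmap/comm/task records from the other, instead of multiplexing both through a magic mmap offset.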
> Also, the PT interrupt doesn't actually need to be an NMI; when the
> proposed S/G implementation would actually work as stated there can be
> plenty room left when we trigger the interrupt.
That's true.
-andi
> Yes; go read this:
>
> lkml.kernel.org/r/20131219125205.gt3...@twins.programming.kicks-ass.net
Hmm, but AFAIK we're not using freeze counters on PMI today.
We just rely on the explicit disabling in the counters through the global
ctrl.
So it should be the same as with any other PMI which also
On Tue, Jan 07, 2014 at 01:52:31AM +0100, Andi Kleen wrote:
> > > Also of course it requires disabling/enabling PT explicitly for
> > > every perf message, which is slow. So you add at least 2*WRMSR cost
> > > (thousands of cycles).
> >
> > That's just dumb. No, flush the entire PT buffer into a f
On Mon, Jan 06, 2014 at 03:10:28PM -0800, Andi Kleen wrote:
> > To me it seems very weird that PT is hooked to the same PMI as the
> > normal PMU, it really should have been a different interrupt.
>
> It's in the same STATUS register, so it's cheap to check both.
>
> It shouldn't add any new spur
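This small sketch illustrates the "cheap to check both" point: a single read of `IA32_PERF_GLOBAL_STATUS` tells the PMI handler whether PT, the counters, or both need service. The bit positions are our reading of the Intel SDM: bit 55 for the ToPA PMI, low bits for general-purpose counters, and bits 32 and up for fixed counters. The masks assume 8 general-purpose and 3 fixed counters. `classify_pmi` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed IA32_PERF_GLOBAL_STATUS layout (check the SDM for your CPU). */
#define GLOBAL_STATUS_TOPA_PMI   (1ULL << 55)   /* Trace_ToPA_PMI */
#define GLOBAL_STATUS_GP_MASK    0xffULL        /* assume 8 GP counters */
#define GLOBAL_STATUS_FIXED_MASK (0x7ULL << 32) /* assume 3 fixed counters */

struct pmi_work {
    bool pt_buffer_full;   /* a PT output region needs servicing */
    uint64_t counter_bits; /* overflowed counters that need handling */
};

/* Hypothetical helper: split one status read into PT work and counter work. */
struct pmi_work classify_pmi(uint64_t global_status)
{
    struct pmi_work w;
    w.pt_buffer_full = (global_status & GLOBAL_STATUS_TOPA_PMI) != 0;
    w.counter_bits = global_status &
                     (GLOBAL_STATUS_GP_MASK | GLOBAL_STATUS_FIXED_MASK);
    return w;
}
```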
On Mon, Jan 06, 2014 at 03:10:28PM -0800, Andi Kleen wrote:
> Peter Zijlstra writes:
> > Also, do clarify the other points I asked about. Esp. the non
> > FREEZE_ON_PMI behaviour of the PT PMI is worrying me immensely.
>
> The only reason for hardware freeze is when you have a few entries (like
>
> > Also of course it requires disabling/enabling PT explicitly for
> > every perf message, which is slow. So you add at least 2*WRMSR cost
> > (thousands of cycles).
>
> That's just dumb. No, flush the entire PT buffer into a few large
> records.
How would that work?
You mean a separate buffer
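To make the trade-off concrete: disabling and re-enabling PT for every perf message costs two WRMSRs each time, while a bulk flush pays that cost once per flush. One constraint on "a few large records" is that `struct perf_event_header::size` is a 16-bit field, so each record must stay under 64 KiB. The sketch below shows the chunking arithmetic. It assumes 8-byte record alignment, and the helper names are hypothetical.

```c
#include <linux/perf_event.h>
#include <stddef.h>
#include <stdint.h>

/* perf_event_header::size is a __u16; keep records 8-byte aligned. */
#define MAX_RECORD_BYTES  ((size_t)(UINT16_MAX & ~7u))  /* 65528 */
#define MAX_PAYLOAD_BYTES (MAX_RECORD_BYTES - sizeof(struct perf_event_header))

/* Hypothetical helper: records needed to flush 'len' bytes of PT data. */
size_t pt_records_needed(size_t len)
{
    return (len + MAX_PAYLOAD_BYTES - 1) / MAX_PAYLOAD_BYTES;
}

/* Hypothetical helper: payload bytes carried by record 'i' (0-based). */
size_t pt_record_payload(size_t len, size_t i)
{
    size_t off = i * MAX_PAYLOAD_BYTES;
    if (off >= len)
        return 0;
    size_t left = len - off;
    return left < MAX_PAYLOAD_BYTES ? left : MAX_PAYLOAD_BYTES;
}
```

Under these assumptions, a 4 MiB PT buffer flushes as 65 records, the last carrying 1024 bytes.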
Peter Zijlstra writes:
Can you please clarify your position on the interleaved buffer?
I still can't see how it is an efficient design.
It's generally true in scatter-gather (be it software or hardware)
that each additional SG entry increases the cost. So to make things
efficient you always wan
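The per-entry cost argument can be quantified by counting entries. This sketch assumes 4 KiB base pages and a buffer built from equal chunks of `2^order` pages. It also adds one terminating entry that links back to the table start, as a ToPA END entry would in a circular configuration. `sg_entries` is a hypothetical helper.

```c
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096UL  /* assumption: 4 KiB base pages */

/* Hypothetical helper: entries needed to cover 'buf_bytes' with chunks of
 * 2^order pages, plus one terminating entry linking back to the table. */
size_t sg_entries(size_t buf_bytes, unsigned int order)
{
    size_t chunk = PAGE_SIZE_BYTES << order;
    return (buf_bytes + chunk - 1) / chunk + 1;
}
```

For a 4 MiB buffer this gives 1025 entries with single pages, 3 with order-9 chunks, and 2 with one order-10 chunk.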
> I don't think the PT design is broken in any way; it's
> straightforward and simple.
Also, do clarify the other points I asked about. Esp. the non
FREEZE_ON_PMI behaviour of the PT PMI is worrying me immensely.
To me it seems very weird that PT is hooked to the same PMI as the
normal PMU, it
On Mon, Jan 06, 2014 at 01:25:02PM -0800, Andi Kleen wrote:
> Peter Zijlstra writes:
>
> > On Thu, Dec 19, 2013 at 04:30:53PM +0200, Alexander Shishkin wrote:
> >> So I'd like to steer away from the ways in which hardware can be broken
> >> and talk about a usable interface, to begin with.
> >
>
Peter Zijlstra writes:
> On Thu, Dec 19, 2013 at 04:30:53PM +0200, Alexander Shishkin wrote:
>> So I'd like to steer away from the ways in which hardware can be broken
>> and talk about a usable interface, to begin with.
>
> Just dump it into the regular one buffer like I outlined.
Just getting
On Thu, Dec 19, 2013 at 04:54:27PM +0200, Alexander Shishkin wrote:
> Peter Zijlstra writes:
>
> > On Thu, Dec 19, 2013 at 12:57:59PM +0100, Peter Zijlstra wrote:
> > So you're basically forced to stop the tracing on PMI anyhow; so your
> > continuous tracing argument goes out the window.
>
> It
On Thu, Dec 19, 2013 at 04:30:53PM +0200, Alexander Shishkin wrote:
> So I'd like to steer away from the ways in which hardware can be broken
> and talk about a usable interface, to begin with.
Just dump it into the regular one buffer like I outlined.
That said; we very much need to have at least
On Thu, Dec 19, 2013 at 03:49:42PM +0100, Frederic Weisbecker wrote:
> On Thu, Dec 19, 2013 at 04:30:53PM +0200, Alexander Shishkin wrote:
> > Or the interface and implementation of BTS support in the kernel
> > discourage its use and that is why it is so rarely used.
>
> I never heard complaints a
Peter Zijlstra writes:
> On Thu, Dec 19, 2013 at 12:57:59PM +0100, Peter Zijlstra wrote:
> So you're basically forced to stop the tracing on PMI anyhow; so your
> continuous tracing argument goes out the window.
It's only stopped inside the PMI handler to set up another buffer, and
is then start
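This is a sketch of the stop/reprogram/restart sequence described above. It uses a simulated MSR array so it runs in user space; in the kernel these would be real MSR accesses. The MSR numbers come from our reading of the Intel SDM, as does the rule that the output MSRs may only be written while `TraceEn` is clear. `pt_switch_buffer` and the fake MSR helpers are hypothetical. The two writes to `IA32_RTIT_CTL` are the 2*WRMSR cost mentioned elsewhere in the thread.

```c
#include <stdint.h>

/* MSR numbers as we read them in the Intel SDM (assumption). */
#define MSR_IA32_RTIT_OUTPUT_BASE      0x560
#define MSR_IA32_RTIT_OUTPUT_MASK_PTRS 0x561
#define MSR_IA32_RTIT_CTL              0x570
#define RTIT_CTL_TRACEEN               (1ULL << 0)

/* Simulated MSR file so the sketch runs in user space (hypothetical). */
static uint64_t fake_msr[0x600];
static unsigned int wrmsr_count;

static uint64_t rd(uint32_t msr) { return fake_msr[msr]; }
static void wr(uint32_t msr, uint64_t v) { fake_msr[msr] = v; wrmsr_count++; }

/* Hypothetical PMI-time buffer switch: tracing is off only while the
 * hardware is pointed at a fresh ToPA table. */
void pt_switch_buffer(uint64_t next_table_phys)
{
    uint64_t ctl = rd(MSR_IA32_RTIT_CTL);

    wr(MSR_IA32_RTIT_CTL, ctl & ~RTIT_CTL_TRACEEN);  /* stop tracing */
    wr(MSR_IA32_RTIT_OUTPUT_BASE, next_table_phys);  /* new ToPA table */
    wr(MSR_IA32_RTIT_OUTPUT_MASK_PTRS, 0x7f);        /* entry 0, offset 0 */
    wr(MSR_IA32_RTIT_CTL, ctl | RTIT_CTL_TRACEEN);   /* start again */
}
```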
On Thu, Dec 19, 2013 at 04:30:53PM +0200, Alexander Shishkin wrote:
> Or the interface and implementation of BTS support in the kernel
> discourage its use and that is why it is so rarely used.
I never heard complaints about it. It's a simple dump of from/to address pairs.
I just think nobody tak
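To show how simple the BTS format is, this sketch decodes a raw BTS buffer into from/to pairs. The 24-byte record (source, target and flags, each 64 bits) follows the 64-bit DS save area layout as we understand it from the Intel SDM. `bts_decode` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 64-bit BTS record as we understand the SDM layout (assumption). */
struct bts_record {
    uint64_t from;   /* branch source address */
    uint64_t to;     /* branch target address */
    uint64_t flags;  /* misc flags, ignored here */
};

/* Hypothetical decoder: extract up to 'max' from/to pairs from 'len' bytes.
 * A trailing partial record is ignored. Returns the number decoded. */
size_t bts_decode(const void *buf, size_t len,
                  uint64_t *from, uint64_t *to, size_t max)
{
    const unsigned char *p = buf;
    size_t n = 0;

    while (len >= sizeof(struct bts_record) && n < max) {
        struct bts_record r;
        memcpy(&r, p, sizeof(r));  /* buffer may not be 8-byte aligned */
        from[n] = r.from;
        to[n] = r.to;
        n++;
        p += sizeof(r);
        len -= sizeof(r);
    }
    return n;
}
```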
Ingo Molnar writes:
> * Peter Zijlstra wrote:
>
>> On Thu, Dec 19, 2013 at 01:17:51PM +0200, Alexander Shishkin wrote:
>> > Peter Zijlstra writes:
>> >
>> > > On Thu, Dec 19, 2013 at 09:53:44AM +0200, Alexander Shishkin wrote:
>> > >> Yes and some implementations of PT have the same issue, but
On Thu, Dec 19, 2013 at 12:57:59PM +0100, Peter Zijlstra wrote:
> On Thu, Dec 19, 2013 at 12:28:12PM +0100, Peter Zijlstra wrote:
> > This document you referred me to looks to specify something with a
> > proper s/g implementation; called ToPA. There doesn't appear to be a
> > limit to the linked e
Found more:
"Note that no “freezing” takes place with the ToPA PMI. Thus, packet
generation is not frozen, and the interrupt handler will be traced
(though filtering can prevent this). Further, the setting of
IA32_DEBUGCTL.Freeze_Perfmon_on_PMI is ignored and performance counters
are not frozen
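The "filtering can prevent this" remark refers to PT's privilege-level filtering. This sketch builds an `IA32_RTIT_CTL` value with kernel tracing turned off, so the unfrozen PMI handler does not end up in the trace. The bit positions are our reading of the Intel SDM, and `rtit_ctl_value` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_RTIT_CTL bits as we read them in the Intel SDM (assumption). */
#define RTIT_CTL_TRACEEN   (1ULL << 0)
#define RTIT_CTL_OS        (1ULL << 2)   /* trace at CPL 0 */
#define RTIT_CTL_USER      (1ULL << 3)   /* trace at CPL > 0 */
#define RTIT_CTL_TOPA      (1ULL << 8)   /* write output via ToPA tables */
#define RTIT_CTL_BRANCH_EN (1ULL << 13)  /* emit branch packets */

/* Hypothetical helper: with trace_kernel == false, the PMI handler (which
 * runs at CPL 0 and is not frozen) is filtered out of the trace. */
uint64_t rtit_ctl_value(bool trace_kernel, bool trace_user)
{
    uint64_t ctl = RTIT_CTL_TRACEEN | RTIT_CTL_TOPA | RTIT_CTL_BRANCH_EN;
    if (trace_kernel)
        ctl |= RTIT_CTL_OS;
    if (trace_user)
        ctl |= RTIT_CTL_USER;
    return ctl;
}
```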
Peter Zijlstra writes:
> On Thu, Dec 19, 2013 at 01:14:09PM +0200, Alexander Shishkin wrote:
>> Peter Zijlstra writes:
>>
>> > On Thu, Dec 19, 2013 at 09:53:44AM +0200, Alexander Shishkin wrote:
>> >> Peter Zijlstra writes:
>> >> > The thing is; why can't you zero-copy whatever buffer the hard
* Peter Zijlstra wrote:
> On Thu, Dec 19, 2013 at 01:17:51PM +0200, Alexander Shishkin wrote:
> > Peter Zijlstra writes:
> >
> > > On Thu, Dec 19, 2013 at 09:53:44AM +0200, Alexander Shishkin wrote:
> > >> Yes and some implementations of PT have the same issue, but you can do a
> > >> sufficie
On Thu, Dec 19, 2013 at 12:28:12PM +0100, Peter Zijlstra wrote:
> This document you referred me to looks to specify something with a
> proper s/g implementation; called ToPA. There doesn't appear to be a
> limit to the linked entries and you can specify a size per entry, and I
> don't see anywhere
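Because the discussion hinges on ToPA's per-entry size field, here is a sketch of what a ToPA entry looks like. The field positions and the size-alignment rule are our reading of the Intel SDM. The helpers are hypothetical; they only build 64-bit entry values and do not program hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed ToPA entry layout: bit 0 END, bit 2 INT, bit 4 STOP,
 * bits 9:6 size code (4 KiB << code), bits 12+ physical base address. */
#define TOPA_END        (1ULL << 0)
#define TOPA_INT        (1ULL << 2)
#define TOPA_STOP       (1ULL << 4)
#define TOPA_SIZE_SHIFT 6

/* Size code for a region of 'bytes'; valid sizes are 4 KiB << 0..15. */
int topa_size_code(uint64_t bytes)
{
    for (int code = 0; code <= 15; code++)
        if ((4096ULL << code) == bytes)
            return code;
    return -1;  /* not encodable */
}

/* Hypothetical helper: output-region entry; 'base' must be aligned to the
 * region size. 'irq' requests the ToPA PMI when this region fills. */
uint64_t topa_entry(uint64_t base, uint64_t bytes, bool irq)
{
    int code = topa_size_code(bytes);
    if (code < 0 || (base & (bytes - 1)) != 0)
        return 0;  /* invalid request */
    return base | ((uint64_t)code << TOPA_SIZE_SHIFT) | (irq ? TOPA_INT : 0);
}

/* Hypothetical helper: terminating entry pointing at the next table
 * (or back at this one for a circular buffer). */
uint64_t topa_end_entry(uint64_t next_table)
{
    return next_table | TOPA_END;
}
```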
Peter Zijlstra writes:
> On Thu, Dec 19, 2013 at 01:17:51PM +0200, Alexander Shishkin wrote:
>> Peter Zijlstra writes:
>>
>> > On Thu, Dec 19, 2013 at 09:53:44AM +0200, Alexander Shishkin wrote:
>> >> Yes and some implementations of PT have the same issue, but you can do a
>> >> sufficiently la
On Thu, Dec 19, 2013 at 01:17:51PM +0200, Alexander Shishkin wrote:
> Peter Zijlstra writes:
>
> > On Thu, Dec 19, 2013 at 09:53:44AM +0200, Alexander Shishkin wrote:
> >> Yes and some implementations of PT have the same issue, but you can do a
> >> sufficiently large high order allocation and ma
On Thu, Dec 19, 2013 at 01:14:09PM +0200, Alexander Shishkin wrote:
> Peter Zijlstra writes:
>
> > On Thu, Dec 19, 2013 at 09:53:44AM +0200, Alexander Shishkin wrote:
> >> Peter Zijlstra writes:
> >> > The thing is; why can't you zero-copy whatever buffer the hardware
> >> > writes into, into th
Peter Zijlstra writes:
> On Thu, Dec 19, 2013 at 09:53:44AM +0200, Alexander Shishkin wrote:
>> Yes and some implementations of PT have the same issue, but you can do a
>> sufficiently large high order allocation and map it to userspace and
>> still no copying (or parsing/decoding) in kernel spac
Peter Zijlstra writes:
> On Thu, Dec 19, 2013 at 09:53:44AM +0200, Alexander Shishkin wrote:
>> Peter Zijlstra writes:
>> > The thing is; why can't you zero-copy whatever buffer the hardware
>> > writes into, into the normal buffer?
>>
>> I'm not sure I understand. You mean, have the buffer spl
On Thu, Dec 19, 2013 at 09:53:44AM +0200, Alexander Shishkin wrote:
> Yes and some implementations of PT have the same issue, but you can do a
> sufficiently large high order allocation and map it to userspace and
> still no copying (or parsing/decoding) in kernel space required.
What's sufficient
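On "what's sufficient": the buddy allocator hands out blocks of `PAGE_SIZE << order`, so the buffer size maps directly to an allocation order. This sketch mirrors what the kernel's `get_order()` computes, assuming 4 KiB pages; `order_for_size` is a hypothetical name. The allocator caps the order. In the common x86 configuration of that era the cap was 4 MiB blocks, but treat that figure as an assumption and check `MAX_ORDER` for your kernel.

```c
#include <stddef.h>

#define PAGE_SHIFT_ASSUMED 12  /* assumption: 4 KiB pages */

/* Hypothetical helper: smallest order with (PAGE_SIZE << order) >= size.
 * 'size' must be greater than zero. */
unsigned int order_for_size(size_t size)
{
    unsigned int order = 0;
    size_t span = (size_t)1 << PAGE_SHIFT_ASSUMED;

    while (span < size) {
        span <<= 1;
        order++;
    }
    return order;
}
```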
On Thu, Dec 19, 2013 at 09:53:44AM +0200, Alexander Shishkin wrote:
> Peter Zijlstra writes:
> > The thing is; why can't you zero-copy whatever buffer the hardware
> > writes into, into the normal buffer?
>
> I'm not sure I understand. You mean, have the buffer split between perf
> data and trace
Peter Zijlstra writes:
> On Wed, Dec 18, 2013 at 04:22:36PM +0200, Alexander Shishkin wrote:
>> > Still confused, if you cannot copy it into one buffer, then why can you
>> > copy it into a second buffer?
>>
>> It's not copied, hardware writes directly into that second buffer.
>
> Where's the PT
On Wed, Dec 18, 2013 at 04:22:36PM +0200, Alexander Shishkin wrote:
> > Still confused, if you cannot copy it into one buffer, then why can you
> > copy it into a second buffer?
>
> It's not copied, hardware writes directly into that second buffer.
Where's the PT documentation? I can't find it in
Peter Zijlstra writes:
> On Wed, Dec 18, 2013 at 04:01:04PM +0200, Alexander Shishkin wrote:
>> > Why don't you start by explaining _why_ you need a second stream to
>> > begin with?
>>
>> Oh, I'm sure I've explained it earlier ([1], [2])
>
> See, I didn't read 0 because that information gets lo
On Wed, Dec 18, 2013 at 04:01:04PM +0200, Alexander Shishkin wrote:
> > Why don't you start by explaining _why_ you need a second stream to
> > begin with?
>
> Oh, I'm sure I've explained it earlier ([1], [2])
See, I didn't read 0 because that information gets lost and patches
should be self expl
Peter Zijlstra writes:
> On Wed, Dec 18, 2013 at 03:23:41PM +0200, Alexander Shishkin wrote:
>> Peter Zijlstra writes:
>>
>> > On Wed, Dec 11, 2013 at 02:36:16PM +0200, Alexander Shishkin wrote:
>> >> Instruction tracing PMUs are capable of recording a log of instruction
>> >> execution flow on
On Wed, Dec 18, 2013 at 03:23:41PM +0200, Alexander Shishkin wrote:
> Peter Zijlstra writes:
>
> > On Wed, Dec 11, 2013 at 02:36:16PM +0200, Alexander Shishkin wrote:
> >> Instruction tracing PMUs are capable of recording a log of instruction
> >> execution flow on a cpu core, which can be useful
Peter Zijlstra writes:
> On Wed, Dec 11, 2013 at 02:36:16PM +0200, Alexander Shishkin wrote:
>> Instruction tracing PMUs are capable of recording a log of instruction
>> execution flow on a cpu core, which can be useful for profiling and crash
>> analysis. This patch adds itrace infrastructure fo
On Wed, Dec 11, 2013 at 02:36:16PM +0200, Alexander Shishkin wrote:
> Instruction tracing PMUs are capable of recording a log of instruction
> execution flow on a cpu core, which can be useful for profiling and crash
> analysis. This patch adds itrace infrastructure for perf events and the
> rest o
Instruction tracing PMUs are capable of recording a log of instruction
execution flow on a cpu core, which can be useful for profiling and crash
analysis. This patch adds itrace infrastructure for perf events and the
rest of the kernel to use.
Since such PMUs can produce copious amounts of trace d
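For context on the "one buffer" side of the debate: userspace consumes a perf mmap ring by reading `data_head` from `struct perf_event_mmap_page`, copying out data, and then publishing `data_tail`. This is a byte-wise sketch of that protocol, with `ring_drain` as a hypothetical helper. A real consumer would parse `struct perf_event_header` records rather than raw bytes.

```c
#include <linux/perf_event.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical consumer: copy everything between data_tail and data_head
 * (up to out_cap bytes), then publish the new tail so the kernel can reuse
 * the space. 'ring_size' must be a power of two. */
size_t ring_drain(struct perf_event_mmap_page *pg, const unsigned char *data,
                  size_t ring_size, unsigned char *out, size_t out_cap)
{
    /* Acquire: data written before head was published is now visible. */
    uint64_t head = __atomic_load_n(&pg->data_head, __ATOMIC_ACQUIRE);
    uint64_t tail = pg->data_tail;
    size_t n = 0;

    while (tail != head && n < out_cap) {
        out[n++] = data[tail & (ring_size - 1)];
        tail++;
    }
    /* Release: our reads finish before the kernel sees the freed space. */
    __atomic_store_n(&pg->data_tail, tail, __ATOMIC_RELEASE);
    return n;
}
```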