On 4/19/21 2:33 PM, Len Brown via Libc-alpha wrote:
> the AI guys are super excited about matrix multiplication,
> but I have a hard time imagining why grep(1) would find a use for it.
I don't. Matrix multiplication is used in modern string-searching
algorithms that could be useful in running 'gre
On Mon, Apr 19, 2021 at 05:33:03PM -0400, Len Brown wrote:
> For this to happen, every thread would not only have to include/link-with
> code that uses AMX, but that code would have to *run*.
It looks like either I'm not expressing myself clearly enough or you're
not reading my text: the *library*
On Mon, Apr 19, 2021 at 3:15 PM Borislav Petkov wrote:
> All of a sudden you have *every* thread sporting a fat 8K buffer because
> the library decided to use a fat buffer feature for it.
>
> Nope, I don't want that to happen.
For this to happen, every thread would not only have to include/link-
On Mon, Apr 19, 2021 at 02:18:51PM -0400, Len Brown wrote:
> Yes, we could invent a new system call and mandate that it be called
> between #2 and #3. However, we'd still do #4 in response, so I don't
> see any value to that system call.
Lemme refresh your memory: there was a time when the kernel
On Mon, Apr 19, 2021 at 10:15 AM Borislav Petkov wrote:
> > Tasks are created without an 8KB AMX buffer.
> > Tasks have to actually touch the AMX TILE registers for us to allocate
> > one for them.
>
> When tasks do that it doesn't matter too much - for the library it does!
>
> If the library doe
On Fri, Apr 16, 2021 at 06:05:10PM -0400, Len Brown wrote:
> I'm not aware of any intent to transparently use AMX for bcopy, like
> what happened
> with AVX-512. (didn't they undo that mistake?)
No clue, did they?
> Tasks are created without an 8KB AMX buffer.
> Tasks have to actually touch the
On Fri, Apr 16, 2021 at 6:14 PM Andy Lutomirski wrote:
> My point is that every ...
I encourage you to continue to question everything and trust nobody.
While it may cost you a lot in counseling, it is certainly valuable,
at least to me! :-)
I do request, however, that feedback stay specific, s
On Fri, Apr 16, 2021 at 3:11 PM Len Brown wrote:
>
> > I get it. That does not explain why LDMXCSR and VLDMXCSR cause
> > pipeline stalls.
>
> Sorry, I thought this thread was about AMX.
> I don't know the answer to your LDMXCSR and VLDMXCSR question.
My point is that every single major math ex
> I get it. That does not explain why LDMXCSR and VLDMXCSR cause
> pipeline stalls.
Sorry, I thought this thread was about AMX.
I don't know the answer to your LDMXCSR and VLDMXCSR question.
On Thu, Apr 15, 2021 at 1:47 AM Borislav Petkov wrote:
> What I'd like to see is 0-overhead for current use cases and only
> overhead for those who want to use it. If that can't be done
> automagically, then users should request it explicitly. So basically you
> blow up the xsave buffer only for
On Fri, Apr 16, 2021 at 2:54 PM Len Brown wrote:
>
> On Thu, Apr 15, 2021 at 12:24 PM Andy Lutomirski wrote:
> > On Wed, Apr 14, 2021 at 2:48 PM Len Brown wrote:
>
> > > > ... the transition penalty into and out of AMX code
>
> The concept of 'transition' exists between AVX and SSE instructions
On Thu, Apr 15, 2021 at 12:24 PM Andy Lutomirski wrote:
> On Wed, Apr 14, 2021 at 2:48 PM Len Brown wrote:
> > > ... the transition penalty into and out of AMX code
The concept of 'transition' exists between AVX and SSE instructions
because it is possible to mix both instruction sets and touch
> On Apr 15, 2021, at 10:00 AM, Dave Hansen wrote:
>
> On 4/15/21 9:24 AM, Andy Lutomirski wrote:
>> In the patches, *as submitted*, if you trip the XFD #NM *once* and you
>> are the only thread on the system to do so, you will eat the cost of a
>> WRMSR on every subsequent context switch.
>
On 4/15/21 9:24 AM, Andy Lutomirski wrote:
> In the patches, *as submitted*, if you trip the XFD #NM *once* and you
> are the only thread on the system to do so, you will eat the cost of a
> WRMSR on every subsequent context switch.
I think you're saying: If a thread trips XFD #NM *once*, every sw
On Wed, Apr 14, 2021 at 2:48 PM Len Brown wrote:
>
>
> > Then I take the transition penalty into and out of AMX code (I'll
> > believe there is no penalty when I see it -- we've had a penalty with
> > VEX and with AVX-512) and my program runs *slower*.
>
> If you have a clear definition of what "
On Thu, Apr 15, 2021 at 07:29:38AM +0200, Willy Tarreau wrote:
> What Len is saying is that not being interested in a feature is not an
> argument for rejecting its adoption,
Oh, I'm not rejecting its adoption - no, don't mean that.
> which I'm perfectly fine with. But conversely not being intere
On Thu, Apr 15, 2021 at 06:43:43AM +0200, Borislav Petkov wrote:
> On Wed, Apr 14, 2021 at 05:57:22PM -0400, Len Brown wrote:
> > I'm pretty sure that the "it isn't my use case of interest, so it
> > doesn't matter" line of reasoning has long been established as -EINVAL
> > ;-)
>
> I have only a v
On Wed, Apr 14, 2021 at 05:57:22PM -0400, Len Brown wrote:
> I'm pretty sure that the "it isn't my use case of interest, so it
> doesn't matter" line of reasoning has long been established as -EINVAL
> ;-)
I have only a very faint idea what you're trying to say here. Please
explain properly and mo
On Wed, Apr 14, 2021 at 5:58 AM Borislav Petkov wrote:
>
> On Tue, Apr 13, 2021 at 03:51:50PM -0400, Len Brown wrote:
> > AMX does the type of matrix multiplication that AI algorithms use. In
> > the unlikely event that you or one of the libraries you call are doing
> > the same, then you will be
On Tue, Apr 13, 2021 at 6:59 PM Andy Lutomirski wrote:
> Suppose I write some user code and call into a library that uses AMX
> because the library authors benchmarked it and determined that using
> AMX is faster when called in a loop. But I don't call it in a loop.
Again...
AMX registers are
On Wed, Apr 14, 2021 at 12:06:39PM +0200, Willy Tarreau wrote:
> And change jobs :-)
I think by the time that happens, we'll be ready to go to the eternal
vacation. Which means: not my problem.
:-)))
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Apr 14, 2021 at 11:58:04AM +0200, Borislav Petkov wrote:
> On Tue, Apr 13, 2021 at 03:51:50PM -0400, Len Brown wrote:
> > AMX does the type of matrix multiplication that AI algorithms use. In
> > the unlikely event that you or one of the libraries you call are doing
> > the same, then you w
On Tue, Apr 13, 2021 at 03:51:50PM -0400, Len Brown wrote:
> AMX does the type of matrix multiplication that AI algorithms use. In
> the unlikely event that you or one of the libraries you call are doing
> the same, then you will be very happy with AMX. Otherwise, you'll
> probably not use it.
Whi
On Tue, Apr 13, 2021 at 3:47 PM Len Brown wrote:
>
> On Tue, Apr 13, 2021 at 4:16 PM Andy Lutomirski wrote:
> >
> > On Mon, Apr 12, 2021 at 4:46 PM Len Brown wrote:
> > >
> > > On Mon, Apr 12, 2021 at 11:21 AM Andy Lutomirski wrote:
> > >
> > > > AMX: Multiplying a 4x4 matrix probably looks *gr
On Tue, Apr 13, 2021 at 4:16 PM Andy Lutomirski wrote:
>
> On Mon, Apr 12, 2021 at 4:46 PM Len Brown wrote:
> >
> > On Mon, Apr 12, 2021 at 11:21 AM Andy Lutomirski wrote:
> >
> > > AMX: Multiplying a 4x4 matrix probably looks *great* in a
> > > microbenchmark. Do it once and you permanently al
On Mon, Apr 12, 2021 at 4:46 PM Len Brown wrote:
>
> On Mon, Apr 12, 2021 at 11:21 AM Andy Lutomirski wrote:
>
> > AMX: Multiplying a 4x4 matrix probably looks *great* in a
> > microbenchmark. Do it once and you permanently allocate 8kB (is that
> > even a constant? can it grow in newer parts?)
Thanks for sharing your perspective, Willy.
I agree that if your application is so sensitive that you need to hand-code
your own memcmp, then linking with (any) new version of (any) dynamic library
is a risk you must consider carefully.
AMX does the type of matrix multiplication that AI algorithm
On Mon, Apr 12, 2021 at 07:46:06PM -0400, Len Brown wrote:
> On Mon, Apr 12, 2021 at 11:21 AM Andy Lutomirski wrote:
>
> > AMX: Multiplying a 4x4 matrix probably looks *great* in a
> > microbenchmark. Do it once and you permanently allocate 8kB (is that
> > even a constant? can it grow in newer
On Mon, Apr 12, 2021 at 8:17 PM Thomas Gleixner wrote:
> I'm fine with that as long as the kernel has a way to detect that and can
> kill the offending application/library combo with an explicit
> -SIG_NICE_TRY.
Agreed. The new run-time check for altsigstack overflow is one place
we can do that.
On Mon, Apr 12 2021 at 19:46, Len Brown wrote:
> On Mon, Apr 12, 2021 at 11:21 AM Andy Lutomirski wrote:
>> Even putting aside all kernel and ABI issues, is it actually a good
>> idea for user libraries to transparently use these new features? I'm
>> not really convinced. I think that serious di
On Mon, Apr 12, 2021 at 11:21 AM Andy Lutomirski wrote:
> AMX: Multiplying a 4x4 matrix probably looks *great* in a
> microbenchmark. Do it once and you permanently allocate 8kB (is that
> even a constant? can it grow in newer parts?), potentially hurts all
> future context switches, and does w
On Sun, Apr 11, 2021, Len Brown wrote:
> On Fri, Apr 9, 2021 at 5:44 PM Andy Lutomirski wrote:
> >
> > On Fri, Apr 9, 2021 at 1:53 PM Len Brown wrote:
> > >
> > > On Wed, Mar 31, 2021 at 6:45 PM Andy Lutomirski wrote:
> > > >
> > > > On Wed, Mar 31, 2021 at 3:28 PM Len Brown wrote:
> > > > > We
On Mon, Apr 12, 2021 at 7:19 AM Florian Weimer wrote:
>
> * Andy Lutomirski:
>
> Maybe we could have done this in 2016 when I reported this for the first
> time. Now it is too late, as more and more software is using
> CPUID-based detection for AVX-512. Our users have been using AVX-512
> hardwa
> On Apr 12, 2021, at 7:38 AM, Florian Weimer wrote:
>
> * Borislav Petkov:
>
>>> On Mon, Apr 12, 2021 at 04:19:29PM +0200, Florian Weimer wrote:
>>> Maybe we could have done this in 2016 when I reported this for the first
>>> time. Now it is too late, as more and more software is using
>>>
On Mon, Apr 12, 2021 at 04:38:15PM +0200, Florian Weimer wrote:
> Yes, that's why we have the XGETBV handshake. I was imprecise. It's
> CPUID + XGETBV of course. Or even AT_HWCAP2 (for FSGSBASE).
Ok, that sounds better. So looking at glibc sources, I see something
like this:
init_cpu_features
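For reference, the CPUID + XGETBV handshake discussed here can be sketched independently of glibc. This is a minimal sketch and not the glibc code: the function names avx512_usable(), read_xcr0() and avx512_usable_here() are hypothetical, and the bit positions are the ones documented in the Intel SDM. The point is that a CPUID feature bit alone is not enough; the OS must also have enabled the matching state components in XCR0.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.1:ECX bit 27: the OS has enabled XSAVE, so XGETBV may be executed. */
#define CPUID1_ECX_OSXSAVE (1u << 27)
/* CPUID.(EAX=7,ECX=0):EBX bit 16: the CPU implements AVX-512 Foundation. */
#define CPUID7_EBX_AVX512F (1u << 16)

/* XCR0 state-component bits needed before AVX-512 registers may be touched. */
#define XCR0_SSE        (1ull << 1)
#define XCR0_AVX        (1ull << 2)
#define XCR0_OPMASK     (1ull << 5)
#define XCR0_ZMM_HI256  (1ull << 6)
#define XCR0_HI16_ZMM   (1ull << 7)
#define XCR0_AVX512_ALL (XCR0_SSE | XCR0_AVX | XCR0_OPMASK | \
                         XCR0_ZMM_HI256 | XCR0_HI16_ZMM)

/* Pure decision function (hypothetical name): combine the CPUID bits with
 * the XCR0 value. All three conditions must hold. */
static inline bool avx512_usable(uint32_t cpuid1_ecx, uint32_t cpuid7_ebx,
                                 uint64_t xcr0)
{
    if (!(cpuid1_ecx & CPUID1_ECX_OSXSAVE))
        return false;               /* XGETBV not available */
    if (!(cpuid7_ebx & CPUID7_EBX_AVX512F))
        return false;               /* CPU lacks the instructions */
    return (xcr0 & XCR0_AVX512_ALL) == XCR0_AVX512_ALL;
}

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Read XCR0 with XGETBV (ECX = 0). Only call after OSXSAVE was checked. */
static inline uint64_t read_xcr0(void)
{
    uint32_t lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return ((uint64_t)hi << 32) | lo;
}

/* Run the full handshake on the current CPU (hypothetical name). */
static inline bool avx512_usable_here(void)
{
    unsigned int a, b, c, d;
    uint32_t ecx1;

    if (!__get_cpuid(1, &a, &b, &c, &d))
        return false;
    ecx1 = c;
    if (!(ecx1 & CPUID1_ECX_OSXSAVE))
        return false;               /* do not execute XGETBV */
    if (!__get_cpuid_count(7, 0, &a, &b, &c, &d))
        return false;
    return avx512_usable(ecx1, b, read_xcr0());
}
#endif
```

Keeping the decision in a pure function makes the rule easy to check without the hardware: for example, an XCR0 value of 0xe7 with both CPUID bits set passes, while 0x7 (x87, SSE and AVX only) does not.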
* Borislav Petkov:
> On Mon, Apr 12, 2021 at 04:19:29PM +0200, Florian Weimer wrote:
>> Maybe we could have done this in 2016 when I reported this for the first
>> time. Now it is too late, as more and more software is using
>> CPUID-based detection for AVX-512.
>
> So as I said on another mail t
On Mon, Apr 12, 2021 at 04:19:29PM +0200, Florian Weimer wrote:
> Maybe we could have done this in 2016 when I reported this for the first
> time. Now it is too late, as more and more software is using
> CPUID-based detection for AVX-512.
So as I said on another mail today, I don't think a librar
On Sun, Apr 11, 2021 at 03:07:29PM -0400, Len Brown wrote:
> If it is the program, how does it know that the library wants to use
> what instructions?
>
> If it is the library, then you have just changed XCR0 at run-time and
> you expose breakage of the thread library that has computed XSAVE size.
From: Len Brown
> Sent: 11 April 2021 20:07
...
> Granted, if you have an application that is statically linked and run
> on new hardware
> and new OS, it can still fail.
That also includes anything compiled and released as a program
binary that must run on older Linux installations.
Such program
On Fri, Apr 9, 2021 at 5:44 PM Andy Lutomirski wrote:
>
> On Fri, Apr 9, 2021 at 1:53 PM Len Brown wrote:
> >
> > On Wed, Mar 31, 2021 at 6:45 PM Andy Lutomirski wrote:
> > >
> > > On Wed, Mar 31, 2021 at 3:28 PM Len Brown wrote:
> > >
> > > > We added compiler annotation for user-level interru
On Fri, Apr 9, 2021 at 1:53 PM Len Brown wrote:
>
> On Wed, Mar 31, 2021 at 6:45 PM Andy Lutomirski wrote:
> >
> > On Wed, Mar 31, 2021 at 3:28 PM Len Brown wrote:
> >
> > > We added compiler annotation for user-level interrupt handlers.
> > > I'm not aware of it failing, or otherwise being conf
On Wed, Mar 31, 2021 at 6:54 PM Borislav Petkov wrote:
>
> On Wed, Mar 31, 2021 at 06:28:27PM -0400, Len Brown wrote:
> > dynamic XCR0 breaks the installed base, I thought we had established
> > that.
>
> We should do a clear cut and have legacy stuff which has its legacy
> expectations on the XST
On Wed, Mar 31, 2021 at 6:45 PM Andy Lutomirski wrote:
>
> On Wed, Mar 31, 2021 at 3:28 PM Len Brown wrote:
>
> > We added compiler annotation for user-level interrupt handlers.
> > I'm not aware of it failing, or otherwise being confused.
>
> I followed your link and found nothing. Can you elabo
On Wed, Mar 31, 2021 at 06:28:27PM -0400, Len Brown wrote:
> dynamic XCR0 breaks the installed base, I thought we had established
> that.
We should do a clear cut and have legacy stuff which has its legacy
expectations on the XSTATE layout and not touch those at all.
And then all new apps which w
On Wed, Mar 31, 2021 at 3:28 PM Len Brown wrote:
>
> On Wed, Mar 31, 2021 at 12:53 PM Andy Lutomirski wrote:
>
> > But this whole annotation thing will require serious compiler support.
> > We already have problems with compilers inlining functions and getting
> > confused about attributes.
>
>
On Wed, Mar 31, 2021 at 12:53 PM Andy Lutomirski wrote:
> But this whole annotation thing will require serious compiler support.
> We already have problems with compilers inlining functions and getting
> confused about attributes.
We added compiler annotation for user-level interrupt handlers.
On Wed, Mar 31, 2021 at 5:42 PM Robert O'Callahan wrote:
>
> For the record, the benefits of dynamic XCR0 for rr recording
> portability still apply. I guess it'd be useful for CRIU too. We would
> also benefit from anything that incentivizes increased support for
> CPUID faulting.
As previously
For the record, the benefits of dynamic XCR0 for rr recording
portability still apply. I guess it'd be useful for CRIU too. We would
also benefit from anything that incentivizes increased support for
CPUID faulting.
Rob
--
Su ot deraeppa sah dna Rehtaf eht htiw saw hcihw, efil lanrete eht uoy
ot m
> On Mar 31, 2021, at 9:31 AM, Len Brown wrote:
>
> On Tue, Mar 30, 2021 at 6:01 PM David Laight wrote:
>
>>> Can we leave it in live registers? That would be the speed-of-light
>>> signal handler approach. But we'd need to teach the signal handler to
>>> not clobber it. Perhaps that coul
On Tue, Mar 30, 2021 at 6:01 PM David Laight wrote:
> > Can we leave it in live registers? That would be the speed-of-light
> > signal handler approach. But we'd need to teach the signal handler to
> > not clobber it. Perhaps that could be part of the contract that a
> > fast signal handler si
On Fri, Mar 26, 2021 at 04:12:25PM -0700, Andy Lutomirski wrote:
> To detect features and control XCR0, we add some new arch_prctls:
>
> arch_prctl(ARCH_GET_XCR0_SUPPORT, 0, ...);
>
> returns the set of XCR0 bits supported on the current kernel.
>
> arch_prctl(ARCH_GET_XCR0_LAZY_SUPPORT, 0, ...)
From: Len Brown
> Sent: 30 March 2021 21:42
>
> On Tue, Mar 30, 2021 at 4:20 PM Andy Lutomirski wrote:
> >
> >
> > > On Mar 30, 2021, at 12:12 PM, Dave Hansen wrote:
> > >
> > > On 3/30/21 10:56 AM, Len Brown wrote:
> > >> On Tue, Mar 30, 2021 at 1:06 PM Andy Lutomirski
> > >> wrote:
> >
On Tue, Mar 30, 2021 at 4:20 PM Andy Lutomirski wrote:
>
>
> > On Mar 30, 2021, at 12:12 PM, Dave Hansen wrote:
> >
> > On 3/30/21 10:56 AM, Len Brown wrote:
> >> On Tue, Mar 30, 2021 at 1:06 PM Andy Lutomirski
> >> wrote:
> On Mar 30, 2021, at 10:01 AM, Len Brown wrote:
> Is it req
> On Mar 30, 2021, at 12:12 PM, Dave Hansen wrote:
>
> On 3/30/21 10:56 AM, Len Brown wrote:
>> On Tue, Mar 30, 2021 at 1:06 PM Andy Lutomirski wrote:
On Mar 30, 2021, at 10:01 AM, Len Brown wrote:
Is it required (by the "ABI") that a user program has everything
on the stack f
On 3/30/21 10:56 AM, Len Brown wrote:
> On Tue, Mar 30, 2021 at 1:06 PM Andy Lutomirski wrote:
>>> On Mar 30, 2021, at 10:01 AM, Len Brown wrote:
>>> Is it required (by the "ABI") that a user program has everything
>>> on the stack for user-space XSAVE/XRSTOR to get back
>>> to the state of the
On Tue, Mar 30, 2021 at 1:06 PM Andy Lutomirski wrote:
> > On Mar 30, 2021, at 10:01 AM, Len Brown wrote:
> > Is it required (by the "ABI") that a user program has everything
> > on the stack for user-space XSAVE/XRSTOR to get back
> > to the state of the program just before receiving the sign
> On Mar 30, 2021, at 10:01 AM, Len Brown wrote:
>
> Andy,
>
> I agree, completely, with your description of the challenge,
> thank you for focusing the discussion on that problem statement.
>
> Question:
>
> Is it required (by the "ABI") that a user program has everything
> on the stack f
Andy,
I agree, completely, with your description of the challenge,
thank you for focusing the discussion on that problem statement.
Question:
Is it required (by the "ABI") that a user program has everything
on the stack for user-space XSAVE/XRSTOR to get back
to the state of the program just be
Forgive me if this is silly, but would it be possible to do something
similar to rseq where the user can register a set of features for a
program counter region and then on interrupt check that to determine
what needs to be saved?
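The lookup this idea implies might look roughly like the following. It is a minimal sketch only: struct xfeature_region, features_to_save() and the baseline mask are all hypothetical names introduced for illustration, and nothing here corresponds to an existing kernel or rseq interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical registration record: a code region and the extended state
 * components (as an XCR0-style bit mask) that code in the region may use. */
struct xfeature_region {
    uintptr_t start;     /* first address of the region */
    uintptr_t end;       /* one past the last address of the region */
    uint64_t  features;  /* state components that may be live here */
};

/* Given the interrupted program counter, return the mask of state
 * components to save: the always-saved baseline plus the features of
 * every registered region that contains pc. */
static inline uint64_t features_to_save(const struct xfeature_region *tbl,
                                        size_t n, uintptr_t pc,
                                        uint64_t baseline)
{
    uint64_t mask = baseline;

    for (size_t i = 0; i < n; i++) {
        if (pc >= tbl[i].start && pc < tbl[i].end)
            mask |= tbl[i].features;
    }
    return mask;
}
```

A linear scan keeps the sketch short; a real implementation would presumably want a sorted table or similar so that the check stays cheap on the interrupt path.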
For example if a user doesn't use any AMX but loads a library that
does
On Mon, Mar 29, 2021 at 3:38 PM Len Brown wrote:
>
> On Mon, Mar 29, 2021 at 2:16 PM Andy Lutomirski wrote:
> >
> Hi Andy,
>
> Can you provide a concise definition of the exact problem(s) this thread
> is attempting to address?
The AVX-512 state, all by itself, is more than 2048 bytes. Quotin
On Mon, Mar 29, 2021 at 2:16 PM Andy Lutomirski wrote:
>
>
> > On Mar 29, 2021, at 8:47 AM, Len Brown wrote:
> >
> > On Sat, Mar 27, 2021 at 5:58 AM Greg KH wrote:
> >>> On Fri, Mar 26, 2021 at 11:39:18PM -0400, Len Brown wrote:
> >>> Hi Andy,
> >>> Say a mainline links with a math library that
> On Mar 29, 2021, at 8:47 AM, Len Brown wrote:
>
> On Sat, Mar 27, 2021 at 5:58 AM Greg KH wrote:
>>> On Fri, Mar 26, 2021 at 11:39:18PM -0400, Len Brown wrote:
>>> Hi Andy,
>>> Say a mainline links with a math library that uses AMX without the
>>> knowledge of the mainline.
>
> sorry for t
> On Mar 29, 2021, at 9:39 AM, Len Brown wrote:
>
>
>>
>> In particular, the library may use instructions that main() doesn't know
>> exist.
>
> And so I'll ask my question another way.
>
> How is it okay to change the value of XCR0 during the run time of a program?
>
> I submit that it
* Len Brown via Libc-alpha:
>> In particular, the library may use instructions that main() doesn't know
>> exist.
>
> And so I'll ask my question another way.
>
> How is it okay to change the value of XCR0 during the run time of a
> program?
>
> I submit that it is not, and that is a deal-killer
> In particular, the library may use instructions that main() doesn't know
> exist.
And so I'll ask my question another way.
How is it okay to change the value of XCR0 during the run time of a program?
I submit that it is not, and that is a deal-killer for a request/release API.
eg. main() do
On Sat, Mar 27, 2021 at 5:58 AM Greg KH wrote:
>
> On Fri, Mar 26, 2021 at 11:39:18PM -0400, Len Brown wrote:
> > Hi Andy,
> >
> > Say a mainline links with a math library that uses AMX without the
> > knowledge of the mainline.
sorry for the confusion.
mainline = main().
ie. the part of the pr
On 3/27/21 5:53 PM, Thomas Gleixner wrote:
> Making it solely depend on XCR0 and fault if not requested upfront is
> bringing you into the situation that you broke 'legacy code' which
> relied on the CPUID bit and that worked until now which gets you
> in the no-regression trap.
Trying to find the
On Sun, Mar 28, 2021 at 01:53:15AM +0100, Thomas Gleixner wrote:
> Though the little devil in my head tells me, that making AMX support
> depend on the CPUID faulting capability might be not the worst thing.
>
> Then we actually enforce CPUID faulting (finally) on CPUs which support
> it, which wo
Andy,
On Fri, Mar 26 2021 at 16:18, Andy Lutomirski wrote:
> arch_prctl(ARCH_SET_XCR0, xcr0, lazy_states, sigsave_states,
> sigclear_states, 0);
>
> Sets xcr0. All states are preallocated except that states in
> lazy_states may be unallocated in the kernel until used. (Not
> supported at all in
On Fri, Mar 26, 2021 at 11:39:18PM -0400, Len Brown wrote:
> Hi Andy,
>
> Say a mainline links with a math library that uses AMX without the
> knowledge of the mainline.
What does this mean? What happened to the context here?
> Say the mainline is also linked with a userspace threading library
On Fri, Mar 26, 2021 at 11:39:18PM -0400, Len Brown wrote:
> Say a mainline links with a math library that uses AMX without the
> knowledge of the mainline.
What is a "mainline"?
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
Hi Andy,
Say a mainline links with a math library that uses AMX without the
knowledge of the mainline.
Say the mainline is also linked with a userspace threading library
that thinks it has a concept of XSAVE area size.
Wouldn't the change in XCR0, resulting in XSAVE size change, risk
confusing th
Sigh, cc linux-api, not linux-abi.
On Fri, Mar 26, 2021 at 4:12 PM Andy Lutomirski wrote:
>
> Hi all-
>
> After some discussion on IRC, I have a proposal for a Linux ABI for
> using Intel AMX and other similar features. It works like this:
>
> First, we make XCR0 dynamic. This looks a lot like
Hi all-
After some discussion on IRC, I have a proposal for a Linux ABI for
using Intel AMX and other similar features. It works like this:
First, we make XCR0 dynamic. This looks a lot like Keno's patch but
with a different API, outlined below. Different tasks can have
different XCR0 values.
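One consequence of per-task XCR0 values is that the XSAVE area size also differs per task, since the standard (non-compacted) layout extends to the end of the highest enabled component. The following is a minimal sketch of that relationship, assuming a caller-supplied table of component offsets and sizes; struct xcomp and xsave_std_size() are hypothetical names. On real hardware the per-component values come from CPUID leaf 0xD, sub-leaf i (EAX = size, EBX = offset).

```c
#include <stddef.h>
#include <stdint.h>

/* 512-byte legacy FXSAVE region plus the 64-byte XSAVE header. */
#define XSAVE_LEGACY_AND_HEADER 576u

/* One extended state component as described by CPUID leaf 0xD. */
struct xcomp {
    unsigned idx;     /* bit position in XCR0 (2..63) */
    uint32_t offset;  /* offset in the standard-format XSAVE area */
    uint32_t size;    /* size of the component in bytes */
};

/* Size of the standard-format XSAVE area for a given XCR0 value:
 * the end of the furthest component whose XCR0 bit is set, and never
 * less than the legacy region plus header. */
static inline uint32_t xsave_std_size(uint64_t xcr0, const struct xcomp *c,
                                      size_t n)
{
    uint32_t size = XSAVE_LEGACY_AND_HEADER;

    for (size_t i = 0; i < n; i++) {
        if (c[i].idx < 2 || c[i].idx > 63)
            continue;   /* x87/SSE live in the legacy region */
        if (!(xcr0 & (1ull << c[i].idx)))
            continue;   /* component not enabled for this task */
        if (c[i].offset + c[i].size > size)
            size = c[i].offset + c[i].size;
    }
    return size;
}
```

With an illustrative table entry for AVX at offset 576 and size 256, a task whose XCR0 enables only x87, SSE and AVX needs 832 bytes. Adding an entry for an 8192-byte tile-data component grows the area by more than 8 KB for any task that enables it, which is the size change a userspace threading library with a cached XSAVE size would have to cope with.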