On Sat, Feb 10, 2018 at 7:26 AM, David Laight wrote:
>
> The alignment doesn't matter, 'rep movsl' will still work.
.. no it won't. It might not copy the last two bytes or whatever,
because the shift of the count will have ignored the low bits.
But since an unaligned stack pointer really shouldn
From: Linus Torvalds
> Sent: 09 February 2018 19:49
...
> I think the instruction scheduling ends up basically breaking around
> microcoded instructions, which is why you'll get something like 12+n
> cycles for "rep movs" on some uarchs, but at that point it's probably
> mostly in the noise compare
From: Denys Vlasenko
> Sent: 09 February 2018 17:17
> On 02/09/2018 06:05 PM, Linus Torvalds wrote:
> > On Fri, Feb 9, 2018 at 1:25 AM, Joerg Roedel wrote:
> >> +
> >> + /* Copy over the stack-frame */
> >> + cld
> >> + rep movsb
> >
> > Ugh. This is going to be horrendous. Maybe
On Fri, Feb 9, 2018 at 11:25 AM, Joerg Roedel wrote:
>
> Ugh, okay. So I switch to movsl, that should at least perform on-par
> with the chain of 'pushl' instructions I had before.
It should generally be roughly in the same ballpark.
I think the instruction scheduling ends up basically breaking
On 02/09/2018 08:02 PM, Joerg Roedel wrote:
On Fri, Feb 09, 2018 at 09:05:02AM -0800, Linus Torvalds wrote:
On Fri, Feb 9, 2018 at 1:25 AM, Joerg Roedel wrote:
+
+ /* Copy over the stack-frame */
+ cld
+ rep movsb
Ugh. This is going to be horrendous. Maybe not noticeable on
On Fri, Feb 09, 2018 at 11:17:35AM -0800, Linus Torvalds wrote:
> Yeah, it's only true on the very latest uarchs, and even there it's
> not perfect for small copies.
>
> On the older machines that are relevant for 32-bit code, it's often
> tens of cycles just for the ucode overhead, I think, and "
On Fri, Feb 9, 2018 at 11:02 AM, Joerg Roedel wrote:
>
> Okay, I used movsb because I remembered that being the recommendation
> for the most efficient memcpy, and it saves me an instruction. But that
> is probably only true on modern CPUs.
Yeah, it's only true on the very latest uarchs, and even
On Fri, Feb 09, 2018 at 05:43:55PM +, Andy Lutomirski wrote:
> The 64-bit code mostly uses a bunch of push instructions for this.
I had it implemented with tons of push instructions first, but that
doesn't work in cases where the stack switch needs to happen only after
everything is copied o
On Fri, Feb 09, 2018 at 09:05:02AM -0800, Linus Torvalds wrote:
> On Fri, Feb 9, 2018 at 1:25 AM, Joerg Roedel wrote:
> > +
> > + /* Copy over the stack-frame */
> > + cld
> > + rep movsb
>
> Ugh. This is going to be horrendous. Maybe not noticeable on modern
> CPU's, but the wh
On Fri, Feb 9, 2018 at 5:05 PM, Linus Torvalds wrote:
> On Fri, Feb 9, 2018 at 1:25 AM, Joerg Roedel wrote:
>> +
>> + /* Copy over the stack-frame */
>> + cld
>> + rep movsb
>
> Ugh. This is going to be horrendous. Maybe not noticeable on modern
> CPU's, but the whole 32-bit cod
On 02/09/2018 06:05 PM, Linus Torvalds wrote:
On Fri, Feb 9, 2018 at 1:25 AM, Joerg Roedel wrote:
+
+ /* Copy over the stack-frame */
+ cld
+ rep movsb
Ugh. This is going to be horrendous. Maybe not noticeable on modern
CPU's, but the whole 32-bit code is kind of pointless o
On Fri, Feb 9, 2018 at 1:25 AM, Joerg Roedel wrote:
> +
> + /* Copy over the stack-frame */
> + cld
> + rep movsb
Ugh. This is going to be horrendous. Maybe not noticeable on modern
CPU's, but the whole 32-bit code is kind of pointless on a modern CPU.
At least use "rep movsl".
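
To illustrate the point raised in the replies about "rep movsl" and the
shifted count, here is a minimal C model (not the kernel's code, and not
taken from the patch). It assumes the dword count is derived as len >> 2,
which is what "the shift of the count" refers to; the helper names
copy_dwords_only and copy_dwords_then_tail are made up for this sketch.
The first helper shows how the low two bits of the length are dropped; the
second adds a byte-wise tail, roughly what a trailing "rep movsb" with
len & 3 would do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: models "rep movsl" with a count of len >> 2.
 * Returns the number of bytes actually copied, which is less than len
 * whenever len is not a multiple of 4.
 */
static size_t copy_dwords_only(void *dst, const void *src, size_t len)
{
    assert(dst != NULL && src != NULL);

    size_t ndwords = len >> 2;            /* low two bits of len are lost */
    unsigned char *d = dst;
    const unsigned char *s = src;

    for (size_t i = 0; i < ndwords; i++) {
        uint32_t w;
        memcpy(&w, s + 4 * i, sizeof(w)); /* alignment-safe 32-bit load */
        memcpy(d + 4 * i, &w, sizeof(w)); /* alignment-safe 32-bit store */
    }
    return ndwords * 4;
}

/*
 * Hypothetical helper: the same dword loop followed by a byte loop for
 * the remaining len & 3 bytes. Returns len.
 */
static size_t copy_dwords_then_tail(void *dst, const void *src, size_t len)
{
    size_t done = copy_dwords_only(dst, src, len);
    unsigned char *d = dst;
    const unsigned char *s = src;

    for (size_t i = done; i < len; i++)
        d[i] = s[i];                      /* copy the 0..3 leftover bytes */
    return len;
}
```

With a 10-byte source, copy_dwords_only moves two dwords (8 bytes) and
leaves the last two bytes untouched, while copy_dwords_then_tail copies all
10. If the frame size is known to be a multiple of 4, the tail loop copies
nothing and the dword-only form is sufficient.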