On March 6, 2017 9:12:41 AM PST, Logan Gunthorpe wrote:
>
>
>On 06/03/17 12:28 AM, H. Peter Anvin wrote:
>> On 03/05/17 23:01, Logan Gunthorpe wrote:
>>>
>>> On 05/03/17 12:54 PM, Borislav Petkov wrote:
>>>> Logan, wanna give that a try, see if it takes care of your issue?
>>>
>>> Well honestly my
On 06/03/17 12:28 AM, H. Peter Anvin wrote:
> On 03/05/17 23:01, Logan Gunthorpe wrote:
>>
>> On 05/03/17 12:54 PM, Borislav Petkov wrote:
>>> Logan, wanna give that a try, see if it takes care of your issue?
>>
>> Well honestly my issue was solved by fixing my kernel config. I have no
>> idea wh
On Mon, Mar 06, 2017 at 05:41:22AM -0800, h...@zytor.com wrote:
> It isn't really that straightforward IMO.
>
> For UC memory transaction size really needs to be specified explicitly
> at all times and should be part of the API, rather than implicit.
>
> For WC/WT/WB device memory, the ordinary mem
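
A minimal user-space sketch of the point quoted above, that the transaction size should be an explicit part of the API rather than left to whatever memcpy() the compiler picks. The names `io_width` and `copy_fromio_width` are hypothetical and are not kernel interfaces. The `volatile` loads only stand in for fixed-width accessors such as readb()/readw()/readl(), and the alignment and length restrictions are assumptions made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: access width in bytes, chosen by the caller. */
enum io_width { IO_WIDTH_8 = 1, IO_WIDTH_16 = 2, IO_WIDTH_32 = 4 };

/*
 * Copy len bytes from a device-like source using only loads of the
 * requested width. Assumes src is aligned to width and len is a
 * multiple of width; a real helper would need a policy for tails.
 */
void copy_fromio_width(void *dst, const volatile void *src,
                       size_t len, enum io_width width)
{
    unsigned char *d = dst;
    size_t i;

    assert(len % (size_t)width == 0);
    assert((uintptr_t)src % (uintptr_t)width == 0);

    switch (width) {
    case IO_WIDTH_8: {
        const volatile uint8_t *s = src;
        for (i = 0; i < len; i++)
            d[i] = s[i];                    /* one 8-bit load each */
        break;
    }
    case IO_WIDTH_16: {
        const volatile uint16_t *s = src;
        for (i = 0; i < len / 2; i++) {
            uint16_t v = s[i];              /* one 16-bit load each */
            memcpy(d + 2 * i, &v, sizeof(v)); /* dst is ordinary RAM */
        }
        break;
    }
    case IO_WIDTH_32: {
        const volatile uint32_t *s = src;
        for (i = 0; i < len / 4; i++) {
            uint32_t v = s[i];              /* one 32-bit load each */
            memcpy(d + 4 * i, &v, sizeof(v));
        }
        break;
    }
    }
}
```

A caller would write something like `copy_fromio_width(buf, p, 32, IO_WIDTH_32)`, so the access width is visible at the call site and does not depend on compiler tuning flags or the kernel config.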
On March 6, 2017 5:33:28 AM PST, Borislav Petkov wrote:
>On Mon, Mar 06, 2017 at 12:01:10AM -0700, Logan Gunthorpe wrote:
>> Well honestly my issue was solved by fixing my kernel config. I have no
>> idea why I had optimize for size in there in the first place.
>
>I still think that we should add
On Mon, Mar 06, 2017 at 12:01:10AM -0700, Logan Gunthorpe wrote:
> Well honestly my issue was solved by fixing my kernel config. I have no
> idea why I had optimize for size in there in the first place.
I still think that we should address the iomem memcpy Linus mentioned.
So how about this partia
On 03/05/17 23:01, Logan Gunthorpe wrote:
>
> On 05/03/17 12:54 PM, Borislav Petkov wrote:
>> Logan, wanna give that a try, see if it takes care of your issue?
>
> Well honestly my issue was solved by fixing my kernel config. I have no
> idea why I had optimize for size in there in the first plac
On Sun, Mar 05, 2017 at 11:19:42AM -0800, Linus Torvalds wrote:
>> But it is *not* the right thing to use on IO memory, because the CPU
>> only does the magic cacheline access optimizations on cacheable
>> memory!
Yes, and actually this is where I started. I thought my memcpy was using
byte acces
On Sun, Mar 5, 2017 at 11:54 AM, Borislav Petkov wrote:
>>
>> We seem to have broken this *really* long ago, though.
>
> I wonder why nothing blew up or failed strangely by now...
The hardware that cared was pretty broken to begin with, and I think
it was mainly some really odd graphics cards.
A
On Sun, Mar 05, 2017 at 11:19:42AM -0800, Linus Torvalds wrote:
> Actually, the "fromio/toio" code should never use regular memcpy().
> There used to be devices that literally broke on 64-bit accesses due
> to broken PCI crud.
>
> We seem to have broken this *really* long ago, though.
I wonder wh
On Sun, Mar 5, 2017 at 1:50 AM, Borislav Petkov wrote:
>
> gcc can't possibly know on what targets is that kernel going to be
> booted on. So it probably does some universally optimal things, like in
> the dmi_scan_machine() case:
>
> memcpy_fromio(buf, p, 32);
>
> turns into:
>
>
On Sun, Mar 05, 2017 at 12:18:23PM +0100, Borislav Petkov wrote:
> Also, I need to check what vmlinuz size bloat we're talking: with the
> diff below, we do add padding which looks like this:
Yeah, even a tailored config adds ~67K:
text data bss dec hex filename
7567290 4040894
On Sun, Mar 05, 2017 at 10:50:59AM +0100, Borislav Petkov wrote:
> On Sat, Mar 04, 2017 at 04:56:38PM -0800, h...@zytor.com wrote:
> > That's what the -march= and -mtune= option do!
>
> How does that even help with a distro kernel built with -mtune=generic ?
>
> gcc can't possibly know on what ta
On Sat, Mar 04, 2017 at 09:58:14PM -0700, Logan Gunthorpe wrote:
> So, I've found that my kernel config had the OPTIMIZE_FOR_SIZE selected
> instead of OPTIMIZE_FOR_PERFORMANCE. I'm not sure why that is but
> switching to the latter option fixes my problem. A memcpy call is used
> instead of the po
On Sat, Mar 04, 2017 at 04:56:38PM -0800, h...@zytor.com wrote:
> That's what the -march= and -mtune= option do!
How does that even help with a distro kernel built with -mtune=generic ?
gcc can't possibly know on what targets is that kernel going to be
booted on. So it probably does some universa
Hey,
On 04/03/17 05:33 PM, Borislav Petkov wrote:
> On Sat, Mar 04, 2017 at 04:23:17PM -0800, h...@zytor.com wrote:
>> What are the compilation flags? It may be that gcc still does TRT
>> depending on this call site. I'd check what gcc6 or 7 generates,
>> though.
> Hmm, I wish we were able to say,
On March 4, 2017 4:33:49 PM PST, Borislav Petkov wrote:
>On Sat, Mar 04, 2017 at 04:23:17PM -0800, h...@zytor.com wrote:
>> What are the compilation flags? It may be that gcc still does TRT
>> depending on this call site. I'd check what gcc6 or 7 generates,
>> though.
>
>Well, I don't think that m
On Sat, Mar 04, 2017 at 04:23:17PM -0800, h...@zytor.com wrote:
> What are the compilation flags? It may be that gcc still does TRT
> depending on this call site. I'd check what gcc6 or 7 generates,
> though.
Well, I don't think that matters: if you're building a kernel on one
machine to boot on a
On March 4, 2017 4:14:47 PM PST, Borislav Petkov wrote:
>On Sat, Mar 04, 2017 at 03:55:27PM -0800, h...@zytor.com wrote:
>> For newer processors, as determined by -mtune=, it is actually the
>> best option for an arbitrary copy.
>
>So his doesn't have ERMS - it is a SNB - so if for SNB REP_GOOD is
On Sat, Mar 04, 2017 at 03:55:27PM -0800, h...@zytor.com wrote:
> For newer processors, as determined by -mtune=, it is actually the
> best option for an arbitrary copy.
So his doesn't have ERMS - it is a SNB - so if for SNB REP_GOOD is
the best option for memcpy, then we should probably build wit
On March 4, 2017 3:46:44 PM PST, Logan Gunthorpe wrote:
>Hi Borislav,
>
>Thanks for the help.
>
>On 04/03/17 03:43 PM, Borislav Petkov wrote:
>> You can boot with "debug-alternative" and look for those strings where
>
>Here's the symbols for memcpy and the corresponding apply_alternatives
>lines:
On Sat, Mar 04, 2017 at 01:08:15PM -0700, Logan Gunthorpe wrote:
> So my question is: how do I find out what version of memcpy my actual
> machine is using and fix it if it is wrong?
You can boot with "debug-alternative" and look for those strings where
it says "feat:"
[0.261386] apply_altern
Hi,
I'm trying to chase down a performance issue with a driver I'm working
on that does a repeated memcpy_fromio of about 1KB from a PCI device. I
made a small change from a fixed size copy to a variable size only to be
surprised with a performance decrease of about 1/3.
I've looked through the c