On 01/13/2016 02:45 AM, Will Deacon wrote:
I don't think the address dependency is enough on its own. By that
reasoning, the following variant (WRC+addr+addr) would work too:
P0:
Wx = 1
P1:
Rx == 1
Wy = 1
P2:
Ry == 1
Rx = 0
So are you saying that this is also forbidden?
Imagine that P0
On 01/13/2016 12:48 PM, Peter Zijlstra wrote:
On Wed, Jan 13, 2016 at 11:02:35AM -0800, Leonid Yegoshin wrote:
I ask HW team about it but I have a question - has it any relationship with
replacing MIPS SYNC with lightweight SYNCs (SYNC_WMB etc)?
Of course. If you cannot explain the semantics o
On Wed, Jan 13, 2016 at 11:02:35AM -0800, Leonid Yegoshin wrote:
> I ask HW team about it but I have a question - has it any relationship with
> replacing MIPS SYNC with lightweight SYNCs (SYNC_WMB etc)?
Of course. If you cannot explain the semantics of the primitives you
introduce, how can we ju
mfence appears to be way slower than a locked instruction - let's use
lock+add unconditionally, as we always did on old 32-bit.
Just poking at SP would be the most natural, but if we
then read the value from SP, we get a false dependency
which will slow us down.
This was noted in this article:
ht
The comment about wmb being non-nop to deal with non-intel CPUs is a
leftover from before commit 09df7c4c8097 ("x86: Remove
CONFIG_X86_OOSTORE").
It makes no sense now: in particular, wmb is not a nop even for regular
intel CPUs because of weird use-cases e.g. dealing with WC memory.
Drop this c
addl clobbers flags (such as CF) but barrier.h didn't tell this to gcc.
Historically, gcc doesn't need one on x86, and always considers flags
clobbered. We are probably missing the cc clobber in a *lot* of places
for this reason.
But even if not necessary, it's probably a good thing to add for
doc
On x86, we *do* still use the non-nop rmb/wmb for IO barriers, but even
that is generally questionable.
Leave them around as historical unless somebody can point to a case where
they care about the performance, but tweak the comment so people
don't think they are strictly required in all cases.
Si
mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's
2 to 3 times slower than lock; addl that we use on older CPUs.
So let's use the locked variant everywhere.
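A minimal user-space sketch of the locked variant described above, not the actual barrier.h change. The name mb_locked_add, the -4 offset below the stack pointer (chosen so that a later read of the word at SP itself does not pick up a false dependency), and the fallback to the compiler's __sync_synchronize() builtin on non-x86 targets are all assumptions made for illustration. The "cc" clobber tells gcc that addl modifies the flags, and "memory" keeps the compiler from reordering memory accesses across the barrier.

```c
#include <assert.h>

/* Hypothetical helper: full barrier via a locked add of zero just below SP.
 * Adding 0 leaves the stack word unchanged; the lock prefix does the ordering. */
static inline void mb_locked_add(void)
{
#if defined(__x86_64__)
    __asm__ __volatile__("lock; addl $0, -4(%%rsp)" : : : "memory", "cc");
#elif defined(__i386__)
    __asm__ __volatile__("lock; addl $0, -4(%%esp)" : : : "memory", "cc");
#else
    __sync_synchronize(); /* assumption: generic full barrier on other targets */
#endif
}

static volatile int flag; /* illustrative shared variables, not from the patch */
static volatile int data;

/* Store to one location, then load another: the store->load pattern
 * that needs a full barrier on x86. */
int store_then_load(void)
{
    flag = 1;
    mb_locked_add();
    return data;
}

/* Single-threaded sanity check: the barrier must not disturb program state. */
void self_check(void)
{
    volatile int local = 42;
    mb_locked_add();
    assert(local == 42);
}
```

This only shows the shape of the primitive; whether it beats mfence on a given CPU has to be measured, as the micro-benchmark in this thread does.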
While I was at it, I found some inconsistencies in comments in
arch/x86/include/asm/barrier.h
The documentati
(I am trying to answer multiple mails in one.)
First of all, it seems like some generic notes should be given here:
1. The generic MIPS "SYNC" (aka "SYNC 0") instruction is very heavy on
some CPUs. On those CPUs it basically kills the pipelines in each CPU, can do
a special memory/IO bus transaction (sim
Kernel headers should use linux/types.h based definitions.
Signed-off-by: Gleb Fotengauer-Malinovskiy
---
include/uapi/linux/virtio_gpu.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/uapi/linux/virtio_gpu.h b/include/uapi/linux/virtio_gpu.h
index 7a63faa..4b04ead 1
On 01/13/2016 02:45 AM, Will Deacon wrote:
On Tue, Jan 12, 2016 at 12:45:14PM -0800, Leonid Yegoshin wrote:
I don't think the address dependency is enough on its own. By that
reasoning, the following variant (WRC+addr+addr) would work too:
P0:
Wx = 1
P1:
Rx == 1
Wy = 1
P2:
Ry == 1
Rx = 0
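To make the WRC shape above concrete, here is a small sketch that enumerates every interleaving of the three programs under a sequentially consistent model and counts how often the outcome "P1 reads x == 1, P2 reads y == 1, P2 reads x == 0" appears. It says nothing about MIPS, SYNC, or address dependencies; it only shows that the outcome is unreachable when all accesses are totally ordered, which is the baseline a weaker model is compared against. All names (wrc_state, explore, wrc_enumerate_sc) are hypothetical.

```c
#include <assert.h>

/* Shared variables x and y plus the three observed register values. */
struct wrc_state {
    int x, y;
    int r1; /* P1: value read from x */
    int r2; /* P2: value read from y */
    int r3; /* P2: value read from x */
};

struct wrc_result {
    int interleavings; /* complete executions explored */
    int weak_outcomes; /* executions with r1 == 1, r2 == 1, r3 == 0 */
};

/* pc0..pc2 are the index of the next operation in P0, P1 and P2. */
static void explore(struct wrc_state s, int pc0, int pc1, int pc2,
                    struct wrc_result *res)
{
    struct wrc_state n;

    if (pc0 == 1 && pc1 == 2 && pc2 == 2) {
        assert(s.x == 1 && s.y == 1); /* both writes have executed */
        res->interleavings++;
        if (s.r1 == 1 && s.r2 == 1 && s.r3 == 0)
            res->weak_outcomes++;
        return;
    }
    if (pc0 < 1) {              /* P0: Wx = 1 */
        n = s; n.x = 1;
        explore(n, 1, pc1, pc2, res);
    }
    if (pc1 < 2) {              /* P1: r1 = Rx; then Wy = 1 */
        n = s;
        if (pc1 == 0) n.r1 = n.x; else n.y = 1;
        explore(n, pc0, pc1 + 1, pc2, res);
    }
    if (pc2 < 2) {              /* P2: r2 = Ry; then r3 = Rx */
        n = s;
        if (pc2 == 0) n.r2 = n.y; else n.r3 = n.x;
        explore(n, pc0, pc1, pc2 + 1, res);
    }
}

struct wrc_result wrc_enumerate_sc(void)
{
    struct wrc_state init = {0};
    struct wrc_result res = {0, 0};

    explore(init, 0, 0, 0, &res);
    return res;
}
```

Under this model the weak outcome never occurs, because r1 == 1 and r2 == 1 force Wx before P2's final read of x. On real hardware the question is whether the dependencies alone preserve that chain.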
On 01/12/2016 01:40 PM, Peter Zijlstra wrote:
It is selectable only for MIPS R2 but not MIPS R6. The reason is that most
MIPS R2 CPUs have a short pipeline, so SYNC is just a waste of CPU
resources, especially taking into account that "lightweight syncs" are
converted to a heavy "SYNC 0" in man
On 01/10/2016 06:18 AM, Michael S. Tsirkin wrote:
On mips dma_rmb, dma_wmb, smp_store_mb, read_barrier_depends,
smp_read_barrier_depends, smp_store_release and smp_load_acquire match
the asm-generic variants exactly. Drop the local definitions and pull in
asm-generic/barrier.h instead.
This sta
OASIS members and other interested parties,
The OASIS Virtual I/O Device (VIRTIO) TC members [1] have produced an
updated Committee Specification Draft (CSD) and submitted this
specification for 15-day public review:
Virtual I/O Device (VIRTIO) Version 1.0
Committee Specification Draft 05 / Publi
On Wed, Jan 13, 2016 at 8:42 AM, Michael S. Tsirkin wrote:
>
> Oh, I think this means we need a "cc" clobber.
It's probably a good idea to add one.
Historically, gcc doesn't need one on x86, and always considers flags
clobbered. We are probably missing the cc clobber in a *lot* of places
for thi
This series is a respin of the following patch:
http://patchwork.ozlabs.org/patch/565921/
Patch 1 is preliminary work: it gives better names to the helpers that are
involved in cross-endian support.
Patch 2 is actually a v2 of the original patch. All devices now call a
helper in the generic code
The way vring endianness is being handled currently obfuscates
the code in vhost_init_used().
This patch tries to fix that by doing the following:
- move the code that adjusts endianness to a dedicated helper
- export this helper so that backends explicitly call it
No behaviour change.
Sign
The default use case for vhost is when the host and the vring have the
same endianness (default native endianness). But there are cases where
they differ and vhost should byteswap when accessing the vring:
- the host is big endian and the vring comes from a virtio 1.0 device
which is always littl
On Wed, Jan 13, 2016 at 05:53:20PM +0100, Borislav Petkov wrote:
> On Wed, Jan 13, 2016 at 06:42:48PM +0200, Michael S. Tsirkin wrote:
> > Oh, I think this means we need a "cc" clobber.
>
> Btw, does your microbenchmark do it too?
Yes - I fixed it now, but it did not affect the result.
We'd need
On Wed, Jan 13, 2016 at 06:42:48PM +0200, Michael S. Tsirkin wrote:
> Oh, I think this means we need a "cc" clobber.
Btw, does your microbenchmark do it too?
Because the "cc" clobber should cause additional handling of flags,
depending on the context. It won't matter if the context doesn't need
On Wed, Jan 13, 2016 at 05:33:31PM +0100, Borislav Petkov wrote:
> On Wed, Jan 13, 2016 at 06:25:21PM +0200, Michael S. Tsirkin wrote:
> > Which flag do you refer to, exactly?
>
> All the flags in rFLAGS which ADD modifies: OF,SF,ZF,AF,PF,CF
Oh, I think this means we need a "cc" clobber.
This al
On Wed, Jan 13, 2016 at 06:25:21PM +0200, Michael S. Tsirkin wrote:
> Which flag do you refer to, exactly?
All the flags in rFLAGS which ADD modifies: OF,SF,ZF,AF,PF,CF
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
The following changes since commit afd2ff9b7e1b367172f18ba7f693dfb62bdcb2dc:
Linux 4.4 (2016-01-10 15:01:32 -0800)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus
for you to fetch changes up to 43e361f23c49dbddf74f56ddf6cdd8
On Wed, Jan 13, 2016 at 07:10:15PM +0300, Gleb Fotengauer-Malinovskiy wrote:
> Kernel headers should use linux/types.h based definitions.
>
> Signed-off-by: Gleb Fotengauer-Malinovskiy
Acked-by: Michael S. Tsirkin
> ---
> include/uapi/linux/virtio_gpu.h | 2 +-
> 1 file changed, 1 insertion(+
On Wed, Jan 13, 2016 at 05:17:04PM +0100, Borislav Petkov wrote:
> On Tue, Jan 12, 2016 at 03:24:05PM -0800, Linus Torvalds wrote:
> > But talking to the hw people about this is certainly a good idea regardless.
>
> I'm not seeing it in this thread but I might've missed it too. Anyway,
> I'm being
On Tue, Jan 12, 2016 at 01:37:38PM -0800, Linus Torvalds wrote:
> On Tue, Jan 12, 2016 at 12:59 PM, Andy Lutomirski wrote:
> >
> > Here's an article with numbers:
> >
> > http://shipilev.net/blog/2014/on-the-fence-with-dependencies/
>
> Well, that's with the busy loop and one set of code generati
On Tue, Jan 12, 2016 at 03:24:05PM -0800, Linus Torvalds wrote:
> But talking to the hw people about this is certainly a good idea regardless.
I'm not seeing it in this thread but I might've missed it too. Anyway,
I'm being reminded that the ADD will change rFLAGS while MFENCE doesn't
touch them.
On Tue, Jan 12, 2016 at 12:45:14PM -0800, Leonid Yegoshin wrote:
> >The issue I have with the SYNC description in the text above is that it
> >describes the single CPU (program order) and the dual-CPU (confusingly
> >named global order) cases, but then doesn't generalise any further. That
> >means