…overrides?
>
> Good to know Vitaly!
> So a poor example then. Better example is an abstract class with a method
> implementation that no subtypes override, yet multiple subtypes are found
> to be the receiver of a particular call site. Should we expect a
> monomorphic call site in that…
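For concreteness, a minimal sketch of the shape being asked about (all class names hypothetical): one inherited implementation that no subtype overrides, yet two receiver types observed at the same call site.

    // Base.work() has a single implementation that no subtype overrides,
    // yet callIt()'s receiver profile sees both A and B.
    abstract class Base {
        int work() { return 42; }
    }
    final class A extends Base {}
    final class B extends Base {}

    class CallSiteDemo {
        static int callIt(Base b) {
            return b.work(); // two receiver types, one target method
        }
        public static void main(String[] args) {
            int sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += callIt((i & 1) == 0 ? new A() : new B());
            }
            System.out.println(sum);
        }
    }

Whether this stays monomorphic depends on whether the JIT profiles receiver types or resolved target methods, which is exactly the question being posed.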
On Sun, Dec 29, 2019 at 10:22 AM Brian Harris
wrote:
> Hello!
>
> I was hoping to get one point of clarification about avoiding megamorphic
> call sites, after reading these excellent articles:
>
>
> http://www.insightfullogic.com/2014/May/12/fast-and-megamorphic-what-influences-method-invoca/
>
On Mon, Nov 25, 2019 at 11:50 AM Peter Veentjer wrote:
> I have a question about MESI.
>
> My question isn't about atomic operations, but about an ordinary write to
> the same cacheline done by 2 cores.
>
> If a CPU does a write, the write is placed on the store buffer.
>
> Then the CPU will send…
I posed this question to this list a few years ago, but don’t recall much
participation - let’s try again :).
Has anyone moved their C, C++, Java, whatever low latency/high perf systems
(or components thereof) to Rust? If so, what type of system/component? What
has been your experience? Bonus points…
FWIW, I’ve only seen lfence used precisely in the 2 cases mentioned in this
thread:
1) use of non-temporal loads (i.e. weak ordering, normal x86 guarantees go
out the window)
2) controlling execution of non-serializing instructions like rdtsc
I’d be curious myself to hear of other cases.
On Fri, Oc…
…I didn’t actually pick up on how often the termination protocol
triggers - I assumed it’s an uncommon/slow path.
>
>
> On Saturday, September 14, 2019 at 11:29:00 AM UTC-7, Vitaly Davidovich
> wrote:
>>
>> Unlike C++, where you can specify mem ordering for failure and success…
On Sat, Sep 14, 2019 at 6:01 PM Simone Bordet
wrote:
> Hi,
>
> On Sat, Sep 14, 2019 at 8:28 PM Vitaly Davidovich
> wrote:
> >
> > Unlike C++, where you can specify mem ordering for failure and success
> > separately, Java doesn’t allow that. But, the mem ordering…
On x86, I’ve never heard of failed CAS being cheaper. In theory, cache
snooping can inform the core whether its xchg would succeed without going
through the RFO dance. But, to perform the actual xchg it would need
ownership regardless (if not already owned/exclusive).
Sharing ordinary mutable m…
Unlike C++, where you can specify mem ordering for failure and success
separately, Java doesn’t allow that. But, the mem ordering is the same for
failure/success there. Unfortunately it doesn’t look like the javadocs
mention that, but I recall Doug Lea saying that’s the case on the
concurrency-interest…
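A small sketch of the Java side of that comparison (class and field names made up): VarHandle CAS takes no separate failure-ordering argument, so one ordering covers both outcomes.

    import java.lang.invoke.MethodHandles;
    import java.lang.invoke.VarHandle;

    class CasOrdering {
        volatile int state;
        static final VarHandle STATE;
        static {
            try {
                STATE = MethodHandles.lookup()
                        .findVarHandle(CasOrdering.class, "state", int.class);
            } catch (ReflectiveOperationException e) {
                throw new ExceptionInInitializerError(e);
            }
        }

        boolean tryAdvance(int expected, int next) {
            // Unlike C++ compare_exchange_strong(expected, desired,
            // success_order, failure_order), there is no per-outcome
            // ordering parameter here: one ordering for success and failure.
            return STATE.compareAndSet(this, expected, next);
        }
    }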
VarHandle does provide suitable (and better) replacements for the various
Unsafe.get/putXXX methods. But, U.objectFieldOffset() is the only (easy?)
Java way to inspect the layout of a class; e.g. say you apply a hacky
class-hierarchy based cacheline padding to a field, and then want to assert
that…
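A hedged illustration of that use (the padded hierarchy and the 64-byte expectation are assumptions for the sketch): Unsafe.objectFieldOffset lets a test assert that class-hierarchy padding actually pushed the field past a cache line.

    import java.lang.reflect.Field;
    import sun.misc.Unsafe;

    class Pad { long p0, p1, p2, p3, p4, p5, p6; } // 56 bytes of padding
    class Padded extends Pad { volatile long value; }

    class LayoutCheck {
        public static void main(String[] args) throws Exception {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            Unsafe u = (Unsafe) f.get(null);
            long off = u.objectFieldOffset(Padded.class.getDeclaredField("value"));
            // Superclass fields are laid out first, so with the object header
            // plus 7 longs, 'value' should start at least 64 bytes in.
            if (off < 64) throw new AssertionError("padding not applied: " + off);
            System.out.println("value offset = " + off);
        }
    }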
Your understanding of how varargs calls are made is correct - it's nothing
more than sugar for an allocated array to store the args. Your bench,
however, explicitly disables inlining of the varargs method, and thus
prevents escape analysis from potentially eliminating the array
allocation. Try th…
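Roughly what that desugaring looks like (hypothetical method, not the original bench):

    class VarargsDemo {
        // javac turns the varargs call into an explicit array allocation:
        static int sum(int... xs) {   // body receives an int[]
            int s = 0;
            for (int x : xs) s += x;
            return s;
        }

        static int caller() {
            return sum(1, 2, 3);      // compiles as sum(new int[]{1, 2, 3})
            // If sum() inlines, escape analysis can often remove that array;
            // forbidding inlining (as the bench did) keeps the allocation.
        }
    }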
On Wed, Apr 25, 2018 at 4:52 AM Aleksey Shipilev
wrote:
> On 04/24/2018 10:44 PM, John Hening wrote:
> > I'm reading the great article from
> https://shipilev.net/blog/2014/nanotrusting-nanotime/ (thanks
> > Aleksey! :)) and I am not sure whether I understand correctly that.
> >
> > Firstly, it i…
A few suggestions:
1) have you tried just reading the data in the prefaulting code, instead of
dirtying it with a dummy write? Since this is a disk backed mapping, it
should page fault and map the underlying file data (rather than mapping to
a zero page, e.g.). At a high rate of dirtying, this will…
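A minimal sketch of the read-only touch being suggested (file name and page size are placeholders; assumes the file fits in a single mapping under 2 GiB):

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    class Prefault {
        static final int PAGE = 4096; // typical page size; platform dependent

        public static void main(String[] args) throws Exception {
            try (RandomAccessFile raf = new RandomAccessFile(args[0], "r");
                 FileChannel ch = raf.getChannel()) {
                MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
                long sink = 0;
                for (int pos = 0; pos < map.limit(); pos += PAGE) {
                    sink += map.get(pos); // faults the page in without dirtying it
                }
                System.out.println(sink); // keep the loads observable
            }
        }
    }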
…absolute numbers would be different.
On Sun, Jan 29, 2017 at 2:37 PM Duarte Nunes
wrote:
>
>
> On Sunday, January 29, 2017 at 8:16:53 PM UTC+1, Vitaly Davidovich wrote:
>
> This.
>
> Also, I think the (Intel) adjacent sector prefetch is a feature enabled
> through BIOS. I think…
This.
Also, I think the (Intel) adjacent sector prefetch is a feature enabled
through BIOS. I think that will pull the adjacent line to L1, whereas the
spatial prefetcher is probably for streaming accesses that are loading L2.
Also, I'd run the bench without atomic ops - just relaxed (atomic) or…
…encina wrote:
>>
>>> On 25/01/2017 9:31 AM, Vitaly Davidovich wrote:
>>>
>>>> Interesting (not just) Mono bug:
>>>> http://www.mono-project.com/news/2016/09/12/arm64-icache/
>>>>
>>>
>>> Scary. From the article's Summary s…
Interesting (not just) Mono bug:
http://www.mono-project.com/news/2016/09/12/arm64-icache/
And should also mention that doing very early load scheduling will increase
register pressure as that value will need to be kept live across more
instructions. Stack spills and reloads suck in a hot/tight code sequence.
On Tue, Jan 17, 2017 at 7:08 PM Vitaly Davidovich wrote:
> The cache miss…
…guarantees of an atomic write don't apply, cache or no cache.
> On Wed, Jan 18, 2017 at 8:02 AM, Vitaly Davidovich wrote:
> > On Tue, Jan 17, 2017 at 3:39 PM, Aleksey Shipilev wrote:
> >>
The cache miss latency can be hidden either by this load being done ahead
of time or if there're other instructions that can execute while this load
is outstanding. So breaking dependency chains is good, but extending the
distance like this seems weird and may hurt common cases. If ICC does this…
…it's difficult to execute these instructions in an out-of-order manner.
> But if you schedule them this way
>
> mov (%rax), %rbx
> ... few instructions
> cmp %rbx, %rdx
> ... few instructions
> jxx Lxxx
>
> It would be possible to execute them out-of-order and calculate something…
>
On Tue, Jan 17, 2017 at 3:55 PM, Sergey Melnikov <
melnikov.serge...@gmail.com> wrote:
> Hi Gil,
>
> Your slides are really inspiring, especially for JIT code. Now, it's
> comparable with code produced by static C/C++ compilers. Have you compared
> the performance of this code with the code produced…
On Tue, Jan 17, 2017 at 3:39 PM, Aleksey Shipilev <
aleksey.shipi...@gmail.com> wrote:
> On 01/17/2017 12:55 PM, Vitaly Davidovich wrote:
> > Atomicity of values isn't something I'd assume happens automatically. Word
> > tearing isn't observable from single threaded code.
Atomicity of values isn't something I'd assume happens automatically. Word
tearing isn't observable from single threaded code.
I think the only thing you can safely and portably assume is the high level
"single threaded observable behavior will occur" statement. It's also
interesting to note that…
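A sketch of why that distinction matters (this is the standard long-tearing example, not code from the thread): the JLS permits a racy read of a plain long to observe half of each write, even though most 64-bit JVMs never actually produce one.

    class Tearing {
        static long x; // plain, not volatile: no atomicity guarantee

        public static void main(String[] args) {
            new Thread(() -> { while (true) x = 0L; }).start();
            new Thread(() -> { while (true) x = -1L; }).start();
            while (true) {
                long v = x; // racy read
                if (v != 0L && v != -1L) // a torn value mixes halves of both writes
                    throw new AssertionError("torn read: " + Long.toHexString(v));
            }
        }
    }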
Depends on which hardware. For instance, x86/64 is very specific about
what memory operations can be reordered (for cacheable operations), and two
stores aren't reordered. The only reordering is stores followed by loads,
where the load can appear to reorder with the preceding store.
On Mon, Jan 1…
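The reordering being described is the classic store-buffer litmus test; a sketch in Java (plain fields, names invented):

    class StoreLoad {
        static int x, y; // plain fields: no ordering enforced

        public static void main(String[] args) throws Exception {
            int[] r = new int[2];
            Thread t1 = new Thread(() -> { x = 1; r[0] = y; });
            Thread t2 = new Thread(() -> { y = 1; r[1] = x; });
            t1.start(); t2.start();
            t1.join(); t2.join();
            // r[0] == 0 && r[1] == 0 is a legal outcome: each load can
            // complete before the other core's store drains from its
            // store buffer.
            System.out.println(r[0] + "," + r[1]);
        }
    }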
taken away (even if the value at the address is still the expected
one).
On Wed, Jan 4, 2017 at 2:59 PM, Vitaly Davidovich wrote:
> Probably worth a mention that "CAS" is a bit too generic. For instance,
> you can have weak and strong CAS, with some architectures only providing
>
Probably worth a mention that "CAS" is a bit too generic. For instance,
you can have weak and strong CAS, with some architectures only providing
strong (e.g. intel) and some providing/allowing both. Depending on whether
a weak or strong CAS is used, the memory ordering/pipeline implications
will…
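A sketch of the API-level difference in Java (class and field names made up): a weak CAS may fail spuriously, so it only makes sense inside a retry loop, which is also why it can be cheaper on LL/SC architectures.

    import java.lang.invoke.MethodHandles;
    import java.lang.invoke.VarHandle;

    class WeakCas {
        volatile int v;
        static final VarHandle V;
        static {
            try {
                V = MethodHandles.lookup().findVarHandle(WeakCas.class, "v", int.class);
            } catch (ReflectiveOperationException e) {
                throw new ExceptionInInitializerError(e);
            }
        }

        void increment() {
            int cur;
            do {
                cur = v;
            } while (!V.weakCompareAndSetPlain(this, cur, cur + 1));
            // Weak form: may fail spuriously (e.g. a single LL/SC attempt);
            // on x86, both weak and strong compile down to lock cmpxchg.
        }
    }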
…doesn't impact i-cache or pollute the BTB history.
On Thu, Dec 22, 2016 at 3:58 PM, Gil Tene wrote:
>
>
> On Thursday, December 22, 2016 at 10:46:47 AM UTC-8, Vitaly Davidovich
> wrote:
>>
>>
>>
>> On Thu, Dec 22, 2016 at 12:59 PM, Gil Tene wrote:
>>
>>
http://lists.llvm.org/pipermail/llvm-dev/2016-December/108279.html is the
llvm thread I was referring to.
On Thu, Dec 22, 2016 at 2:03 PM, Vitaly Davidovich
wrote:
> Rajiv/Marshall,
>
> Thanks for your comments. I guess I should rephrase my initial post - I'm
> *particularly*…
Rajiv/Marshall,
Thanks for your comments. I guess I should rephrase my initial post - I'm
*particularly* interested in production and migration scenarios/stories,
but happy to hear others' casual dabbling experience as well.
I agree on the compile time, but there's good news and bad news. The g…
On Thu, Dec 22, 2016 at 12:59 PM, Gil Tene wrote:
>
>
> On Thursday, December 22, 2016 at 9:33:09 AM UTC-8, Vitaly Davidovich
> wrote:
>>
>>
>>
>> On Thu, Dec 22, 2016 at 12:14 PM, Gil Tene wrote:
>>
>>> Go's GC story is evolving. And it…
On Thu, Dec 22, 2016 at 12:14 PM, Gil Tene wrote:
> 1. The ability to defragment the heap in order to support indefinite
> execution lengths that do not depend on friendly behavior patterns for
> object sizes over time. And defragmentation requires moving objects around
> before they die. Heaps t…
> …concurrent, incremental-STW (with viable arbitrarily small incremental
> steps), or a combination of the two.
>
> And yes, doing that (concurrent generational GC) while supporting great
> latency, high throughput, and high efficiency all at the same time, is very
> possible. It…
On Thu, Dec 22, 2016 at 8:05 AM, Remi Forax wrote:
>
>
> --
>
> *De: *"Vitaly Davidovich"
> *À: *mechanical-sympathy@googlegroups.com
> *Envoyé: *Jeudi 22 Décembre 2016 13:46:50
> *Objet: *Re: Modern Garbage Collection (good a
…right choice for them and how they see Go being used.
Mind you, I'm not a fan nor a user of Go so I'm referring purely to their
stipulated strategy on how to evolve their GC.
On Thu, Dec 22, 2016 at 7:37 AM Remi Forax wrote:
>
>
> ------
>
> *
FWIW, I think the Go team is right in favoring lower latency over
throughput of their GC given the expected usage scenarios for Go.
In fact, most of the (Hotspot based) Java GC horror stories involve very
long pauses (G1 and CMS not excluded) - I've yet to hear anyone complain
that their "Big Data
Curious if anyone on this list is running any non-trivial Rust code in
production? And if so, would love to hear some thoughts on how that's going.
Also, if the code either interops with existing c/c++/java code or is a
replacement/rewrite/port of code from those languages, interested to hear
that…
This only optimizes final field reads when the enclosing object is a static
final itself :). E.g. it'll help uses of Enum::ordinal() and the like, but
not much beyond that.
On Mon, Dec 19, 2016 at 2:45 PM, Chris Vest wrote:
> On HotSpot you can also get some optimisation benefit out of final
>
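A sketch of the shape that does get optimized (hypothetical names): the enclosing object must itself be reachable from a static final for its final fields to be constant-folded.

    enum Color { RED, GREEN, BLUE }

    class Fold {
        static final Color C = Color.GREEN; // static final root: trusted

        static int ordinalOfC() {
            return C.ordinal(); // reads a final field of a constant object;
        }                       // HotSpot can fold this to the constant 1
    }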
…more palatable and doesn't
require using modules. But we'll see how this pans out.
On Sun, Dec 18, 2016 at 3:29 PM Remi Forax wrote:
> Hi Vitaly,
>
> ----------
>
> *De: *"Vitaly Davidovich"
> *À: *mechanical-sympathy@googlegroups.c
It doesn't care about reflection - modifying final fields via reflection is
undefined by the JLS. Unfortunately, the same optimization isn't done for
instance finals because some well known frameworks use that facility.
Oracle is working on ways to mitigate that in a backcompat manner (deopt
when…
Have you tried just setting -Xmx10g and -XX:MaxGCPauseMillis=10? This is
typically a good baseline to start with for G1; it'll use the pause time
goal to adaptively size the young gen based on evacuation cost statistics
it maintains. With a 10ms goal, it'll size it pretty conservatively and
you'll…
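As a full command line, the suggested baseline looks like this (app.jar is a placeholder; -XX:+UseG1GC is only needed on JDKs where G1 isn't already the default collector):

    java -XX:+UseG1GC -Xmx10g -XX:MaxGCPauseMillis=10 -jar app.jar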
I think long indexing is punted until Arrays 2.0, although I don't know
where/how that's going (if at all - initial proposal is from circa 2012).
Otherwise, e.g., long indexing on a HeapByteBuffer is crufty.
So here's a workaround - stop using Java for such scenarios/apps :) j/k
(... sort of).
On…
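One common shape of that cruft (a hypothetical sketch, not a library API): splitting a long index by hand across int-indexed chunks, since ByteBuffer positions and indices are ints.

    import java.nio.ByteBuffer;

    class LongIndexed {
        static final int CHUNK_SHIFT = 30;           // 1 GiB per chunk
        static final int CHUNK_SIZE = 1 << CHUNK_SHIFT;
        final ByteBuffer[] chunks;

        LongIndexed(long capacity) {
            int n = (int) ((capacity + CHUNK_SIZE - 1) >>> CHUNK_SHIFT);
            chunks = new ByteBuffer[n];
            for (int i = 0; i < n; i++) chunks[i] = ByteBuffer.allocate(CHUNK_SIZE);
        }

        byte get(long index) { // the crufty part: split the long index manually
            return chunks[(int) (index >>> CHUNK_SHIFT)]
                    .get((int) (index & (CHUNK_SIZE - 1)));
        }
    }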
On Tuesday, 15 November 2016 19:14:23 UTC, Vitaly Davidovich wrote:
>
> Could be this: https://bugs.openjdk.java.net/browse/JDK-8087134.
>
> Are the failures happening when C1 is enabled (i.e. Tiered comp is
> enabled)?
>
> On Tue, Nov 15, 2016 at 1:44 PM Martin Thompson wrote:
>
&
Could be this: https://bugs.openjdk.java.net/browse/JDK-8087134.
Are the failures happening when C1 is enabled (i.e. Tiered comp is enabled)?
On Tue, Nov 15, 2016 at 1:44 PM Martin Thompson wrote:
> We have had another similar issue raised on this in a single threaded
> example. It seems that w…
On Sunday, October 30, 2016, Aleksey Shipilev
wrote:
> On 10/30/2016 05:55 AM, Peter Veentjer wrote:
> > Let me clarify.
> > The discussion is around removing the volatile read in the inc method.
>
> Ah, sorry I misinterpreted the question. It usually goes the other way:
> the reads vastly outnumber…
On Sunday, October 30, 2016, Aleksey Shipilev
wrote:
> On 10/29/2016 10:31 PM, Vitaly Davidovich wrote:
> > There's one thing I still can't get someone at Oracle to clarify, which
> > is whether getOpaque ensures atomicity of the read. I believe it would…
…method doesn't modify memory. Hence the term "opaque". However, it
doesn't emit any CPU fences, and that's the difference between volatile
load and getOpaque on weaker memory model archs.
There's one thing I still can't get someone at Oracle to clarify…
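A sketch of where getOpaque fits (names invented): the read cannot be hoisted out of the loop, but no CPU fence is emitted, matching the "opaque to the optimizer" description above.

    import java.lang.invoke.MethodHandles;
    import java.lang.invoke.VarHandle;

    class OpaqueRead {
        int flag; // plain field, accessed via VarHandle
        static final VarHandle FLAG;
        static {
            try {
                FLAG = MethodHandles.lookup().findVarHandle(OpaqueRead.class, "flag", int.class);
            } catch (ReflectiveOperationException e) {
                throw new ExceptionInInitializerError(e);
            }
        }

        void spinUntilSet() {
            // Opaque load: the compiler must actually re-read memory each
            // iteration, but no hardware fences on weakly ordered archs.
            while ((int) FLAG.getOpaque(this) == 0) {
                Thread.onSpinWait();
            }
        }
    }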
On Saturday, October 29, 2016, Olivier Bourgain
wrote:
> I think this deserves a benchmark.
>
> On Saturday, October 29, 2016 at 13:15:35 UTC+2, Aleksey Shipilev wrote:
>>
>> On 10/29/2016 10:13 AM, Peter Veentjer wrote:
>> > So you get something like this:
>> >
>> > public class Counter {
>> >
>> >
Why is it egregious? It's detailed in the types of exceptions it throws,
yes, but that's good assuming you want to handle some of those types (and
there are cases where those exceptions can be handled properly). Even
before ReflectiveOperationException, you could use multi-catch since Java 7
to ge…
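For reference, the two options being alluded to (a hypothetical reflective call): the Java 7 umbrella type versus multi-catch of the specific subtypes.

    import java.lang.reflect.Method;

    class ReflectCall {
        static Object call(Object target, String name) {
            try {
                Method m = target.getClass().getMethod(name);
                return m.invoke(target);
            } catch (ReflectiveOperationException e) {
                // Pre-Java 7 this needed one catch block per subtype; since
                // Java 7 you can also multi-catch the specific types:
                // (NoSuchMethodException | IllegalAccessException |
                //  InvocationTargetException e)
                throw new IllegalStateException(e);
            }
        }
    }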