Re: does call site polymorphism factor in method overrides?

2019-12-30 Thread Vitaly Davidovich
rrides? > > Good to know Vitaly! > So a poor example then. Better example is an abstract class with a method > implementation that no subtypes override, yet multiple subtypes are found > to be the receiver of a particular call site. Should we expect a > monomorphic call site in tha
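The scenario in this thread can be sketched as below (hypothetical class names, not from the original mails): an abstract base with a method implementation that no subtype overrides. Even if the call site sees several receiver classes, there is only one target method, so HotSpot's class hierarchy analysis can still devirtualize and inline it; the receiver-type profile alone doesn't force a polymorphic dispatch.

```java
// Sketch: `scaled` lives only in the abstract base; `area` is the
// bimorphic part. A call site invoking `scaled` on Squares and Circles
// still has a single possible target method.
public class ChaDemo {
    static abstract class Shape {
        double scaled(double factor) { return area() * factor; } // never overridden
        abstract double area();
    }
    static final class Square extends Shape {
        final double side;
        Square(double side) { this.side = side; }
        double area() { return side * side; }
    }
    static final class Circle extends Shape {
        final double radius;
        Circle(double radius) { this.radius = radius; }
        double area() { return Math.PI * radius * radius; }
    }

    // One call site, multiple receiver types, single implementation of scaled().
    static double total(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) sum += s.scaled(2.0);
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(total(new Shape[] { new Square(3), new Circle(1) }));
    }
}
```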

Re: does call site polymorphism factor in method overrides?

2019-12-29 Thread Vitaly Davidovich
On Sun, Dec 29, 2019 at 10:22 AM Brian Harris wrote: > Hello! > > I was hoping to get one point of clarification about avoiding megamorphic > call sites, after reading these excellent articles: > > > http://www.insightfullogic.com/2014/May/12/fast-and-megamorphic-what-influences-method-invoca/ >

Re: MESI and 'atomicity'

2019-11-25 Thread Vitaly Davidovich
On Mon, Nov 25, 2019 at 11:50 AM Peter Veentjer wrote: > I have a question about MESI. > > My question isn't about atomic operations; but about an ordinary write to > the same cacheline done by 2 cores. > > If a CPU does a write, the write is placed on the store buffer. > > Then the CPU will send
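The store-buffer behavior this thread is about shows up in the classic "store buffering" litmus test. A minimal Java 9+ sketch (not from the original mails): each thread stores one flag and then loads the other. With plain accesses, both stores can sit in each core's store buffer, so both threads may read 0. Inserting `VarHandle.fullFence()` between the store and the load orders them, which forbids the both-read-0 outcome.

```java
import java.lang.invoke.VarHandle;

public class StoreBufferDemo {
    static int x, y;   // the two flags
    static int r1, r2; // what each thread observed

    static void run() throws InterruptedException {
        x = y = 0;
        // Without the fences, r1 == 0 && r2 == 0 is a legal (and real) outcome.
        Thread t1 = new Thread(() -> { x = 1; VarHandle.fullFence(); r1 = y; });
        Thread t2 = new Thread(() -> { y = 1; VarHandle.fullFence(); r2 = x; });
        t1.start(); t2.start();
        t1.join(); t2.join();
    }

    public static void main(String[] args) throws InterruptedException {
        run();
        // With the full fences, at least one thread must see the other's store:
        System.out.println(r1 + r2 >= 1); // true
    }
}
```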

Rust

2019-10-08 Thread Vitaly Davidovich
I posed this question to this list a few years ago, but don’t recall much participation - let’s try again :). Has anyone moved their C, C++, Java, whatever low latency/high perf systems (or components thereof) to Rust? If so, what type of system/component? What has been your experience? Bonus poin

Re: purpose of an LFENCE

2019-10-08 Thread Vitaly Davidovich
FWIW, I’ve only seen lfence used precisely in the 2 cases mentioned in this thread: 1) use of non-temporal loads (ie weak ordering, normal x86 guarantees go out the window) 2) controlling execution of non-serializing instructions like rdtsc I’d be curious myself to hear of other cases. On Fri, Oc

Re: Volatile semantic for failed/noop atomic operations

2019-10-08 Thread Vitaly Davidovich
. I didn’t actually pick up on how often the termination protocol triggers - I assumed it’s an uncommon/slow path. > > > On Saturday, September 14, 2019 at 11:29:00 AM UTC-7, Vitaly Davidovich > wrote: >> >> Unlike C++, where you can specify mem ordering for failure and su

Re: Volatile semantic for failed/noop atomic operations

2019-09-14 Thread Vitaly Davidovich
On Sat, Sep 14, 2019 at 6:01 PM Simone Bordet wrote: > Hi, > > On Sat, Sep 14, 2019 at 8:28 PM Vitaly Davidovich > wrote: > > > > Unlike C++, where you can specify mem ordering for failure and success > separately, Java doesn’t allow that. But, the mem orderin

Re: Volatile semantic for failed/noop atomic operations

2019-09-14 Thread Vitaly Davidovich
On x86, I’ve never heard of failed CAS being cheaper. In theory, cache snooping can inform the core whether it’s xchg would succeed without going through the RFO dance. But, to perform the actual xchg it would need ownership regardless (if not already owned/exclusive). Sharing ordinary mutable m

Re: Volatile semantic for failed/noop atomic operations

2019-09-14 Thread Vitaly Davidovich
Unlike C++, where you can specify mem ordering for failure and success separately, Java doesn’t allow that. But, the mem ordering is the same for failure/success there. Unfortunately it doesn’t look like the javadocs mention that, but I recall Doug Lea saying that’s the case on the concurrency-in
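A minimal illustration of the point: Java's `compareAndSet` has a single (volatile) ordering mode that applies whether the CAS succeeds or fails, unlike C++'s `compare_exchange_strong(expected, desired, success_order, failure_order)`. Functionally, a failed CAS is a no-op on the value.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasDemo {
    public static void main(String[] args) {
        AtomicInteger v = new AtomicInteger(10);
        boolean ok = v.compareAndSet(10, 11);     // succeeds: 10 -> 11
        boolean failed = v.compareAndSet(10, 12); // fails: current value is 11
        System.out.println(ok + " " + failed + " " + v.get()); // true false 11
        // C++ contrast (not compiled here): separate orderings per outcome:
        //   v.compare_exchange_strong(exp, des, std::memory_order_acq_rel,
        //                             std::memory_order_acquire);
    }
}
```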

Re: how to replace Unsafe.objectFieldOffset in jdk 11

2019-07-17 Thread Vitaly Davidovich
VarHandle does provide suitable (and better) replacements for the various Unsafe.get/putXXX methods. But, U.objectFieldOffset() is the only (easy?) Java way to inspect the layout of a class; e.g. say you apply a hacky class-hierarchy based cacheline padding to a field, and then want to assert that
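The VarHandle replacement mentioned above looks like the sketch below (hypothetical `Holder` type). Note what it does *not* give you: the field's byte offset within the object layout, which is exactly what `Unsafe.objectFieldOffset()` exposed.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class VarHandleDemo {
    static final class Holder { long value; }

    static final VarHandle VALUE;
    static {
        try {
            VALUE = MethodHandles.lookup().findVarHandle(Holder.class, "value", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }
    static void set(Holder h, long v) { VALUE.setRelease(h, v); }    // ~ Unsafe.putOrderedLong
    static long get(Holder h) { return (long) VALUE.getAcquire(h); } // acquire-mode read

    public static void main(String[] args) {
        Holder h = new Holder();
        set(h, 42L);
        System.out.println(get(h)); // 42
    }
}
```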

Re: Varargs vs. explicit param method call

2018-05-06 Thread Vitaly Davidovich
Your understanding of how varargs calls are made is correct - it's nothing more than sugar for an allocated array to store the args. Your bench, however, explicitly disables inlining of the varargs method, and thus prevents escape analysis from potentially eliminating the array allocation. Try th
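The desugaring described here, made explicit (hypothetical method name): a varargs call allocates an array to carry the arguments, so the two calls below compile to the same thing. If the callee inlines, escape analysis may eliminate that allocation; a benchmark that disables inlining of the varargs method also disables that EA opportunity.

```java
public class VarargsDemo {
    static long sum(long... xs) { // receives an array either way
        long s = 0;
        for (long x : xs) s += x;
        return s;
    }
    public static void main(String[] args) {
        long a = sum(1, 2, 3);                // sugar for...
        long b = sum(new long[] { 1, 2, 3 }); // ...exactly this
        System.out.println(a + " " + b); // 6 6
    }
}
```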

Re: Nanotrusting the Nanotime and amortization.

2018-04-25 Thread Vitaly Davidovich
On Wed, Apr 25, 2018 at 4:52 AM Aleksey Shipilev wrote: > On 04/24/2018 10:44 PM, John Hening wrote: > > I'm reading the great article from > https://shipilev.net/blog/2014/nanotrusting-nanotime/ (thanks > > Aleksey! :)) and I am not sure whether I understand correctly that. > > > > Firstly, it i

Re: Disk-based logger - write pretouch

2017-07-10 Thread Vitaly Davidovich
A few suggestions: 1) have you tried just reading the data in the prefaulting code, instead of dirtying it with a dummy write? Since this is a disk backed mapping, it should page fault and map the underlying file data (rather than mapping to a zero page, e.g.). At a high rate of dirtying, this wi
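Suggestion 1 can be sketched as below (my construction, not code from the thread): touch one byte per page of the mapping read-only, so the page fault maps in the file-backed pages without dirtying them; a dummy write would mark every touched page dirty and create writeback traffic. A 4 KiB page size is assumed here.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PrefaultDemo {
    static final int PAGE = 4096; // assumption; query the OS page size in real code

    static long touchPages(MappedByteBuffer buf) {
        long checksum = 0;
        for (int pos = 0; pos < buf.limit(); pos += PAGE) {
            checksum += buf.get(pos); // read-only touch: faults the page in, no dirtying
        }
        return checksum;
    }

    static long run() throws IOException {
        Path file = Files.createTempFile("prefault", ".dat");
        try {
            Files.write(file, new byte[8 * PAGE]); // 8 zero-filled pages
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
                return touchPages(buf);
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(run()); // 0 for a zero-filled file
    }
}
```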

Re: Prefetching and false sharing

2017-01-29 Thread Vitaly Davidovich
bsolute numbers would be different. On Sun, Jan 29, 2017 at 2:37 PM Duarte Nunes wrote: > > > On Sunday, January 29, 2017 at 8:16:53 PM UTC+1, Vitaly Davidovich wrote: > > This. > > Also, I think the (Intel) adjacent sector prefetch is a feature enabled > through BIOS. I think

Re: Prefetching and false sharing

2017-01-29 Thread Vitaly Davidovich
This. Also, I think the (Intel) adjacent sector prefetch is a feature enabled through BIOS. I think that will pull the adjacent line to L1, whereas the spatial prefetcher is probably for streaming accesses that are loading L2. Also, I'd run the bench without atomic ops - just relaxed (atomic) or
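The class-hierarchy padding trick this thread takes for granted looks roughly like the sketch below (my construction): fields from different levels of a class hierarchy are not interleaved by HotSpot's layouter, so seven longs on each side keep the hot field on its own 64-byte line (two lines of padding would also cover adjacent-line prefetch).

```java
public class PaddedCounters {
    static class PadL { long p1, p2, p3, p4, p5, p6, p7; } // 56B before the hot field
    static class Hot extends PadL { volatile long value; }
    static class PadR extends Hot { long q1, q2, q3, q4, q5, q6, q7; } // 56B after
    static final class Counter extends PadR { }

    // Each counter has exactly one writer, so a plain volatile increment is safe.
    static long[] run() throws InterruptedException {
        Counter a = new Counter(), b = new Counter();
        Thread t1 = new Thread(() -> { for (int i = 0; i < 1_000_000; i++) a.value++; });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 1_000_000; i++) b.value++; });
        t1.start(); t2.start();
        t1.join(); t2.join();
        return new long[] { a.value, b.value };
    }

    public static void main(String[] args) throws InterruptedException {
        long[] r = run();
        System.out.println(r[0] + " " + r[1]); // 1000000 1000000
    }
}
```

The padding doesn't change the result, only the cache-line traffic while the two threads run; `-XX:-RestrictContended` with `@jdk.internal.vm.annotation.Contended` is the JDK-internal equivalent.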

Re: SMP vs AMP: watch your cacheline sizes

2017-01-25 Thread Vitaly Davidovich
encina wrote: >> >>> On 25/01/2017 9:31 AM, Vitaly Davidovich wrote: >>> >>>> Interesting (not just) Mono bug: >>>> http://www.mono-project.com/news/2016/09/12/arm64-icache/ >>>> >>> >>> Scary. From the article's Summary s

SMP vs AMP: watch your cacheline sizes

2017-01-24 Thread Vitaly Davidovich
Interesting (not just) Mono bug: http://www.mono-project.com/news/2016/09/12/arm64-icache/ -- Sent from my phone

Re: Operation Reordering

2017-01-17 Thread Vitaly Davidovich
And should also mention that doing very early load scheduling will increase register pressure as that value will need to be kept live across more instructions. Stack spills and reloads suck in a hot/tight code sequence. On Tue, Jan 17, 2017 at 7:08 PM Vitaly Davidovich wrote: > The cache m

Re: Operation Reordering

2017-01-17 Thread Vitaly Davidovich
tees of an atomic write don't apply, cache or no > > cache. > > > > On Wed, Jan 18, 2017 at 8:02 AM, Vitaly Davidovich > wrote: > > > > > > > > > On Tue, Jan 17, 2017 at 3:39 PM, Aleksey Shipilev > > > wrote: > > >>

Re: Operation Reordering

2017-01-17 Thread Vitaly Davidovich
The cache miss latency can be hidden either by this load being done ahead of time or if there're other instructions that can execute while this load is outstanding. So breaking dependency chains is good, but extending the distance like this seems weird and may hurt common cases. If ICC does this

Re: Operation Reordering

2017-01-17 Thread Vitaly Davidovich
It's difficult to execute these instructions in OOO-manner. > But if you schedule them this way > > mov (%rax), %rbx > ... few instructions > cmp %rbx, %rdx > ... few instructions > jxx Lxxx > > It would be possible to execute them out-of-order and calculate something >

Re: Operation Reordering

2017-01-17 Thread Vitaly Davidovich
On Tue, Jan 17, 2017 at 3:55 PM, Sergey Melnikov < melnikov.serge...@gmail.com> wrote: > Hi Gil, > > Your slides are really inspiring, especially for JIT code. Now, it's > comparable with code produced by static C/C++ compilers. Have you compared > a performance of this code with a code produce

Re: Operation Reordering

2017-01-17 Thread Vitaly Davidovich
On Tue, Jan 17, 2017 at 3:39 PM, Aleksey Shipilev < aleksey.shipi...@gmail.com> wrote: > On 01/17/2017 12:55 PM, Vitaly Davidovich wrote: > > Atomicity of values isn't something I'd assume happens automatically. > Word > > tearing isn't observable from sin

Re: Operation Reordering

2017-01-17 Thread Vitaly Davidovich
Atomicity of values isn't something I'd assume happens automatically. Word tearing isn't observable from single threaded code. I think the only thing you can safely and portably assume is the high level "single threaded observable behavior will occur" statement. It's also interesting to note tha

Re: Operation Reordering

2017-01-16 Thread Vitaly Davidovich
Depends on which hardware. For instance, x86/64 is very specific about what memory operations can be reordered (for cacheable operations), and two stores aren't reordered. The only reordering is stores followed by loads, where the load can appear to reorder with the preceding store. On Mon, Jan 1

Re: How hardware implements CAS

2017-01-04 Thread Vitaly Davidovich
taken away (even if the value at the address is still the expected one). On Wed, Jan 4, 2017 at 2:59 PM, Vitaly Davidovich wrote: > Probably worth a mention that "CAS" is a bit too generic. For instance, > you can have weak and strong CAS, with some architectures only providing >

Re: How hardware implements CAS

2017-01-04 Thread Vitaly Davidovich
Probably worth a mention that "CAS" is a bit too generic. For instance, you can have weak and strong CAS, with some architectures only providing strong (e.g. intel) and some providing/allowing both. Depending on whether a weak or strong CAS is used, the memory ordering/pipeline implications will
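The weak/strong distinction maps directly onto the Java 9+ API: `weakCompareAndSetPlain` may fail spuriously (it can map to a bare LL/SC on architectures that provide it), so it is only correct inside a retry loop, while `compareAndSet` (strong) fails only if the value really differs. A minimal sketch:

```java
import java.util.concurrent.atomic.AtomicLong;

public class WeakCasDemo {
    static long incrementWeak(AtomicLong v) {
        long cur;
        do {
            cur = v.get();
            // weak CAS: a false return may be spurious, so we must loop
        } while (!v.weakCompareAndSetPlain(cur, cur + 1));
        return cur + 1;
    }

    public static void main(String[] args) {
        AtomicLong v = new AtomicLong(0);
        System.out.println(incrementWeak(v)); // 1
    }
}
```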

Re: Modern Garbage Collection (good article)

2016-12-22 Thread Vitaly Davidovich
't impact i-cache or pollute the BTB history. On Thu, Dec 22, 2016 at 3:58 PM, Gil Tene wrote: > > > On Thursday, December 22, 2016 at 10:46:47 AM UTC-8, Vitaly Davidovich > wrote: >> >> >> >> On Thu, Dec 22, 2016 at 12:59 PM, Gil Tene wrote: >> >>

Re: Any serious Rust users here?

2016-12-22 Thread Vitaly Davidovich
http://lists.llvm.org/pipermail/llvm-dev/2016-December/108279.html is the llvm thread I was referring to. On Thu, Dec 22, 2016 at 2:03 PM, Vitaly Davidovich wrote: > Rajiv/Marshall, > > Thanks for your comments. I guess I should rephrase my initial post - I'm > *particular

Re: Any serious Rust users here?

2016-12-22 Thread Vitaly Davidovich
Rajiv/Marshall, Thanks for your comments. I guess I should rephrase my initial post - I'm *particularly* interested in production and migration scenarios/stories, but happy to hear others' casual dabbling experience as well. I agree on the compile time, but there's good news and bad news. The g

Re: Modern Garbage Collection (good article)

2016-12-22 Thread Vitaly Davidovich
On Thu, Dec 22, 2016 at 12:59 PM, Gil Tene wrote: > > > On Thursday, December 22, 2016 at 9:33:09 AM UTC-8, Vitaly Davidovich > wrote: >> >> >> >> On Thu, Dec 22, 2016 at 12:14 PM, Gil Tene wrote: >> >>> Go's GC story is evolving. And it

Re: Modern Garbage Collection (good article)

2016-12-22 Thread Vitaly Davidovich
On Thu, Dec 22, 2016 at 12:14 PM, Gil Tene wrote: > 1. The ability to defragment the heap in order to support indefinite > execution lengths that do not depend on friendly behavior patterns for > object sizes over time. And defragmentation requires moving objects around > before they die. Heaps t

Re: Modern Garbage Collection (good article)

2016-12-22 Thread Vitaly Davidovich
y > concurrent, incremental-STW (with viable arbitrarily small incremental > steps), or a combination of the two. > > And yes, doing that (concurrent generational GC) while supporting great > latency, high throughput, and high efficiency all at the same time, is very > possible. It

Re: Modern Garbage Collection (good article)

2016-12-22 Thread Vitaly Davidovich
On Thu, Dec 22, 2016 at 8:05 AM, Remi Forax wrote: > > > -- > > *From: *"Vitaly Davidovich" > *To: *mechanical-sympathy@googlegroups.com > *Sent: *Thursday, 22 December 2016 13:46:50 > *Subject: *Re: Modern Garbage Collection (good a

Re: Modern Garbage Collection (good article)

2016-12-22 Thread Vitaly Davidovich
right choice for them and how they see Go being used. Mind you, I'm not a fan nor a user of Go so I'm referring purely to their stipulated strategy on how to evolve their GC. On Thu, Dec 22, 2016 at 7:37 AM Remi Forax wrote: > > > ------ > > *

Re: Modern Garbage Collection (good article)

2016-12-22 Thread Vitaly Davidovich
FWIW, I think the Go team is right in favoring lower latency over throughput of their GC given the expected usage scenarios for Go. In fact, most of the (Hotspot based) Java GC horror stories involve very long pauses (G1 and CMS not excluded) - I've yet to hear anyone complain that their "Big Data

Any serious Rust users here?

2016-12-22 Thread Vitaly Davidovich
Curious if anyone on this list is running any non-trivial Rust code in production? And if so, would love to hear some thoughts on how that's going. Also, if the code either interops with existing c/c++/java code or is a replacement/rewrite/port of code from those languages, interested to hear that

Re: private final static optimization

2016-12-19 Thread Vitaly Davidovich
This only optimizes final field reads when the enclosing object is a static final itself :). E.g. it'll help uses of Enum::ordinal() and the like, but not much beyond that. On Mon, Dec 19, 2016 at 2:45 PM, Chris Vest wrote: > On HotSpot you can also get some optimisation benefit out of final >

Re: private final static optimization

2016-12-18 Thread Vitaly Davidovich
t more palatable and doesn't require using modules. But we'll see how this pans out. On Sun, Dec 18, 2016 at 3:29 PM Remi Forax wrote: > Hi Vitaly, > > ---------- > > *From: *"Vitaly Davidovich" > *To: *mechanical-sympathy@googlegroups.c

Re: private final static optimization

2016-12-18 Thread Vitaly Davidovich
It doesn't care about reflection - modifying final fields via reflection is undefined by the JLS. Unfortunately, the same optimization isn't done for instance finals because some well known frameworks use that facility. Oracle is working on ways to mitigate that in a backcompat manner (deopt when
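A small illustration of the optimization being discussed (my example, not from the thread): HotSpot trusts `static final` fields and folds their values into JIT-compiled code, which is also why mutating them via reflection is undefined per the JLS. Compile-time constants like `LIMIT` below are additionally inlined by javac itself, before the JIT is even involved.

```java
public class StaticFinalDemo {
    static final int LIMIT = 64;            // compile-time constant: inlined by javac
    static final int[] TABLE = makeTable(); // runtime value: foldable by the JIT only

    static int[] makeTable() {
        int[] t = new int[LIMIT];
        for (int i = 0; i < LIMIT; i++) t[i] = i * i;
        return t;
    }

    static int lookup(int i) {
        // The JIT may treat TABLE (the reference) and LIMIT as constants here,
        // turning the mask into a literal and eliding the field load.
        return TABLE[i & (LIMIT - 1)];
    }

    public static void main(String[] args) {
        System.out.println(lookup(5)); // 25
    }
}
```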

Re: Minimum realistic GC time for G1 collector on 10GB

2016-12-14 Thread Vitaly Davidovich
Have you tried just setting -Xmx10g and -XX:MaxGCPauseMillis=10? This is typically a good baseline to start with for G1; it'll use the pause time goal to adaptively size the young gen based on evacuation cost statistics it maintains. With a 10ms goal, it'll size it pretty conservatively and you'll
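The suggested baseline, spelled out as a command line (`MyApp` is a placeholder; logging flags are my addition, not from the thread):

```shell
# G1 is the default collector on JDK 9+; -XX:+UseG1GC is needed explicitly on JDK 8.
# Unified GC logging (-Xlog) is JDK 9+; use -XX:+PrintGCDetails on JDK 8 instead.
java -Xmx10g -Xms10g \
     -XX:+UseG1GC \
     -XX:MaxGCPauseMillis=10 \
     -Xlog:gc*:file=gc.log \
     MyApp
```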

Re: No proposals to implement 64bit mmap?

2016-12-08 Thread Vitaly Davidovich
I think long indexing is punted until Arrays 2.0, although I don't know where/how that's going (if at all - initial proposal is from circa 2012). Otherwise, e.g., long indexing on a HeapByteBuffer is crufty. So here's a workaround - stop using Java for such scenarios/apps :) j/k (... sort of). On
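While long indexing stays hypothetical, the usual workaround inside Java is chunking: split a logically long-indexed buffer into power-of-two chunks and translate a `long` index into (chunk, offset). A sketch (my construction; `chunkShift = 30` gives 1 GiB chunks, smaller values shown for testing):

```java
import java.nio.ByteBuffer;

public class LongIndexBuffer {
    final int chunkShift;
    final long chunkMask;
    final ByteBuffer[] chunks;

    LongIndexBuffer(long capacity, int chunkShift) {
        this.chunkShift = chunkShift;
        this.chunkMask = (1L << chunkShift) - 1;
        int n = (int) ((capacity + chunkMask) >>> chunkShift); // ceil-divide
        chunks = new ByteBuffer[n];
        for (int i = 0; i < n; i++) {
            long remaining = capacity - ((long) i << chunkShift);
            chunks[i] = ByteBuffer.allocate((int) Math.min(1L << chunkShift, remaining));
        }
    }
    byte get(long index) {
        return chunks[(int) (index >>> chunkShift)].get((int) (index & chunkMask));
    }
    void put(long index, byte b) {
        chunks[(int) (index >>> chunkShift)].put((int) (index & chunkMask), b);
    }

    public static void main(String[] args) {
        LongIndexBuffer buf = new LongIndexBuffer(100, 4); // tiny 16-byte chunks
        buf.put(37, (byte) 7); // lands in chunk 2, offset 5
        System.out.println(buf.get(37)); // 7
    }
}
```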

Re: Question about SBE and DirectBuffer

2016-11-15 Thread Vitaly Davidovich
day, 15 November 2016 19:14:23 UTC, Vitaly Davidovich wrote: > > Could be this: https://bugs.openjdk.java.net/browse/JDK-8087134. > > Are the failures happening when C1 is enabled (i.e. Tiered comp is > enabled)? > > On Tue, Nov 15, 2016 at 1:44 PM Martin Thompson wrote: > &

Re: Question about SBE and DirectBuffer

2016-11-15 Thread Vitaly Davidovich
Could be this: https://bugs.openjdk.java.net/browse/JDK-8087134. Are the failures happening when C1 is enabled (i.e. Tiered comp is enabled)? On Tue, Nov 15, 2016 at 1:44 PM Martin Thompson wrote: > We have had another similar issue raised on this in a single threaded > example. It seems that w

Re: Single writer counter: how expensive is a volatile read?

2016-10-30 Thread Vitaly Davidovich
On Sunday, October 30, 2016, Aleksey Shipilev wrote: > On 10/30/2016 05:55 AM, Peter Veentjer wrote: > > Let me clarify. > > The discussion is around removing the volatile read in the inc method. > > Ah, sorry I misinterpreted the question. It usually goes the other way: > the reads vastly outnum

Re: Single writer counter: how expensive is a volatile read?

2016-10-30 Thread Vitaly Davidovich
On Sunday, October 30, 2016, Aleksey Shipilev wrote: > On 10/29/2016 10:31 PM, Vitaly Davidovich wrote: > > There's one thing I still can't get someone at Oracle to clarify, which > > is whether getOpaque ensures atomicity of the read. I believe it would, > &

Re: Single writer counter: how expensive is a volatile read?

2016-10-29 Thread Vitaly Davidovich
t method doesn't modify memory. Hence the term "opaque". However, it doesn't emit any CPU fences, and that's the difference between volatile load and getOpaque on weaker memory model archs. There's one thing I still can't get someone at Oracle to clarify
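A single-writer counter using the opaque mode under discussion might look like the sketch below (my construction). The writer needs no atomic read-modify-write since there is only one writer; readers use `getOpaque`, which guarantees the load actually happens but emits no CPU fences on weakly ordered architectures. Whether the opaque read is guaranteed untorn for a `long` is exactly the open question in this thread.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class SingleWriterCounter {
    private long count; // plain field; all access goes through the VarHandle
    private static final VarHandle COUNT;
    static {
        try {
            COUNT = MethodHandles.lookup()
                    .findVarHandle(SingleWriterCounter.class, "count", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void inc() { // must only ever be called from ONE thread
        COUNT.setOpaque(this, (long) COUNT.getOpaque(this) + 1);
    }
    long get() { // any thread; opaque load, no fences
        return (long) COUNT.getOpaque(this);
    }

    public static void main(String[] args) {
        SingleWriterCounter c = new SingleWriterCounter();
        for (int i = 0; i < 5; i++) c.inc();
        System.out.println(c.get()); // 5
    }
}
```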

Re: Single writer counter: how expensive is a volatile read?

2016-10-29 Thread Vitaly Davidovich
On Saturday, October 29, 2016, Olivier Bourgain wrote: > I think this deserves a benchmark. > > Le samedi 29 octobre 2016 13:15:35 UTC+2, Aleksey Shipilev a écrit : >> >> On 10/29/2016 10:13 AM, Peter Veentjer wrote: >> > So you get something like this: >> > >> > public class Counter { >> > >> >

Re: Unchecked exceptions for IO considered harmful.

2016-08-15 Thread Vitaly Davidovich
Why is it egregious? It's detailed in the types of exceptions it throws, yes, but that's good assuming you want to handle some of those types (and there are cases where those exceptions can be handled properly). Even before ReflectiveOperationException, you could use multi-catch since Java 7 to ge
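The two mechanisms mentioned, side by side (hypothetical helper): Java 7 multi-catch groups the failure modes you want to handle together, and `ReflectiveOperationException` is the common supertype for the rest.

```java
public class MultiCatchDemo {
    static Object tryCreate(String className) {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            // the two "expected" failure modes, handled together via multi-catch
            return null;
        } catch (ReflectiveOperationException e) {
            // everything else reflective (InstantiationException, etc.)
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(tryCreate("java.lang.StringBuilder") != null); // true
        System.out.println(tryCreate("no.such.Clazz"));                   // null
    }
}
```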