I think we may also want to measure how a cmpxchg solution performs against a 
lock; add solution under contention.

Objects in a read-only state (be they COW’ed value types or not) might be 
shared between threads with the expectation that they are fast, i.e. the 
argument "if retain/release are contended, something else is wrong" does not 
apply.

A load/cmpxchg might send two memory coherence messages (one for 
shared/exclusive for the load, one for modified for the cmpxchg) and incur 
mispredicted branches (pipeline flushes) on state change under contention, so 
it might exhibit worse performance than a lock; add (one coherence message 
for modified, no mispredicted branch), depending on how things are 
implemented under the hood. There is always an opportunity for another level 
of coherence speculation…
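
To make the comparison concrete, here is a minimal C++ sketch of the two retain 
strategies plus a small harness that hammers one shared field from several 
threads. All names (kSideTableBit, retain_cas, retain_add, hammer) and the bit 
layout are my own assumptions for illustration, not the actual RefCount.h code. 
The harness only checks that both variants count correctly; it does not time 
anything.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Assumed layout for illustration: the top bit means "field holds a side
// allocation pointer", the remaining bits hold the inline refcount.
constexpr std::uint64_t kSideTableBit = 1ull << 63;

// Strategy 1: load, test the state bit, then compare-exchange.
// Returns false when the (hypothetical) slow path would be needed.
inline bool retain_cas(std::atomic<std::uint64_t>& field) {
    std::uint64_t old = field.load(std::memory_order_relaxed);
    do {
        if (old & kSideTableBit) return false;  // slow path not modeled here
        // On failure compare_exchange_weak reloads `old` and we retry.
    } while (!field.compare_exchange_weak(old, old + 1,
                                          std::memory_order_relaxed));
    return true;
}

// Strategy 2: one unconditional atomic add (a lock-prefixed add/xadd on
// x86_64), with no state check and no retry loop.
inline void retain_add(std::atomic<std::uint64_t>& field) {
    field.fetch_add(1, std::memory_order_relaxed);
}

// Runs `threads` threads that each call `retain` on the same field `iters`
// times, and returns the final field value.
template <typename Retain>
std::uint64_t hammer(Retain retain, unsigned threads, unsigned iters) {
    std::atomic<std::uint64_t> field{0};
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (unsigned i = 0; i < iters; ++i) retain(field);
        });
    }
    for (auto& th : pool) th.join();
    return field.load();
}
```

Wrapping the two hammer calls in std::chrono::steady_clock timestamps and 
varying the thread count would give a first rough contention measurement, 
though a real test should probably use the runtime's own retain/release.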

Other benefits we get might outweigh any such cost though.


> On Mar 16, 2016, at 11:29 PM, Greg Parker via swift-dev <swift-dev@swift.org> 
> wrote:
> 
>> 
>> On Mar 15, 2016, at 11:59 PM, Greg Parker via swift-dev 
>> <swift-dev@swift.org> wrote:
>> 
>> I am considering a new representation for Swift refcounts and other 
>> per-object data. This is an outline of the scheme. Comments and suggestions 
>> welcome.
>> 
>> Today, each object stores 64 bits of refcounts and flags after the isa field.
>> 
>> In this new system, each object would store a pointer-size field after the 
>> isa field. This field would have two cases: it could store refcounts and 
>> flags, or it could store a pointer to a side allocation that would store 
>> refcounts and flags and additional per-object data.
>> 
>> Advantages:
>> * Saves 4 bytes per object on 32-bit for most objects.
>> * Improves refcount overflow and underflow detection.
>> * Might allow an inlineable retain/release fast path in the future.
>> * Allows a new weak reference implementation that doesn't need to keep 
>> entire dead objects alive.
>> * Allows inexpensive per-object storage for future features like associated 
>> references or class extensions with instance variables.
>> 
>> Disadvantages:
>> * Basic RR operations might be slower on x86_64. This needs to be measured. 
>> ARM architectures are probably unchanged.
> 
> I wrote a performance mockup of the fast path. It simply checks the MSB in 
> the appropriate places in RefCount.h but does not actually implement any side 
> allocation. I ran it on some RR-heavy benchmarks (QuickSort, InsertionSort, 
> HeapSort, Array2D) on x86_64 and arm64.
> 
> arm64 is in fact approximately unchanged. Any difference either way is much 
> less than 1%.
> 
> x86_64 is measurably slower:
>   1% QuickSort
>   2% InsertionSort
>   4% Array2D
>   5% HeapSort
> 
> 
> -- 
> Greg Parker     gpar...@apple.com     Runtime Wrangler
> 
> 
> _______________________________________________
> swift-dev mailing list
> swift-dev@swift.org
> https://lists.swift.org/mailman/listinfo/swift-dev

