I don't think the new implementation actually improves the situation. The case for multiple producers is still not handled correctly. We need to cleanup this whole thing. See my comments below for suggestions how to proceed. CC'ing Sven
for additional feedback.

https://codereview.chromium.org/22849002/diff/2001/src/circular-queue-inl.h
File src/circular-queue-inl.h (right):

https://codereview.chromium.org/22849002/diff/2001/src/circular-queue-inl.h#newcode50
src/circular-queue-inl.h:50: }
This assumes that there is always only a single producer, but this is
not always the case, for example in Chrome on Linux with profiled web
worker threads, and maybe some other scenarios. We should get this right
now while we're at it.

I'd suggest to use a tri-state scheme with kEmpty, kPreparing and kFull,
and turn the enqueue_pos into an AtomicWord. Then the code would look
like this:

T* StartEnqueue() {
  Atomic32 curr = Acquire_Load(&enqueue_pos_);
  if (!Acquire_CompareAndSwap(&buffer_[curr + kRecordStateOffset],
kEmpty, kPreparing) != kEmpty) return NULL;
  Atomic32 next = (curr + kRecordSize) % (kRecordSize * N);
  ASSERT(Release_Load(&enqueue_pos_) == curr);
  Release_Store(&enqueue_pos_, next);
  return reinterpret_cast<T*>(&buffer_[curr]);
}

void FinishEnqueue(T* obj) {
  ASSERT(reinterpret_cast<Atomic32*>(obj) >= &buffer_[0] &&
              reinterpret_cast<Atomic32*>(obj) < &buffer_[kRecordSize *
N]);
  Atomic32* state_ptr = reinterpret_cast<Atomic32*>(obj) +
kRecordStateOffset;
  ASSERT(Release_Load(state_ptr) == kPreparing);
  Release_Store(state_ptr, kFull);
}

This way the objects are aligned on cache line boundaries, which is
probably better performancewise (guessing). If that turns out to be
wrong, put the state before the object again.

https://codereview.chromium.org/22849002/diff/2001/src/circular-queue.cc
File src/circular-queue.cc (right):

https://codereview.chromium.org/22849002/diff/2001/src/circular-queue.cc#newcode81
src/circular-queue.cc:81: MemoryBarrier();
Why do we need this memory barrier here?

https://codereview.chromium.org/22849002/diff/2001/src/circular-queue.cc#newcode94
src/circular-queue.cc:94: }
T* StartDequeue() {
  if (Acquire_Load(&buffer_[dequeue_pos_ + kRecordStateOffset] != kFull)
return NULL;
  return reinterpret_cast<T*>(&buffer_[dequeue_pos_]);
}

void FinishDequeue() {
  ASSERT(Release_Load(&buffer_[dequeue_pos_ + kRecordStateOffset] ==
kFull);
  Release_Store(&buffer_[dequeue_pos_ + kRecordStateOffset], kEmpty);
  dequeue_pos_ = (dequeue_pos + kRecordSize) % (N * kRecordSize);
}

https://codereview.chromium.org/22849002/diff/2001/src/circular-queue.h
File src/circular-queue.h (right):

https://codereview.chromium.org/22849002/diff/2001/src/circular-queue.h#newcode41
src/circular-queue.h:41: class SamplingCircularQueue {
While we're already cleaning up this class, let's turn it into a
template with template parameters for the record type T and the constant
number N of records, i.e.

template <typename T, unsigned N>
class SamplingCircularQueue {
...
};

https://codereview.chromium.org/22849002/diff/2001/src/circular-queue.h#newcode63
src/circular-queue.h:63: typedef AtomicWord Cell;
Get rid of this typedef, it's just confusing. Also there's no need to
use AtomicWord, and Atomic32 is sufficient.

https://codereview.chromium.org/22849002/diff/2001/src/circular-queue.h#newcode88
src/circular-queue.h:88: ConsumerPosition* consumer_pos_;
Let's also cleanup this mess. Assuming that the record size and number
of records is constant (see comment about templatizing), we should be
able to replace this with:

STATIC_ASSERT(kProcessorCacheLineSize % sizeof(Atomic32) == 0);
static const unsigned kRecordStateOffset = ROUND_UP(sizeof(T),
sizeof(Atomic32));
static const unsigned kRecordSize = ROUND_UP(kRecordStateOffset +
sizeof(Atomic32), kProcessorCacheLineSize) / sizeof(Atomic32);
Atomic32 buffer_[N * kRecordSize] ALIGNED_AT(kProcessorCacheLineSize);
// Allocate enqueue_pos_ and dequeue_pos_ to different cache lines.
Atomic32 enqueue_pos_ ALIGNED_AT(kProcessorCacheLineSize);
int dequeue_pos_ ALIGNED_AT(kProcessorCacheLineSize);

where ROUND_UP(size, alignment) is a macro version of RoundUp() and
ALIGNED_AT(x) is an appropriate macro that expands to
__attribute__((aligned(x))) for GCC, Clang & Co. or __declspec(align(x))
for MSVC.

https://codereview.chromium.org/22849002/diff/2001/src/cpu-profiler.cc
File src/cpu-profiler.cc (right):

https://codereview.chromium.org/22849002/diff/2001/src/cpu-profiler.cc#newcode122
src/cpu-profiler.cc:122: // feet.
We do no longer make a local copy here, so this comment is obsolete.

https://codereview.chromium.org/22849002/

--
--
v8-dev mailing list
v8-dev@googlegroups.com
http://groups.google.com/group/v8-dev
--- You received this message because you are subscribed to the Google Groups "v8-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to v8-dev+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Reply via email to