Re: [RFC PATCH 1/1] hpref: Hazard Pointers with Reference Counter

2024-09-28 Thread Jonas Oberhauser

Thanks for your response,

On 9/28/2024 at 1:33 PM, Mathieu Desnoyers wrote:


This is a userspace prototype. This will behave similarly to a userspace
spinlock in that case, which is not great in terms of CPU usage, but
should eventually unblock the waiter, unless it has an RT priority that
really prevents any progress from the emergency slot owner.

On my TODO list, I have a bullet about integrating with sys_futex to
block on wait, wake up on slot release. I would then use the wait/wakeup
code based on sys_futex already present in liburcu.


Oh, I see. I think if it is for userspace, then it really should use 
wait/wakeup, as you said.


Best wishes,
  jonas




Re: [RFC PATCH 1/1] hpref: Hazard Pointers with Reference Counter

2024-09-28 Thread Mathieu Desnoyers

On 2024-09-28 13:22, Jonas Oberhauser wrote:

Two more questions below:

On 9/21/2024 at 6:42 PM, Mathieu Desnoyers wrote:

+#define NR_PERCPU_SLOTS_BITS    3


Have you measured any advantage of this multi-slot version vs a version 
with just one normal slot and one emergency slot?


No, I have not. That being said, I am taking a minimalistic
approach that takes things even further in the "simple" direction
for what I will send as an RFC against the Linux kernel:
there is just the one "emergency slot", irqs are disabled around
use of the HP slot, and promotion to refcount is done before
returning to the caller.

With just one normal slot, the current slot number would always be zero, 
and there'd be no need to increment it etc., which might make the common 
case (no conflict) faster.


The multi-slot scheme allows preemption while holding a slot. It also allows
HP slot users to keep it longer without promoting to refcount right away.

I even have a patch that dynamically adapts the scan depth (increase
by reader, decrease by synchronize, with a hysteresis) in my userspace
prototype. This splits the number of allocated slots from the scan
depth. But I keep that for later and will focus on the simple case
first (single HP slot, only used with irqoff).
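A rough sketch of that adaptation logic, for illustration only (the
names, thresholds and hysteresis policy here are invented, not taken
from the actual patch):

#include <urcu/uatomic.h>

static unsigned int hpref_scan_depth = 1; /* <= 1U << NR_PERCPU_SLOTS_BITS */

/* Reader side: every scanned slot was busy, so deepen the scan. */
static void hpref_scan_depth_inc(void)
{
	unsigned int depth = uatomic_load(&hpref_scan_depth, CMM_RELAXED);

	if (depth < (1U << NR_PERCPU_SLOTS_BITS))
		uatomic_store(&hpref_scan_depth, depth + 1, CMM_RELAXED);
}

/*
 * Synchronize side: only shrink after several consecutive scans found
 * the deepest slot unused (the hysteresis), so the depth does not
 * bounce on transient load spikes.
 */
static void hpref_scan_depth_dec(unsigned int idle_scans)
{
	unsigned int depth = uatomic_load(&hpref_scan_depth, CMM_RELAXED);

	if (depth > 1 && idle_scans >= 8)
		uatomic_store(&hpref_scan_depth, depth - 1, CMM_RELAXED);
}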



Either way, I recommend stress testing with just one normal slot to 
increase the chance of conflict (and hence of triggering corner cases).


Good point.




+retry:
+    node = uatomic_load(node_p, CMM_RELAXED);
+    if (!node)
+    return false;
+    /* Use rseq to try setting current slot hp. Store B. */
+    if (rseq_load_cbne_store__ptr(RSEQ_MO_RELAXED, RSEQ_PERCPU_CPU_ID,
+    (intptr_t *) &slot->node, (intptr_t) NULL,
+    (intptr_t) node, cpu)) {
+    slot = &cpu_slots->slots[HPREF_EMERGENCY_SLOT];
+    use_refcount = true;
+    /*
+     * This may busy-wait for another reader using the
+     * emergency slot to transition to refcount.
+     */
+    caa_cpu_relax();
+    goto retry;
+    }


I'm not familiar with Linux' preemption model. Can this deadlock if a 
low-interrupt-level thread is occupying the EMERGENCY slot and a 
higher-interrupt-level thread is also trying to take it?


This is a userspace prototype. This will behave similarly to a userspace
spinlock in that case, which is not great in terms of CPU usage, but
should eventually unblock the waiter, unless it has an RT priority that
really prevents any progress from the emergency slot owner.

On my TODO list, I have a bullet about integrating with sys_futex to
block on wait, wake up on slot release. I would then use the wait/wakeup
code based on sys_futex already present in liburcu.
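
For reference, a minimal sketch of such futex-based blocking, assuming
a per-slot futex word set to 1 while the emergency slot is held (the
word placement and helper names are invented; the existing liburcu
wait/wakeup code would be used instead):

#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdint.h>
#include <urcu/uatomic.h>

/* Reader: block while the emergency slot is held (replaces busy-wait). */
static void slot_wait(int32_t *futex_word)
{
	/* Re-check before each sleep: FUTEX_WAIT may return spuriously. */
	while (uatomic_load(futex_word, CMM_ACQUIRE) == 1)
		syscall(SYS_futex, futex_word, FUTEX_WAIT, 1, NULL, NULL, 0);
}

/* Slot owner: mark the slot free and wake one waiter. */
static void slot_release(int32_t *futex_word)
{
	uatomic_store(futex_word, 0, CMM_RELEASE);
	syscall(SYS_futex, futex_word, FUTEX_WAKE, 1, NULL, NULL, 0);
}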

Thanks!

Mathieu







Best wishes,
   jonas



--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com




Re: [RFC PATCH 1/1] hpref: Hazard Pointers with Reference Counter

2024-09-28 Thread Jonas Oberhauser

Two more questions below:

On 9/21/2024 at 6:42 PM, Mathieu Desnoyers wrote:

+#define NR_PERCPU_SLOTS_BITS   3


Have you measured any advantage of this multi-slot version vs a version 
with just one normal slot and one emergency slot?
With just one normal slot, the current slot number would always be zero, 
and there'd be no need to increment it etc., which might make the common 
case (no conflict) faster.


Either way, I recommend stress testing with just one normal slot to 
increase the chance of conflict (and hence of triggering corner cases).



+retry:
+   node = uatomic_load(node_p, CMM_RELAXED);
+   if (!node)
+   return false;
+   /* Use rseq to try setting current slot hp. Store B. */
+   if (rseq_load_cbne_store__ptr(RSEQ_MO_RELAXED, RSEQ_PERCPU_CPU_ID,
+   (intptr_t *) &slot->node, (intptr_t) NULL,
+   (intptr_t) node, cpu)) {
+   slot = &cpu_slots->slots[HPREF_EMERGENCY_SLOT];
+   use_refcount = true;
+   /*
+    * This may busy-wait for another reader using the
+    * emergency slot to transition to refcount.
+    */
+   caa_cpu_relax();
+   goto retry;
+   }


I'm not familiar with Linux' preemption model. Can this deadlock if a 
low-interrupt-level thread is occupying the EMERGENCY slot and a 
higher-interrupt-level thread is also trying to take it?





Best wishes,
  jonas




Re: [RFC PATCH 1/1] hpref: Hazard Pointers with Reference Counter

2024-09-25 Thread Jonas Oberhauser




On 9/25/2024 at 1:36 PM, Mathieu Desnoyers wrote:

On 2024-09-25 12:06, Jonas Oberhauser wrote:



On 9/25/2024 at 8:35 AM, Mathieu Desnoyers wrote:

On 2024-09-25 07:57, Jonas Oberhauser wrote:

Hi Mathieu,


I haven't read your code in detail but it seems to me you have an 
ABA bug: as I explained elsewhere, you could read the same pointer 
after ABA but you don't synchronize with the newer store that gave 
you node2, leaving you to speculatively read stale values through 
*ctx->hp.
(I am assuming here that ctx->hp is essentially an out parameter 
used to let the caller know which node got protected).


The following change should fix it:

  cmm_barrier();
-    node2 = uatomic_load(node_p, CMM_RELAXED);    /* Load A */
+    node2 = rcu_dereference(*node_p);    /* Load A */



I don't think this fixes it, because IIRC rcu_dereference relies on 
the address dependency (which we don't have here) to provide ordering.


I would recommend either:

-    ctx->hp = node;
+    ctx->hp = node2;

which fixes the problem under the perhaps too weak assumption that the 
compiler doesn't use its knowledge that node==node2 to just undo this 
fix, or more strictly,


As stated in Documentation/RCU/rcu_dereference.rst from the Linux
kernel, comparing the result of rcu_dereference against another
non-NULL pointer is discouraged, as you rightly point out.



+    ctx->hp = READ_ONCE(node2);

which I believe makes sure that the value of node2 is used.


I am not entirely sure this extra READ_ONCE() would be sufficient
to prevent the compiler from making assumptions about the content
of node2 and thus use the result of the first load (node) instead.
It would also not suffice to prevent the CPU from speculatively
using the result of the first load to perform dependent loads AFAIU.


The reason I think it should be sufficient is that it forces the 
compiler to assume that the value of node2 might have changed since the 
comparison.
Therefore, simply substituting the value of node at that point should be 
unsound from the compiler's POV.


But I'm not a compiler expert... So I definitely support uneasiness 
about this construct :))


jonas




Re: [RFC PATCH 1/1] hpref: Hazard Pointers with Reference Counter

2024-09-25 Thread Mathieu Desnoyers

On 2024-09-25 12:06, Jonas Oberhauser wrote:



On 9/25/2024 at 8:35 AM, Mathieu Desnoyers wrote:

On 2024-09-25 07:57, Jonas Oberhauser wrote:

Hi Mathieu,


I haven't read your code in detail but it seems to me you have an ABA 
bug: as I explained elsewhere, you could read the same pointer after 
ABA but you don't synchronize with the newer store that gave you 
node2, leaving you to speculatively read stale values through *ctx->hp.
(I am assuming here that ctx->hp is essentially an out parameter used 
to let the caller know which node got protected).


The following change should fix it:

  cmm_barrier();
-    node2 = uatomic_load(node_p, CMM_RELAXED);    /* Load A */
+    node2 = rcu_dereference(*node_p);    /* Load A */



I don't think this fixes it, because IIRC rcu_dereference relies on the 
address dependency (which we don't have here) to provide ordering.


I would recommend either:

-    ctx->hp = node;
+    ctx->hp = node2;

which fixes the problem under the perhaps too weak assumption that the 
compiler doesn't use its knowledge that node==node2 to just undo this 
fix, or more strictly,


As stated in Documentation/RCU/rcu_dereference.rst from the Linux
kernel, comparing the result of rcu_dereference against another
non-NULL pointer is discouraged, as you rightly point out.



+    ctx->hp = READ_ONCE(node2);

which I believe makes sure that the value of node2 is used.


I am not entirely sure this extra READ_ONCE() would be sufficient
to prevent the compiler from making assumptions about the content
of node2 and thus use the result of the first load (node) instead.
It would also not suffice to prevent the CPU from speculatively
using the result of the first load to perform dependent loads AFAIU.
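
To make the concern concrete, here is a hypothetical illustration of
the substitution (not code from the patch):

#define READ_ONCE(x)	(*(const volatile __typeof__(x) *)&(x))

struct obj { int field; };

/* As written: node2 must feed the dependent load. */
static int reader_as_written(struct obj **pp, struct obj *node)
{
	struct obj *node2 = READ_ONCE(*pp);

	if (node != node2)
		return -1;
	return node2->field;
}

/*
 * What a compiler that knows node == node2 on this path may emit:
 * the dependent load is now fed by the older load of node, so neither
 * the compiler nor a weakly-ordered CPU is forced to order it after
 * the second load.
 */
static int reader_after_substitution(struct obj **pp, struct obj *node)
{
	struct obj *node2 = READ_ONCE(*pp);

	if (node != node2)
		return -1;
	return node->field;	/* node substituted for node2 */
}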


Alternatively you could always use an acquire load.


Unless someone comes up with a sound alternate approach,
I am tempted to go with an acquire load as the second load
within hpref_hp_get().

This way, the compiler would not attempt to use the
node value from the first load for dependent loads,
and the CPU won't try to speculate dependent loads
either.
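
Concretely, the change under discussion would be along these lines:

-	node2 = uatomic_load(node_p, CMM_RELAXED);	/* Load A */
+	node2 = uatomic_load(node_p, CMM_ACQUIRE);	/* Load A */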

Thanks,

Mathieu




Best wishes,

   jonas



--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com




Re: [RFC PATCH 1/1] hpref: Hazard Pointers with Reference Counter

2024-09-25 Thread Jonas Oberhauser




On 9/25/2024 at 8:35 AM, Mathieu Desnoyers wrote:

On 2024-09-25 07:57, Jonas Oberhauser wrote:

Hi Mathieu,


I haven't read your code in detail but it seems to me you have an ABA 
bug: as I explained elsewhere, you could read the same pointer after 
ABA but you don't synchronize with the newer store that gave you 
node2, leaving you to speculatively read stale values through *ctx->hp.
(I am assuming here that ctx->hp is essentially an out parameter used 
to let the caller know which node got protected).


The following change should fix it:

  cmm_barrier();
-    node2 = uatomic_load(node_p, CMM_RELAXED);    /* Load A */
+    node2 = rcu_dereference(*node_p);    /* Load A */



I don't think this fixes it, because IIRC rcu_dereference relies on the 
address dependency (which we don't have here) to provide ordering.


I would recommend either:

-ctx->hp = node;
+ctx->hp = node2;

which fixes the problem under the perhaps too weak assumption that the 
compiler doesn't use its knowledge that node==node2 to just undo this 
fix, or more strictly,


+ctx->hp = READ_ONCE(node2);

which I believe makes sure that the value of node2 is used.

Alternatively you could always use an acquire load.


Best wishes,

  jonas




Re: [RFC PATCH 1/1] hpref: Hazard Pointers with Reference Counter

2024-09-24 Thread Mathieu Desnoyers

On 2024-09-25 07:57, Jonas Oberhauser wrote:

Hi Mathieu,

interesting idea. Conceptually it looks good.

There's another approach that uses hazard pointers to optimize shared 
reference counting (to make it lock-free in common cases).


https://github.com/cmuparlay/concurrent_deferred_rc

It doesn't go as far as what you're doing, but they also use the hazard 
pointer to protect the refcount (like you do in the promote function).


Thanks for the reference. Indeed, one use-case of this HP+refcount
scheme would be to implement something that resembles C++ shared
pointers. I'm glad to see I'm not alone in seeing potential in this
approach.

I haven't read your code in detail but it seems to me you have an ABA 
bug: as I explained elsewhere, you could read the same pointer after ABA 
but you don't synchronize with the newer store that gave you node2, 
leaving you to speculatively read stale values through *ctx->hp.
(I am assuming here that ctx->hp is essentially an out parameter used to 
let the caller know which node got protected).


You are of course absolutely right. The following change should fix it:

cmm_barrier();
-   node2 = uatomic_load(node_p, CMM_RELAXED);  /* Load A */
+   node2 = rcu_dereference(*node_p);   /* Load A */

Please let me know if I'm missing anything.

Thanks,

Mathieu




Have fun,
   jonas


+/*
+ * hpref_hp_get: Obtain a reference to a stable object, protected either
+ *   by hazard pointer (fast-path) or using reference
+ *   counter as fall-back.
+ */
+static inline
+bool hpref_hp_get(struct hpref_node **node_p, struct hpref_ctx *ctx)
+{
+    int cpu = rseq_current_cpu_raw();
+    struct hpref_percpu_slots *cpu_slots = rseq_percpu_ptr(hpref_percpu_slots, cpu);
+    struct hpref_slot *slot = &cpu_slots->slots[cpu_slots->current_slot];
+    bool use_refcount = false;
+    struct hpref_node *node, *node2;
+    unsigned int next_slot;
+
+retry:
+    node = uatomic_load(node_p, CMM_RELAXED);
+    if (!node)
+    return false;
+    /* Use rseq to try setting current slot hp. Store B. */
+    if (rseq_load_cbne_store__ptr(RSEQ_MO_RELAXED, RSEQ_PERCPU_CPU_ID,
+    (intptr_t *) &slot->node, (intptr_t) NULL,
+    (intptr_t) node, cpu)) {
+    slot = &cpu_slots->slots[HPREF_EMERGENCY_SLOT];
+    use_refcount = true;
+    /*
+     * This may busy-wait for another reader using the
+     * emergency slot to transition to refcount.
+     */
+    caa_cpu_relax();
+    goto retry;
+    }
+    /* Memory ordering: Store B before Load A. */
+    cmm_smp_mb();
+    node2 = uatomic_load(node_p, CMM_RELAXED);    /* Load A */
+    if (node != node2) {
+    uatomic_store(&slot->node, NULL, CMM_RELAXED);
+    if (!node2)
+    return false;
+    goto retry;
+    }
+    ctx->type = HPREF_TYPE_HP;
+    ctx->hp = node;

    ^ here


+    ctx->slot = slot;
+    if (use_refcount) {
+    hpref_promote_hp_to_ref(ctx);
+    return true;
+    }
+    /*
+     * Increment current slot (racy increment is OK because it is
+     * just a position hint). Skip the emergency slot.
+     */
+    next_slot = uatomic_load(&cpu_slots->current_slot, CMM_RELAXED) + 1;
+    if (next_slot >= HPREF_EMERGENCY_SLOT)
+    next_slot = 0;
+    uatomic_store(&cpu_slots->current_slot, next_slot, CMM_RELAXED);
+    return true;
+}
+
+static inline
+void hpref_put(struct hpref_ctx *ctx)
+{
+    if (ctx->type == HPREF_TYPE_REF) {
+    urcu_ref_put(&ctx->hp->refcount, hpref_release);
+    } else {
+    /* Release HP. */
+    uatomic_store(&ctx->slot->node, NULL, CMM_RELEASE);
+    }
+    ctx->hp = NULL;
+}
+
+/*
+ * hpref_set_pointer: Store pointer @node to @ptr, with RCU publication
+ *    guarantees.
+ */
+static inline
+void hpref_set_pointer(struct hpref_node **ptr, struct hpref_node *node)
+{
+    if (__builtin_constant_p(node) && node == NULL)
+    uatomic_store(ptr, NULL, CMM_RELAXED);
+    else
+    uatomic_store(ptr, node, CMM_RELEASE);
+}
+
+/*
+ * Return the content of the hpref context hazard pointer field.
+ */
+static inline
+struct hpref_node *hpref_ctx_pointer(struct hpref_ctx *ctx)
+{
+    return ctx->hp;
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _URCU_HPREF_H */
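
As a usage illustration of the API above (hypothetical caller, not part
of the patch; consume() is an invented placeholder):

struct hpref_node *head;	/* published via hpref_set_pointer() */

static void reader(void)
{
	struct hpref_ctx ctx;

	if (!hpref_hp_get(&head, &ctx))
		return;		/* no object published */
	consume(hpref_ctx_pointer(&ctx));	/* object pinned by HP or ref */
	hpref_put(&ctx);	/* release HP or drop the reference */
}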
diff --git a/src/Makefile.am b/src/Makefile.am
index b555c818..7312c9f7 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -19,7 +19,8 @@ RCULFHASH = rculfhash.c rculfhash-mm-order.c rculfhash-mm-chunk.c \
  lib_LTLIBRARIES = liburcu-common.la \
  liburcu.la liburcu-qsbr.la \
  liburcu-mb.la liburcu-bp.la \
-    liburcu-memb.la liburcu-cds.la
+    liburcu-memb.la liburcu-cds.la \
+    liburcu-hpref.la
  #
  # liburcu-common contains wait-free queues (needed by call_rcu) as well
@@ -50,6 +51,9 @@ liburcu_cds_la_SOURCES = rculfqueue.c rculfstack.c lfstack.c \
  workqueue.c workqueue.h $(RCULFHASH) $(COMPAT)
  liburcu_cds_la_LIBADD = liburcu-common.la

Re: [RFC PATCH 1/1] hpref: Hazard Pointers with Reference Counter

2024-09-24 Thread Jonas Oberhauser

Hi Mathieu,

interesting idea. Conceptually it looks good.

There's another approach that uses hazard pointers to optimize shared 
reference counting (to make it lock-free in common cases).


https://github.com/cmuparlay/concurrent_deferred_rc

It doesn't go as far as what you're doing, but they also use the hazard 
pointer to protect the refcount (like you do in the promote function).
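
Conceptually, such a promotion takes the reference while the hazard
pointer still pins the node, then drops the hazard pointer. A sketch
(hpref_promote_hp_to_ref() itself is not quoted in this excerpt, so
this is an assumption about its shape, not the actual code):

	/* Sketch only: the HP pins the node, so the increment is safe. */
	urcu_ref_get(&ctx->hp->refcount);
	/* The reference now pins the object; release the HP slot. */
	uatomic_store(&ctx->slot->node, NULL, CMM_RELEASE);
	ctx->type = HPREF_TYPE_REF;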


I haven't read your code in detail but it seems to me you have an ABA 
bug: as I explained elsewhere, you could read the same pointer after ABA 
but you don't synchronize with the newer store that gave you node2, 
leaving you to speculatively read stale values through *ctx->hp.
(I am assuming here that ctx->hp is essentially an out parameter used to 
let the caller know which node got protected).


Have fun,
  jonas


+/*
+ * hpref_hp_get: Obtain a reference to a stable object, protected either
+ *   by hazard pointer (fast-path) or using reference
+ *   counter as fall-back.
+ */
+static inline
+bool hpref_hp_get(struct hpref_node **node_p, struct hpref_ctx *ctx)
+{
+   int cpu = rseq_current_cpu_raw();
+   struct hpref_percpu_slots *cpu_slots = rseq_percpu_ptr(hpref_percpu_slots, cpu);
+   struct hpref_slot *slot = &cpu_slots->slots[cpu_slots->current_slot];
+   bool use_refcount = false;
+   struct hpref_node *node, *node2;
+   unsigned int next_slot;
+
+retry:
+   node = uatomic_load(node_p, CMM_RELAXED);
+   if (!node)
+   return false;
+   /* Use rseq to try setting current slot hp. Store B. */
+   if (rseq_load_cbne_store__ptr(RSEQ_MO_RELAXED, RSEQ_PERCPU_CPU_ID,
+   (intptr_t *) &slot->node, (intptr_t) NULL,
+   (intptr_t) node, cpu)) {
+   slot = &cpu_slots->slots[HPREF_EMERGENCY_SLOT];
+   use_refcount = true;
+   /*
+    * This may busy-wait for another reader using the
+    * emergency slot to transition to refcount.
+    */
+   caa_cpu_relax();
+   goto retry;
+   }
+   /* Memory ordering: Store B before Load A. */
+   cmm_smp_mb();
+   node2 = uatomic_load(node_p, CMM_RELAXED);  /* Load A */
+   if (node != node2) {
+   uatomic_store(&slot->node, NULL, CMM_RELAXED);
+   if (!node2)
+   return false;
+   goto retry;
+   }
+   ctx->type = HPREF_TYPE_HP;
+   ctx->hp = node;

   ^ here


+   ctx->slot = slot;
+   if (use_refcount) {
+   hpref_promote_hp_to_ref(ctx);
+   return true;
+   }
+   /*
+    * Increment current slot (racy increment is OK because it is
+    * just a position hint). Skip the emergency slot.
+    */
+   next_slot = uatomic_load(&cpu_slots->current_slot, CMM_RELAXED) + 1;
+   if (next_slot >= HPREF_EMERGENCY_SLOT)
+   next_slot = 0;
+   uatomic_store(&cpu_slots->current_slot, next_slot, CMM_RELAXED);
+   return true;
+}
+
+static inline
+void hpref_put(struct hpref_ctx *ctx)
+{
+   if (ctx->type == HPREF_TYPE_REF) {
+   urcu_ref_put(&ctx->hp->refcount, hpref_release);
+   } else {
+   /* Release HP. */
+   uatomic_store(&ctx->slot->node, NULL, CMM_RELEASE);
+   }
+   ctx->hp = NULL;
+}
+
+/*
+ * hpref_set_pointer: Store pointer @node to @ptr, with RCU publication
+ *    guarantees.
+ */
+static inline
+void hpref_set_pointer(struct hpref_node **ptr, struct hpref_node *node)
+{
+   if (__builtin_constant_p(node) && node == NULL)
+   uatomic_store(ptr, NULL, CMM_RELAXED);
+   else
+   uatomic_store(ptr, node, CMM_RELEASE);
+}
+
+/*
+ * Return the content of the hpref context hazard pointer field.
+ */
+static inline
+struct hpref_node *hpref_ctx_pointer(struct hpref_ctx *ctx)
+{
+   return ctx->hp;
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _URCU_HPREF_H */
diff --git a/src/Makefile.am b/src/Makefile.am
index b555c818..7312c9f7 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -19,7 +19,8 @@ RCULFHASH = rculfhash.c rculfhash-mm-order.c rculfhash-mm-chunk.c \
  lib_LTLIBRARIES = liburcu-common.la \
liburcu.la liburcu-qsbr.la \
liburcu-mb.la liburcu-bp.la \
-   liburcu-memb.la liburcu-cds.la
+   liburcu-memb.la liburcu-cds.la \
+   liburcu-hpref.la
  
  #

  # liburcu-common contains wait-free queues (needed by call_rcu) as well
@@ -50,6 +51,9 @@ liburcu_cds_la_SOURCES = rculfqueue.c rculfstack.c lfstack.c \
workqueue.c workqueue.h $(RCULFHASH) $(COMPAT)
  liburcu_cds_la_LIBADD = liburcu-common.la
  
+liburcu_hpref_la_SOURCES = hpref.c

+liburcu_hpref_la_LIBADD = -lrseq
+
  pkgconfigdir = $(libdir)/pkgconfig
  pkgconfig_DATA = liburcu-cds.pc liburcu.pc liburcu-bp.pc liburcu-qsbr.pc \
libu

Re: [RFC PATCH 1/1] hpref: Hazard Pointers with Reference Counter

2024-09-22 Thread Mathieu Desnoyers

On 2024-09-21 23:07, Lai Jiangshan wrote:

+/*
+ * hpref_hp_get: Obtain a reference to a stable object, protected either
+ *   by hazard pointer (fast-path) or using reference
+ *   counter as fall-back.
+ */
+static inline
+bool hpref_hp_get(struct hpref_node **node_p, struct hpref_ctx *ctx)
+{
+   int cpu = rseq_current_cpu_raw();
+   struct hpref_percpu_slots *cpu_slots = rseq_percpu_ptr(hpref_percpu_slots, cpu);
+   struct hpref_slot *slot = &cpu_slots->slots[cpu_slots->current_slot];
+   bool use_refcount = false;
+   struct hpref_node *node, *node2;
+   unsigned int next_slot;
+
+retry:
+   node = uatomic_load(node_p, CMM_RELAXED);
+   if (!node)
+   return false;
+   /* Use rseq to try setting current slot hp. Store B. */
+   if (rseq_load_cbne_store__ptr(RSEQ_MO_RELAXED, RSEQ_PERCPU_CPU_ID,
+   (intptr_t *) &slot->node, (intptr_t) NULL,
+   (intptr_t) node, cpu)) {
+   slot = &cpu_slots->slots[HPREF_EMERGENCY_SLOT];


Can @cpu possibly be changed? If it can, it seems @cpu and @cpu_slots
should be updated first.


Indeed, if migration happens between rseq_current_cpu_raw() and
execution of rseq_load_cbne_store__ptr(), it will cause this
second operation to fail. This in turn could cause the reader
to retry for a long time (possibly forever) until it finally
migrates back to the original CPU. As you suggest, we should
re-load cpu and cpu_slots after failure here. More specifically,
we should re-load those when the rseq c.s. fails with -1, which
means it was aborted or there was a CPU number mismatch. If the
rseq c.s. returns 1, this means the slot did not contain NULL,
so all we need to do is move over to the next slot.

While applying your suggested change, I noticed that I can
improve the fast-path by removing the notion of "current_slot"
number, and thus remove the increment of this hint from the
fast path. The fast path will instead just scan all 8 slots
trying to find a NULL one. This also lessens the odds that
the fast-path must fall back to refcount in case of concurrent
use. It provides a 50% performance improvement for the fast
path with barrier/membarrier pairing.
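
A sketch of what that reworked fast path could look like (illustrative
only, not the actual code from the feature branch):

	for (;;) {
		unsigned int i;

		for (i = 0; i < HPREF_EMERGENCY_SLOT; i++) {
			struct hpref_slot *slot = &cpu_slots->slots[i];
			int ret;

			/* Publish the HP in the first free (NULL) slot. */
			ret = rseq_load_cbne_store__ptr(RSEQ_MO_RELAXED,
					RSEQ_PERCPU_CPU_ID,
					(intptr_t *) &slot->node, (intptr_t) NULL,
					(intptr_t) node, cpu);
			if (!ret)
				goto published;	/* HP is set. */
			if (ret < 0) {
				/* Abort or CPU mismatch: re-load cpu state. */
				cpu = rseq_current_cpu_raw();
				cpu_slots = rseq_percpu_ptr(hpref_percpu_slots, cpu);
				break;	/* rescan on the new CPU */
			}
			/* ret == 1: slot busy, try the next one. */
		}
		if (i == HPREF_EMERGENCY_SLOT)
			break;	/* all slots busy: emergency slot + refcount */
	}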

I've updated the https://github.com/compudj/userspace-rcu-dev/tree/hpref
volatile feature branch with these changes. I'll give others some
time to provide feedback on the overall idea before sending out
an updated patch.

Thanks for your feedback!

Mathieu




+   use_refcount = true;
+   /*
+    * This may busy-wait for another reader using the
+    * emergency slot to transition to refcount.
+    */
+   caa_cpu_relax();
+   goto retry;
+   }


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com




Re: [RFC PATCH 1/1] hpref: Hazard Pointers with Reference Counter

2024-09-21 Thread Lai Jiangshan
> +/*
> + * hpref_hp_get: Obtain a reference to a stable object, protected either
> + *   by hazard pointer (fast-path) or using reference
> + *   counter as fall-back.
> + */
> +static inline
> +bool hpref_hp_get(struct hpref_node **node_p, struct hpref_ctx *ctx)
> +{
> +   int cpu = rseq_current_cpu_raw();
> +   struct hpref_percpu_slots *cpu_slots = rseq_percpu_ptr(hpref_percpu_slots, cpu);
> +   struct hpref_slot *slot = &cpu_slots->slots[cpu_slots->current_slot];
> +   bool use_refcount = false;
> +   struct hpref_node *node, *node2;
> +   unsigned int next_slot;
> +
> +retry:
> +   node = uatomic_load(node_p, CMM_RELAXED);
> +   if (!node)
> +   return false;
> +   /* Use rseq to try setting current slot hp. Store B. */
> +   if (rseq_load_cbne_store__ptr(RSEQ_MO_RELAXED, RSEQ_PERCPU_CPU_ID,
> +   (intptr_t *) &slot->node, (intptr_t) NULL,
> +   (intptr_t) node, cpu)) {
> +   slot = &cpu_slots->slots[HPREF_EMERGENCY_SLOT];

Can @cpu possibly be changed? If it can, it seems @cpu and @cpu_slots
should be updated first.

> +   use_refcount = true;
> +   /*
> +    * This may busy-wait for another reader using the
> +    * emergency slot to transition to refcount.
> +    */
> +   caa_cpu_relax();
> +   goto retry;
> +   }