Re: [PR] proof of concept: unify refcounting logic [arrow-go]

2026-01-07 Thread via GitHub


zeroshade commented on PR #578:
URL: https://github.com/apache/arrow-go/pull/578#issuecomment-3720965234

   Hey @pixelherodev are you still working on this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] proof of concept: unify refcounting logic [arrow-go]

2025-11-24 Thread via GitHub


pixelherodev commented on PR #578:
URL: https://github.com/apache/arrow-go/pull/578#issuecomment-3573833846

   > That's a good thing to check. My opinion here is the desire to make things 
as simple as possible and avoid having to deal with the dependency graph like 
that.
   
   Agreed. It's likely worth it even if it does; more overhead than this 
approach maybe, but less code / correctness concerns, and still a savings over 
the current implementation :)
   
   > In addition, the closure approach would also make the transition to add 
cleanup much easier
   
   Yeah, that's a really good point. We'd basically need two flags - the 
question is, is it one for "do any tracking" and one for "refcount vs 
addcleanup", or is it one for refcounting and one for addcleanup, and the 
combination is invalid?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] proof of concept: unify refcounting logic [arrow-go]

2025-11-24 Thread via GitHub


zeroshade commented on PR #578:
URL: https://github.com/apache/arrow-go/pull/578#issuecomment-3573829496

   That's a good thing to check. My opinion here is the desire to make things 
as simple as possible and avoid having to deal with the dependency graph like 
that. 
   
   In addition, the closure approach would also make the transition to add 
cleanup much easier


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] proof of concept: unify refcounting logic [arrow-go]

2025-11-24 Thread via GitHub


pixelherodev commented on PR #578:
URL: https://github.com/apache/arrow-go/pull/578#issuecomment-3573819862

   > ```
   > cleanup func()
   > ```
   > 
   > Using a cleanup function (or closure) may well be the best approach. I can 
play around with that instead and see if we like that more? It means not 
needing to manually track the graph at all, because the function is invoked on 
release time but defined at creation time; it can take a closure of the object 
itself, and check whether fields needs unreferenced without any of the typing 
shenanigans..
   
   One worry I have: I'll need to check if the compiler can optimize out the 
_construction of the closure_, since it won't actually be used. The current 
approach generates zero code when disabled, I want to make sure we can maintain 
that property...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] proof of concept: unify refcounting logic [arrow-go]

2025-11-24 Thread via GitHub


pixelherodev commented on PR #578:
URL: https://github.com/apache/arrow-go/pull/578#issuecomment-3573677330

   ```
   cleanup func()
   ```
   
   Using a cleanup function (or closure) may well be the best approach. I can 
play around with that instead and see if we like that more?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] proof of concept: unify refcounting logic [arrow-go]

2025-11-24 Thread via GitHub


pixelherodev commented on PR #578:
URL: https://github.com/apache/arrow-go/pull/578#issuecomment-3573675481

   I'm just going to write up the design notes as one place to explain all the 
weirdness:
   
   - Looking over the existing code for Message/MessageReader, there's a few 
requirements:
 - Buffers get released when the last reference to them is erased.
 - When buffers get released, we also nil out the pointer to them.
 - When 1 object is released, we want to walk every object it references 
and release those too.
   - This is often done by _value at the time of Release_, not when the 
object is created, and includes a "if != nil" check. The object may not be 
created until _after the object is initialized_, if ever, and should be 
released regardless. e.g. `MessageReader.msg` is a `*Message`, and has a 
different value for each Message that is read.
   
   So, in trying to _separate_ the concerns of reference counting and object 
management, and trying to make it all declarative and only called on object 
initialization - keeping the refcount graph _static_, even as the objects in it 
may be dynamic! - I settled on using two-level-pointers. The address of 
`MessageReader.msg` is fixed, even as its value may change - we want to release 
whatever the _final value is_ when MessageReader gets released, and we don't 
want to insert a lot more code dynamically shifting around the reference graph.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] proof of concept: unify refcounting logic [arrow-go]

2025-11-24 Thread via GitHub


pixelherodev commented on code in PR #578:
URL: https://github.com/apache/arrow-go/pull/578#discussion_r2558389912


##
arrow/memory/refcount.go:
##
@@ -0,0 +1,75 @@
+//go:build !norc
+
+package memory
+
+import (
+   "sync/atomic"
+   "unsafe"
+
+   "github.com/apache/arrow-go/v18/arrow/internal/debug"
+)
+
+type Refcount struct {
+   countatomic.Int64
+   dependencies []unsafe.Pointer
+   buffers  []**Buffer

Review Comment:
   I'll explain in a dedicated comment going over the whole design



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] proof of concept: unify refcounting logic [arrow-go]

2025-11-24 Thread via GitHub


pixelherodev commented on code in PR #578:
URL: https://github.com/apache/arrow-go/pull/578#discussion_r2558388937


##
arrow/memory/refcount.go:
##
@@ -0,0 +1,75 @@
+//go:build !norc
+
+package memory
+
+import (
+   "sync/atomic"
+   "unsafe"
+
+   "github.com/apache/arrow-go/v18/arrow/internal/debug"
+)
+
+type Refcount struct {
+   countatomic.Int64
+   dependencies []unsafe.Pointer
+   buffers  []**Buffer
+   derived  []unsafe.Pointer
+}
+
+// Must only be called once per object. Defines the dependency tree.
+// When this object is completely unreferenced, all dependencies will
+// be unreferenced by it and, if this was the only object still
+// referencing them, they will be freed as well, recursively.
+func (r *Refcount) ReferenceDependency(d ...unsafe.Pointer) {
+   r.dependencies = d
+}
+
+// Must only be called once per object. Defines buffers that are referenced
+// by this object. When this object is unreferenced, all such buffers will
+// be deallocated immediately.
+func (r *Refcount) ReferenceBuffer(b ...**Buffer) {
+   r.buffers = b
+}

Review Comment:
   no, this is for the last call to `Refcount.Release()`: that call will 
deallocate the buffers immediately - or, rather, it will call Allocator.Free() 
immediately. IMO these should be seen as the same thing.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] proof of concept: unify refcounting logic [arrow-go]

2025-11-24 Thread via GitHub


pixelherodev commented on code in PR #578:
URL: https://github.com/apache/arrow-go/pull/578#discussion_r2558378255


##
arrow/ipc/message.go:
##
@@ -85,7 +84,9 @@ func NewMessage(meta, body *memory.Buffer) *Message {
meta: meta,
body: body,
}
-   m.refCount.Add(1)
+   m.ReferenceBuffer(&m.meta, &m.body)
+   m.ReferenceDerived(unsafe.Pointer(&m.msg))

Review Comment:
   The msg pointer is _derived_ from the `meta` Buffer, which is itself 
allocated by the `Allocator`. `ReferenceDerived` is not for refcounting; it 
basically means "when this object goes free, nil out the target to prevent 
use-after-free."
   
   Without it, accessing the message after calling Release could potentially 
access other memory allocated by the `Allocator` - if integrating with C, this 
could _theoretically_ allow access to other heap allocations. Note that the 
previous code for Message.Release contains this:
   
   ```go
   if msg.refCount.Add(-1) == 0 {
msg.meta.Release()
msg.body.Release()
msg.msg = nil
msg.meta = nil
msg.body = nil
}
   ```
   
   it's releasing the two subresources, but also nilling the one that depends 
on one. This is just to replace the `msg = nil` line, basically.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] proof of concept: unify refcounting logic [arrow-go]

2025-11-24 Thread via GitHub


zeroshade commented on code in PR #578:
URL: https://github.com/apache/arrow-go/pull/578#discussion_r2558108594


##
arrow/memory/refcount.go:
##
@@ -0,0 +1,75 @@
+//go:build !norc
+
+package memory
+
+import (
+   "sync/atomic"
+   "unsafe"
+
+   "github.com/apache/arrow-go/v18/arrow/internal/debug"
+)
+
+type Refcount struct {
+   countatomic.Int64
+   dependencies []unsafe.Pointer
+   buffers  []**Buffer
+   derived  []unsafe.Pointer
+}
+
+// Must only be called once per object. Defines the dependency tree.
+// When this object is completely unreferenced, all dependencies will
+// be unreferenced by it and, if this was the only object still
+// referencing them, they will be freed as well, recursively.
+func (r *Refcount) ReferenceDependency(d ...unsafe.Pointer) {
+   r.dependencies = d
+}
+
+// Must only be called once per object. Defines buffers that are referenced
+// by this object. When this object is unreferenced, all such buffers will
+// be deallocated immediately.
+func (r *Refcount) ReferenceBuffer(b ...**Buffer) {
+   r.buffers = b
+}

Review Comment:
   you mean "When this object is cleaned up" or "released"? Because Go is 
garbage collected and we don't have destructors, then just being `unreferenced` 
won't deallocate the buffers immediately, they'll get deallocated when the GC 
gets around to it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] proof of concept: unify refcounting logic [arrow-go]

2025-11-24 Thread via GitHub


zeroshade commented on code in PR #578:
URL: https://github.com/apache/arrow-go/pull/578#discussion_r2558106891


##
arrow/memory/refcount.go:
##
@@ -0,0 +1,75 @@
+//go:build !norc
+
+package memory
+
+import (
+   "sync/atomic"
+   "unsafe"
+
+   "github.com/apache/arrow-go/v18/arrow/internal/debug"
+)
+
+type Refcount struct {
+   countatomic.Int64
+   dependencies []unsafe.Pointer
+   buffers  []**Buffer

Review Comment:
   why `**Buffer` instead of `*Buffer`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] proof of concept: unify refcounting logic [arrow-go]

2025-11-24 Thread via GitHub


zeroshade commented on code in PR #578:
URL: https://github.com/apache/arrow-go/pull/578#discussion_r2558104886


##
arrow/ipc/message.go:
##
@@ -85,7 +84,9 @@ func NewMessage(meta, body *memory.Buffer) *Message {
meta: meta,
body: body,
}
-   m.refCount.Add(1)
+   m.ReferenceBuffer(&m.meta, &m.body)
+   m.ReferenceDerived(unsafe.Pointer(&m.msg))

Review Comment:
   we shouldn't need to track the `m.msg`, the point of the refcounting is to 
track allocations made by the `memory.Allocator`, the flatbuffer message object 
is never allocated by the memory.Allocator



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] proof of concept: unify refcounting logic [arrow-go]

2025-11-24 Thread via GitHub


zeroshade commented on PR #578:
URL: https://github.com/apache/arrow-go/pull/578#issuecomment-3573245896

   I'm not a fan of this approach. I dislike the users having to call 
*multiple* functions as opposed to just having `Retain` and `Release` (i.e. I 
don't like having to call the `Referenced` and keep track of Derived etc.)
   
   In theory, a buffer need only keep track of its parent (if it has one) and 
doesn't need to keep track of any other buffers which were sliced off from it. 
I don't understand the need/desire for adding the pointers that you're doing as 
opposed to just having the `atomic.Int64` and essentially putting that behind a 
struct which can have a version that is empty for turning off the refcounts.
   
   i.e. Why isn't it just something like:
   
   ```go
   //go:build !norc
   
   type RefCount struct {
   ref atomic.Int64
   cleanup func()
   }
   
   func (r *RefCount) Retain() {
   ...
   }
   
   func (r *RefCount) Release() {
   ...
   }
   
   /
   //go:build norc
   
   type RefCount struct {}
   func (*RefCount) Retain() {}
   func (*RefCount) Release() {}
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] proof of concept: unify refcounting logic [arrow-go]

2025-11-24 Thread via GitHub


pixelherodev commented on PR #578:
URL: https://github.com/apache/arrow-go/pull/578#issuecomment-3572692696

   `go fmt`ed. Forgot :upside_down_face: 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] proof of concept: unify refcounting logic [arrow-go]

2025-11-24 Thread via GitHub


pixelherodev commented on PR #578:
URL: https://github.com/apache/arrow-go/pull/578#issuecomment-3569618375

   There's probably a way to replace the unsafe.Pointer in storage for 
dependencies with **Refcount, at least. I'm not sure I like this approach; it 
forces dependencies to have a pointer stored, which effectively forces heap 
escapes, which I've been trying to _avoid_.
   
   Might be worth splitting optional/mutable/dynamic dependencies (which need 
to be able to have the pointer change) and immutable/static dependencies, 
waiting on review before I make any more changes though


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] proof of concept: unify refcounting logic [arrow-go]

2025-11-21 Thread via GitHub


pixelherodev commented on PR #578:
URL: https://github.com/apache/arrow-go/pull/578#issuecomment-3565021921

   And, tested with MessageReader dependency on Message. Test passes, and if 
the explicit dependency is dropped, the test fails due to a memory leak.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] proof of concept: unify refcounting logic [arrow-go]

2025-11-21 Thread via GitHub


pixelherodev commented on PR #578:
URL: https://github.com/apache/arrow-go/pull/578#issuecomment-3564928399

   And, tested and confirmed: this prototype completely drops refcounting from 
the `Message` type when the `refcount` build flag is not enabled, including 
making it zero-size so that the Go heap shrinks a bit too :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] proof of concept: unify refcounting logic [arrow-go]

2025-11-21 Thread via GitHub


pixelherodev commented on PR #578:
URL: https://github.com/apache/arrow-go/pull/578#issuecomment-3564901488

   In principle, doing this and then moving Refcount.(Retain|Release) into a 
disabled file is enough to switch off refcounting.
   
   I'd also like to see if we can move the Refcount initialization into an 
inline-able function call, though - that would preserve this behavior when it's 
enabled, but also drop the cost of initializing the refcount information when 
it's not needed...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] proof of concept: unify refcounting logic [arrow-go]

2025-11-21 Thread via GitHub


pixelherodev commented on PR #578:
URL: https://github.com/apache/arrow-go/pull/578#issuecomment-3564869570

   Tested that the derived pointers are nilled correctly by modifying the 
MessageReader test, definitely is working. 
   
   It's also guaranteed to be safe:
   
   >   (1) Conversion of a *T1 to Pointer to *T2.
   > 
   > Provided that T2 is no larger than T1 and that the two share an 
equivalent
   > memory layout, this conversion allows reinterpreting data of one type 
as
   > data of another type. An example is the implementation of 
math.Float64bits:
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] proof of concept: unify refcounting logic [arrow-go]

2025-11-21 Thread via GitHub


pixelherodev commented on PR #578:
URL: https://github.com/apache/arrow-go/pull/578#issuecomment-3564750631

   ... ah wait
   
   > The usage of Additional for nilling derived pointers is because those can 
be of arbitrary types, and I didn't want to try engaging in any or 
unsafe.Pointer shenanigans to construct the equivalent to a C void**.
   
   ... should be able to recast them all as *uintptr with unsafe.Pointer, and 
then store it as []*uintptr? :thinking: 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] proof of concept: unify refcounting logic [arrow-go]

2025-11-21 Thread via GitHub


pixelherodev commented on PR #578:
URL: https://github.com/apache/arrow-go/pull/578#issuecomment-3564696796

   Tests are passing.
   
   The usage of `Additional` for nilling derived pointers is because those can 
be of arbitrary types, and I didn't want to try engaging in `any` or 
`unsafe.Pointer` shenanigans to construct the equivalent to a C `void**`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]