Re: [PR] proof of concept: unify refcounting logic [arrow-go]
zeroshade commented on PR #578: URL: https://github.com/apache/arrow-go/pull/578#issuecomment-3720965234 Hey @pixelherodev are you still working on this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] proof of concept: unify refcounting logic [arrow-go]
pixelherodev commented on PR #578: URL: https://github.com/apache/arrow-go/pull/578#issuecomment-3573833846 > That's a good thing to check. My opinion here is the desire to make things as simple as possible and avoid having to deal with the dependency graph like that. Agreed. It's likely worth it even if it does; more overhead than this approach maybe, but less code / correctness concerns, and still a savings over the current implementation :) > In addition, the closure approach would also make the transition to add cleanup much easier Yeah, that's a really good point. We'd basically need two flags - the question is, is it one for "do any tracking" and one for "refcount vs addcleanup", or is it one for refcounting and one for addcleanup, and the combination is invalid? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] proof of concept: unify refcounting logic [arrow-go]
zeroshade commented on PR #578: URL: https://github.com/apache/arrow-go/pull/578#issuecomment-3573829496 That's a good thing to check. My opinion here is the desire to make things as simple as possible and avoid having to deal with the dependency graph like that. In addition, the closure approach would also make the transition to add cleanup much easier -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] proof of concept: unify refcounting logic [arrow-go]
pixelherodev commented on PR #578: URL: https://github.com/apache/arrow-go/pull/578#issuecomment-3573819862 > ``` > cleanup func() > ``` > > Using a cleanup function (or closure) may well be the best approach. I can play around with that instead and see if we like that more? It means not needing to manually track the graph at all, because the function is invoked on release time but defined at creation time; it can take a closure of the object itself, and check whether fields needs unreferenced without any of the typing shenanigans.. One worry I have: I'll need to check if the compiler can optimize out the _construction of the closure_, since it won't actually be used. The current approach generates zero code when disabled, I want to make sure we can maintain that property... -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] proof of concept: unify refcounting logic [arrow-go]
pixelherodev commented on PR #578: URL: https://github.com/apache/arrow-go/pull/578#issuecomment-3573677330 ``` cleanup func() ``` Using a cleanup function (or closure) may well be the best approach. I can play around with that instead and see if we like that more? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] proof of concept: unify refcounting logic [arrow-go]
pixelherodev commented on PR #578: URL: https://github.com/apache/arrow-go/pull/578#issuecomment-3573675481 I'm just going to write up the design notes as one place to explain all the weirdness: - Looking over the existing code for Message/MessageReader, there's a few requirements: - Buffers get released when the last reference to them is erased. - When buffers get released, we also nil out the pointer to them. - When 1 object is released, we want to walk every object it references and release those too. - This is often done by _value at the time of Release_, not when the object is created, and includes a "if != nil" check. The object may not be created until _after the object is initialized_, if ever, and should be released regardless. e.g. `MessageReader.msg` is a `*Message`, and has a different value for each Message that is read. So, in trying to _separate_ the concerns of reference counting and object management, and trying to make it all declarative and only called on object initialization - keeping the refcount graph _static_, even as the objects in it may be dynamic! - I settled on using two-level-pointers. The address of `MessageReader.msg` is fixed, even as its value may change - we want to release whatever the _final value is_ when MessageReader gets released, and we don't want to insert a lot more code dynamically shifting around the reference graph. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] proof of concept: unify refcounting logic [arrow-go]
pixelherodev commented on code in PR #578:
URL: https://github.com/apache/arrow-go/pull/578#discussion_r2558389912
##
arrow/memory/refcount.go:
##
@@ -0,0 +1,75 @@
+//go:build !norc
+
+package memory
+
+import (
+ "sync/atomic"
+ "unsafe"
+
+ "github.com/apache/arrow-go/v18/arrow/internal/debug"
+)
+
+type Refcount struct {
+ countatomic.Int64
+ dependencies []unsafe.Pointer
+ buffers []**Buffer
Review Comment:
I'll explain in a dedicated comment going over the whole design
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] proof of concept: unify refcounting logic [arrow-go]
pixelherodev commented on code in PR #578:
URL: https://github.com/apache/arrow-go/pull/578#discussion_r2558388937
##
arrow/memory/refcount.go:
##
@@ -0,0 +1,75 @@
+//go:build !norc
+
+package memory
+
+import (
+ "sync/atomic"
+ "unsafe"
+
+ "github.com/apache/arrow-go/v18/arrow/internal/debug"
+)
+
+type Refcount struct {
+ countatomic.Int64
+ dependencies []unsafe.Pointer
+ buffers []**Buffer
+ derived []unsafe.Pointer
+}
+
+// Must only be called once per object. Defines the dependency tree.
+// When this object is completely unreferenced, all dependencies will
+// be unreferenced by it and, if this was the only object still
+// referencing them, they will be freed as well, recursively.
+func (r *Refcount) ReferenceDependency(d ...unsafe.Pointer) {
+ r.dependencies = d
+}
+
+// Must only be called once per object. Defines buffers that are referenced
+// by this object. When this object is unreferenced, all such buffers will
+// be deallocated immediately.
+func (r *Refcount) ReferenceBuffer(b ...**Buffer) {
+ r.buffers = b
+}
Review Comment:
no, this is for the last call to `Refcount.Release()`: that call will
deallocate the buffers immediately - or, rather, it will call Allocator.Free()
immediately. IMO these should be seen as the same thing.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] proof of concept: unify refcounting logic [arrow-go]
pixelherodev commented on code in PR #578:
URL: https://github.com/apache/arrow-go/pull/578#discussion_r2558378255
##
arrow/ipc/message.go:
##
@@ -85,7 +84,9 @@ func NewMessage(meta, body *memory.Buffer) *Message {
meta: meta,
body: body,
}
- m.refCount.Add(1)
+ m.ReferenceBuffer(&m.meta, &m.body)
+ m.ReferenceDerived(unsafe.Pointer(&m.msg))
Review Comment:
The msg pointer is _derived_ from the `meta` Buffer, which is itself
allocated by the `Allocator`. `ReferenceDerived` is not for refcounting; it
basically means "when this object goes free, nil out the target to prevent
use-after-free."
Without it, accessing the message after calling Release could potentially
access other memory allocated by the `Allocator` - if integrating with C, this
could _theoretically_ allow access to other heap allocations. Note that the
previous code for Message.Release contains this:
```go
if msg.refCount.Add(-1) == 0 {
msg.meta.Release()
msg.body.Release()
msg.msg = nil
msg.meta = nil
msg.body = nil
}
```
it's releasing the two subresources, but also nilling the one that depends
on one. This is just to replace the `msg = nil` line, basically.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] proof of concept: unify refcounting logic [arrow-go]
zeroshade commented on code in PR #578:
URL: https://github.com/apache/arrow-go/pull/578#discussion_r2558108594
##
arrow/memory/refcount.go:
##
@@ -0,0 +1,75 @@
+//go:build !norc
+
+package memory
+
+import (
+ "sync/atomic"
+ "unsafe"
+
+ "github.com/apache/arrow-go/v18/arrow/internal/debug"
+)
+
+type Refcount struct {
+ countatomic.Int64
+ dependencies []unsafe.Pointer
+ buffers []**Buffer
+ derived []unsafe.Pointer
+}
+
+// Must only be called once per object. Defines the dependency tree.
+// When this object is completely unreferenced, all dependencies will
+// be unreferenced by it and, if this was the only object still
+// referencing them, they will be freed as well, recursively.
+func (r *Refcount) ReferenceDependency(d ...unsafe.Pointer) {
+ r.dependencies = d
+}
+
+// Must only be called once per object. Defines buffers that are referenced
+// by this object. When this object is unreferenced, all such buffers will
+// be deallocated immediately.
+func (r *Refcount) ReferenceBuffer(b ...**Buffer) {
+ r.buffers = b
+}
Review Comment:
you mean "When this object is cleaned up" or "released"? Because Go is
garbage collected and we don't have destructors, then just being `unreferenced`
won't deallocate the buffers immediately, they'll get deallocated when the GC
gets around to it
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] proof of concept: unify refcounting logic [arrow-go]
zeroshade commented on code in PR #578:
URL: https://github.com/apache/arrow-go/pull/578#discussion_r2558106891
##
arrow/memory/refcount.go:
##
@@ -0,0 +1,75 @@
+//go:build !norc
+
+package memory
+
+import (
+ "sync/atomic"
+ "unsafe"
+
+ "github.com/apache/arrow-go/v18/arrow/internal/debug"
+)
+
+type Refcount struct {
+ countatomic.Int64
+ dependencies []unsafe.Pointer
+ buffers []**Buffer
Review Comment:
why `**Buffer` instead of `*Buffer`?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] proof of concept: unify refcounting logic [arrow-go]
zeroshade commented on code in PR #578:
URL: https://github.com/apache/arrow-go/pull/578#discussion_r2558104886
##
arrow/ipc/message.go:
##
@@ -85,7 +84,9 @@ func NewMessage(meta, body *memory.Buffer) *Message {
meta: meta,
body: body,
}
- m.refCount.Add(1)
+ m.ReferenceBuffer(&m.meta, &m.body)
+ m.ReferenceDerived(unsafe.Pointer(&m.msg))
Review Comment:
we shouldn't need to track the `m.msg`, the point of the refcounting is to
track allocations made by the `memory.Allocator`, the flatbuffer message object
is never allocated by the memory.Allocator
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] proof of concept: unify refcounting logic [arrow-go]
zeroshade commented on PR #578:
URL: https://github.com/apache/arrow-go/pull/578#issuecomment-3573245896
I'm not a fan of this approach. I dislike the users having to call
*multiple* functions as opposed to just having `Retain` and `Release` (i.e. I
don't like having to call the `Referenced` and keep track of Derived etc.)
In theory, a buffer need only keep track of its parent (if it has one) and
doesn't need to keep track of any other buffers which were sliced off from it.
I don't understand the need/desire for adding the pointers that you're doing as
opposed to just having the `atomic.Int64` and essentially putting that behind a
struct which can have a version that is empty for turning off the refcounts.
i.e. Why isn't it just something like:
```go
//go:build !norc
type RefCount struct {
ref atomic.Int64
cleanup func()
}
func (r *RefCount) Retain() {
...
}
func (r *RefCount) Release() {
...
}
/
//go:build norc
type RefCount struct {}
func (*RefCount) Retain() {}
func (*RefCount) Release() {}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] proof of concept: unify refcounting logic [arrow-go]
pixelherodev commented on PR #578: URL: https://github.com/apache/arrow-go/pull/578#issuecomment-3572692696 `go fmt`ed. Forgot :upside_down_face: -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] proof of concept: unify refcounting logic [arrow-go]
pixelherodev commented on PR #578: URL: https://github.com/apache/arrow-go/pull/578#issuecomment-3569618375 There's probably a way to replace the unsafe.Pointer in storage for dependencies with **Refcount, at least. I'm not sure I like this approach; it forces dependencies to have a pointer stored, which effectively forces heap escapes, which I've been trying to _avoid_. Might be worth splitting optional/mutable/dynamic dependencies (which need to be able to have the pointer change) and immutable/static dependencies, waiting on review before I make any more changes though -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] proof of concept: unify refcounting logic [arrow-go]
pixelherodev commented on PR #578: URL: https://github.com/apache/arrow-go/pull/578#issuecomment-3565021921 And, tested with MessageReader dependency on Message. Test passes, and if the explicit dependency is dropped, the test fails due to a memory leak. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] proof of concept: unify refcounting logic [arrow-go]
pixelherodev commented on PR #578: URL: https://github.com/apache/arrow-go/pull/578#issuecomment-3564928399 And, tested and confirmed: this prototype completely drops refcounting from the `Message` type when the `refcount` build flag is not enabled, including making it zero-size so that the Go heap shrinks a bit too :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] proof of concept: unify refcounting logic [arrow-go]
pixelherodev commented on PR #578: URL: https://github.com/apache/arrow-go/pull/578#issuecomment-3564901488 In principle, doing this and then moving Refcount.(Retain|Release) into a disabled file is enough to switch off refcounting. I'd also like to see if we can move the Refcount initialization into an inline-able function call, though - that would preserve this behavior when it's enabled, but also drop the cost of initializing the refcount information when it's not needed... -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] proof of concept: unify refcounting logic [arrow-go]
pixelherodev commented on PR #578: URL: https://github.com/apache/arrow-go/pull/578#issuecomment-3564869570 Tested that the derived pointers are nilled correctly by modifying the MessageReader test, definitely is working. It's also guaranteed to be safe: > (1) Conversion of a *T1 to Pointer to *T2. > > Provided that T2 is no larger than T1 and that the two share an equivalent > memory layout, this conversion allows reinterpreting data of one type as > data of another type. An example is the implementation of math.Float64bits: -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] proof of concept: unify refcounting logic [arrow-go]
pixelherodev commented on PR #578: URL: https://github.com/apache/arrow-go/pull/578#issuecomment-3564750631 ... ah wait > The usage of Additional for nilling derived pointers is because those can be of arbitrary types, and I didn't want to try engaging in any or unsafe.Pointer shenanigans to construct the equivalent to a C void**. ... should be able to recast them all as *uintptr with unsafe.Pointer, and then store it as []*uintptr? :thinking: -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] proof of concept: unify refcounting logic [arrow-go]
pixelherodev commented on PR #578: URL: https://github.com/apache/arrow-go/pull/578#issuecomment-3564696796 Tests are passing. The usage of `Additional` for nilling derived pointers is because those can be of arbitrary types, and I didn't want to try engaging in `any` or `unsafe.Pointer` shenanigans to construct the equivalent to a C `void**`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
