Re: [PR] feat(memory-tracking): expose API to NullBuffer, ArrayData, and Array [arrow-rs]

2026-03-06 Thread via GitHub


notfilippo commented on PR #8918:
URL: https://github.com/apache/arrow-rs/pull/8918#issuecomment-4010200136

   Sure! I think this is good to merge @alamb 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] feat(memory-tracking): expose API to NullBuffer, ArrayData, and Array [arrow-rs]

2026-03-04 Thread via GitHub


cetra3 commented on PR #8918:
URL: https://github.com/apache/arrow-rs/pull/8918#issuecomment-4001149913

   https://github.com/apache/arrow-rs/pull/9433 now has it not using `to_data` 
so keen to get some traction on this PR or my one


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] feat(memory-tracking): expose API to NullBuffer, ArrayData, and Array [arrow-rs]

2026-02-25 Thread via GitHub


notfilippo commented on code in PR #8918:
URL: https://github.com/apache/arrow-rs/pull/8918#discussion_r2852745850


##
arrow-array/src/array/mod.rs:
##
@@ -336,6 +336,75 @@ pub trait Array: std::fmt::Debug + Send + Sync {
 /// This value will always be greater than returned by 
`get_buffer_memory_size()` and
 /// includes the overhead of the data structures that contain the pointers 
to the various buffers.
 fn get_array_memory_size(&self) -> usize;
+
+/// Claim memory used by this array in the provided memory pool.
+///
+/// This recursively claims memory for:
+/// - All data buffers in this array
+/// - All child arrays (for nested types like List, Struct, etc.)
+/// - The null bitmap buffer if present
+///
+/// This method guarantees that the memory pool will only compute occupied 
memory
+/// exactly once. For example, if this array is derived from operations 
like `slice`,
+/// calling `claim` on it would not change the memory pool's usage if the 
underlying buffers
+/// are already counted before.
+///
+/// # Example
+/// ```
+/// # use arrow_array::{Int32Array, Array};
+/// # use arrow_buffer::TrackingMemoryPool;
+/// # use arrow_buffer::MemoryPool;
+///
+/// let pool = TrackingMemoryPool::default();
+///
+/// let small_array = Int32Array::from(vec![1, 2, 3, 4, 5]);
+/// let small_array_size = small_array.get_buffer_memory_size();
+///
+/// // Claim the array's memory in the pool
+/// small_array.claim(&pool);
+///
+/// // Create and claim slices of `small_array`; should not increase 
memory usage
+/// let slice1 = small_array.slice(0, 2);
+/// let slice2 = small_array.slice(2, 2);
+/// slice1.claim(&pool);
+/// slice2.claim(&pool);
+///
+/// assert_eq!(pool.used(), small_array_size);
+///
+/// // Create a `large_array` which does not derive from the original 
`small_array`
+///
+/// let large_array = Int32Array::from((0..1000).collect::>());
+/// let large_array_size = large_array.get_buffer_memory_size();
+///
+/// large_array.claim(&pool);
+///
+/// // Trying to claim more than once is a no-op
+/// large_array.claim(&pool);
+/// large_array.claim(&pool);
+///
+/// assert_eq!(pool.used(), small_array_size + large_array_size);
+///
+/// let sum_of_all_sizes = small_array_size + large_array_size + 
slice1.get_buffer_memory_size() + slice2.get_buffer_memory_size();
+///
+/// // `get_buffer_memory_size` works independently of the memory pool, so 
a sum of all the
+/// // arrays in scope will always be >= the memory used reported by the 
memory pool.
+/// assert_ne!(pool.used(), sum_of_all_sizes);
+///
+/// // Until the final claim is dropped the buffer size remains accounted 
for
+/// drop(small_array);
+/// drop(slice1);
+///
+/// assert_eq!(pool.used(), small_array_size + large_array_size);
+///
+/// // Dropping this finally releases the buffer that was backing 
`small_array`
+/// drop(slice2);
+///
+/// assert_eq!(pool.used(), large_array_size);
+/// ```
+#[cfg(feature = "pool")]
+fn claim(&self, pool: &dyn arrow_buffer::MemoryPool) {
+self.to_data().claim(pool)

Review Comment:
   Here maybe I should add a comment stating that Array impl can provide a more 
optimised method. Wdyt?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] feat(memory-tracking): expose API to NullBuffer, ArrayData, and Array [arrow-rs]

2026-02-25 Thread via GitHub


notfilippo commented on PR #8918:
URL: https://github.com/apache/arrow-rs/pull/8918#issuecomment-3958815051

   Done! 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] feat(memory-tracking): expose API to NullBuffer, ArrayData, and Array [arrow-rs]

2026-02-25 Thread via GitHub


notfilippo commented on PR #8918:
URL: https://github.com/apache/arrow-rs/pull/8918#issuecomment-3958424948

   I rebased the PR and I'll try to look into implementing `claim` for the 
Array variants. Could this be a follow-up PR or should I include it here? cc 
@alamb 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] feat(memory-tracking): expose API to NullBuffer, ArrayData, and Array [arrow-rs]

2026-02-25 Thread via GitHub


notfilippo commented on code in PR #8918:
URL: https://github.com/apache/arrow-rs/pull/8918#discussion_r2852283328


##
arrow-array/src/array/mod.rs:
##
@@ -336,6 +336,75 @@ pub trait Array: std::fmt::Debug + Send + Sync {
 /// This value will always be greater than returned by 
`get_buffer_memory_size()` and
 /// includes the overhead of the data structures that contain the pointers 
to the various buffers.
 fn get_array_memory_size(&self) -> usize;
+
+/// Claim memory used by this array in the provided memory pool.
+///
+/// This recursively claims memory for:
+/// - All data buffers in this array
+/// - All child arrays (for nested types like List, Struct, etc.)
+/// - The null bitmap buffer if present
+///
+/// This method guarantees that the memory pool will only compute occupied 
memory
+/// exactly once. For example, if this array is derived from operations 
like `slice`,
+/// calling `claim` on it would not change the memory pool's usage if the 
underlying buffers
+/// are already counted before.
+///
+/// # Example
+/// ```
+/// # use arrow_array::{Int32Array, Array};
+/// # use arrow_buffer::TrackingMemoryPool;
+/// # use arrow_buffer::MemoryPool;
+///
+/// let pool = TrackingMemoryPool::default();
+///
+/// let small_array = Int32Array::from(vec![1, 2, 3, 4, 5]);
+/// let small_array_size = small_array.get_buffer_memory_size();
+///
+/// // Claim the array's memory in the pool
+/// small_array.claim(&pool);
+///
+/// // Create and claim slices of `small_array`; should not increase 
memory usage
+/// let slice1 = small_array.slice(0, 2);
+/// let slice2 = small_array.slice(2, 2);
+/// slice1.claim(&pool);
+/// slice2.claim(&pool);
+///
+/// assert_eq!(pool.used(), small_array_size);
+///
+/// // Create a `large_array` which does not derive from the original 
`small_array`
+///
+/// let large_array = Int32Array::from((0..1000).collect::>());
+/// let large_array_size = large_array.get_buffer_memory_size();
+///
+/// large_array.claim(&pool);
+///
+/// // Trying to claim more than once is a no-op
+/// large_array.claim(&pool);
+/// large_array.claim(&pool);
+///
+/// assert_eq!(pool.used(), small_array_size + large_array_size);
+///
+/// let sum_of_all_sizes = small_array_size + large_array_size + 
slice1.get_buffer_memory_size() + slice2.get_buffer_memory_size();
+///
+/// // `get_buffer_memory_size` works independently of the memory pool, so 
a sum of all the
+/// // arrays in scope will always be >= the memory used reported by the 
memory pool.
+/// assert_ne!(pool.used(), sum_of_all_sizes);
+///
+/// // Until the final claim is dropped the buffer size remains accounted 
for
+/// drop(small_array);
+/// drop(slice1);
+///
+/// assert_eq!(pool.used(), small_array_size + large_array_size);
+///
+/// // Dropping this finally releases the buffer that was backing 
`small_array`
+/// drop(slice2);
+///
+/// assert_eq!(pool.used(), large_array_size);
+/// ```
+#[cfg(feature = "pool")]
+fn claim(&self, pool: &dyn arrow_buffer::MemoryPool) {
+self.to_data().claim(pool)

Review Comment:
   Sounds good! Updating my PR



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] feat(memory-tracking): expose API to NullBuffer, ArrayData, and Array [arrow-rs]

2026-02-24 Thread via GitHub


cetra3 commented on PR #8918:
URL: https://github.com/apache/arrow-rs/pull/8918#issuecomment-3954810492

   I have another PR open which does similar things, and avoids the `to_data`: 
https://github.com/apache/arrow-rs/pull/9433


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] feat(memory-tracking): expose API to NullBuffer, ArrayData, and Array [arrow-rs]

2026-02-24 Thread via GitHub


alamb commented on PR #8918:
URL: https://github.com/apache/arrow-rs/pull/8918#issuecomment-3954629651

   Thanks @notfilippo  -- I am sorry for the delay in reviewing -- I am bummed 
no one else had a chance yet. 
   
   This API also came up today with @cetra3 and @adriangb in the context of 
DataFusion
   
   If we can avoid the call to `to_data` I think this PR will be good to go 
from my perspective


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] feat(memory-tracking): expose API to NullBuffer, ArrayData, and Array [arrow-rs]

2026-02-24 Thread via GitHub


alamb commented on code in PR #8918:
URL: https://github.com/apache/arrow-rs/pull/8918#discussion_r2849444198


##
arrow-array/src/array/mod.rs:
##
@@ -336,6 +336,75 @@ pub trait Array: std::fmt::Debug + Send + Sync {
 /// This value will always be greater than returned by 
`get_buffer_memory_size()` and
 /// includes the overhead of the data structures that contain the pointers 
to the various buffers.
 fn get_array_memory_size(&self) -> usize;
+
+/// Claim memory used by this array in the provided memory pool.
+///
+/// This recursively claims memory for:
+/// - All data buffers in this array
+/// - All child arrays (for nested types like List, Struct, etc.)
+/// - The null bitmap buffer if present
+///
+/// This method guarantees that the memory pool will only compute occupied 
memory
+/// exactly once. For example, if this array is derived from operations 
like `slice`,
+/// calling `claim` on it would not change the memory pool's usage if the 
underlying buffers
+/// are already counted before.
+///
+/// # Example
+/// ```
+/// # use arrow_array::{Int32Array, Array};
+/// # use arrow_buffer::TrackingMemoryPool;
+/// # use arrow_buffer::MemoryPool;
+///
+/// let pool = TrackingMemoryPool::default();
+///
+/// let small_array = Int32Array::from(vec![1, 2, 3, 4, 5]);
+/// let small_array_size = small_array.get_buffer_memory_size();
+///
+/// // Claim the array's memory in the pool
+/// small_array.claim(&pool);
+///
+/// // Create and claim slices of `small_array`; should not increase 
memory usage
+/// let slice1 = small_array.slice(0, 2);
+/// let slice2 = small_array.slice(2, 2);
+/// slice1.claim(&pool);
+/// slice2.claim(&pool);
+///
+/// assert_eq!(pool.used(), small_array_size);
+///
+/// // Create a `large_array` which does not derive from the original 
`small_array`
+///
+/// let large_array = Int32Array::from((0..1000).collect::>());
+/// let large_array_size = large_array.get_buffer_memory_size();
+///
+/// large_array.claim(&pool);
+///
+/// // Trying to claim more than once is a no-op
+/// large_array.claim(&pool);
+/// large_array.claim(&pool);
+///
+/// assert_eq!(pool.used(), small_array_size + large_array_size);
+///
+/// let sum_of_all_sizes = small_array_size + large_array_size + 
slice1.get_buffer_memory_size() + slice2.get_buffer_memory_size();
+///
+/// // `get_buffer_memory_size` works independently of the memory pool, so 
a sum of all the
+/// // arrays in scope will always be >= the memory used reported by the 
memory pool.
+/// assert_ne!(pool.used(), sum_of_all_sizes);
+///
+/// // Until the final claim is dropped the buffer size remains accounted 
for
+/// drop(small_array);
+/// drop(slice1);
+///
+/// assert_eq!(pool.used(), small_array_size + large_array_size);
+///
+/// // Dropping this finally releases the buffer that was backing 
`small_array`
+/// drop(slice2);
+///
+/// assert_eq!(pool.used(), large_array_size);
+/// ```
+#[cfg(feature = "pool")]
+fn claim(&self, pool: &dyn arrow_buffer::MemoryPool) {
+self.to_data().claim(pool)

Review Comment:
   I think calling `to_data` allocates a new `Vec` to create the ArrayData (the 
`child` buffers specifically)
   
   Should we perhaps instead thread the `claim` API through the actual arrays / 
buffers?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] feat(memory-tracking): expose API to NullBuffer, ArrayData, and Array [arrow-rs]

2026-02-13 Thread via GitHub


alamb commented on PR #8918:
URL: https://github.com/apache/arrow-rs/pull/8918#issuecomment-3899768868

   Thanks @notfilippo  -- I don't have time to drive this type of change now 
(low level memory allocation tracking) -- I am overwhelmed by trying to keep 
things going. Hopefully some of the other committers will have time to help
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] feat(memory-tracking): expose API to NullBuffer, ArrayData, and Array [arrow-rs]

2026-02-13 Thread via GitHub


notfilippo commented on PR #8918:
URL: https://github.com/apache/arrow-rs/pull/8918#issuecomment-3897832811

   cc @alamb since you reviewed the original PR. cc @waynexia / @Dandandan as 
the original authors.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] feat(memory-tracking): expose API to NullBuffer, ArrayData, and Array [arrow-rs]

2025-11-25 Thread via GitHub


notfilippo commented on PR #8918:
URL: https://github.com/apache/arrow-rs/pull/8918#issuecomment-3576429777

   cc @alamb since you reviewed the original PR. cc @waynexia / @Dandandan as 
the original authors.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



[PR] feat(memory-tracking): expose API to NullBuffer, ArrayData, and Array [arrow-rs]

2025-11-24 Thread via GitHub


notfilippo opened a new pull request, #8918:
URL: https://github.com/apache/arrow-rs/pull/8918

   # Which issue does this PR close?
   
   Part of #8137. Follow up of #7303. Replaces #8040.
   
   # Rationale for this change
   
   #7303 implements the fundamental symbols for tracking memory. This patch 
exposes those APIs to a higher level Array and ArrayData.
   
   # What changes are included in this PR?
   
   New `claim` API for NullBuffer, ArrayData, and Array. New `pool` 
feature-flag to arrow, arrow-array, and arrow-data.
   
   # Are these changes tested?
   
   Added a doctest on the `Array::claim` method.
   
   # Are there any user-facing changes?
   
   Added API and a new feature-flag for arrow, arrow-array, and arrow-data.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]