For binary layouts it’s recommended to use Value(i), since it will properly handle offset arrays. The binary layouts also offer a ValueBytes method (https://pkg.go.dev/github.com/apache/arrow/go/[email protected]/array#Binary.ValueBytes), which returns the entire slice of values, and you can iterate over the slice returned from Offsets() to get each individual value. Each value i would be valuebytes[offset[i]:offset[i+1]] for both the string and binary arrays.
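As a rough illustration of the offset arithmetic described above, here is a minimal sketch that models the layout with plain Go slices instead of the Arrow types. The helper name valueAt and the sample data are hypothetical, and the sketch assumes the offsets index directly into the values buffer (that is, the array has not been sliced).

```go
package main

import "fmt"

// valueAt returns the i-th value of a variable-length layout modelled as a
// contiguous values buffer plus n+1 offsets. It is a hypothetical helper for
// illustration only and is not part of the Arrow API.
func valueAt(valueBytes []byte, offsets []int32, i int) []byte {
	return valueBytes[offsets[i]:offsets[i+1]]
}

func main() {
	// Four values stored back to back: "foo", "", "bar", "bazz".
	valueBytes := []byte("foobarbazz")
	offsets := []int32{0, 3, 3, 6, 10}

	// There is always one more offset than there are values.
	for i := 0; i < len(offsets)-1; i++ {
		fmt.Printf("%d: %q\n", i, valueAt(valueBytes, offsets, i))
	}
}
```

An empty value shows up as two equal consecutive offsets, which is why the loop never needs a separate length array.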
As for cache misses when checking IsNull, it depends on what you’re doing. Remember that it’s consulting a bitmap rather than a flat array of booleans, so it wouldn’t be a cache miss on every iteration. If you’re performing computations or operations on the values, you don’t need to check IsNull on each iteration: you can handle the bitmap bytes and the values as two separate cases for the operation, and then just combine the result bitmap with the result value array. If you are having issues achieving the level of performance you want, we can take a look and see where the bottleneck is.

--Matt

From: James Van Alstine <[email protected]>
Sent: Friday, November 5, 2021 9:31 PM
To: [email protected]
Subject: Re: [Go] Efficiently loop through values

Got it. Would the same go for non-primitive layouts like binary layout? Wouldn’t checking IsNull on each iteration cause a cache miss on each iteration?

On Fri, Nov 5, 2021 at 6:23 PM Matthew Topol <[email protected]> wrote:

Hey James, all of the primitive Array types that store their data as a contiguous array have a function which can return that array. For example, if you have an *array.Date32 you can use the Date32Values() method as shown here: https://pkg.go.dev/github.com/apache/arrow/go/[email protected]/array#Date32.Date32Values

The same would be true for all of the other primitive types such as the int and uint types. You would still have to consult the validity bitmap in order to tell whether a particular index is null, via the IsNull method.
In most cases the overhead from calling Value(i) instead of just iterating over a slice is negligible.

From: James Van Alstine <[email protected]>
Sent: Friday, November 5, 2021 8:31 PM
To: [email protected]
Subject: [Go] Efficiently loop through values

What is the most efficient way to loop through the values in an array? It seems like it would be most efficient if I could get a contiguous array of values to loop through, but as far as I know the array interface only exposes the ith value via Value(i). Is there a different way to loop over the values?
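
For readers following this thread, the suggestion about handling the values and the validity bitmap separately can be sketched roughly as below. This is only an illustration using plain Go slices rather than the Arrow types: addScalar and bitIsSet are hypothetical helpers, and the bitmap is assumed to be least-significant-bit first, with a set bit meaning the slot is valid.

```go
package main

import "fmt"

// bitIsSet reports whether bit i is set in an LSB-first bitmap.
// Hypothetical helper, assumed to mirror how a validity bitmap is read.
func bitIsSet(bitmap []byte, i int) bool {
	return bitmap[i/8]&(1<<uint(i%8)) != 0
}

// addScalar applies the operation to every slot without checking for nulls,
// then carries the validity bitmap over to the result unchanged.
// Hypothetical helper for illustration only.
func addScalar(values []int32, validity []byte, k int32) ([]int32, []byte) {
	out := make([]int32, len(values))
	for i, v := range values {
		out[i] = v + k // null slots are computed too; their results are ignored
	}
	outValidity := append([]byte(nil), validity...)
	return out, outValidity
}

func main() {
	values := []int32{10, 20, 30, 40, 50}
	validity := []byte{0x15} // bits 0, 2 and 4 set: indices 1 and 3 are null

	out, outValidity := addScalar(values, validity, 1)

	// The bitmap is only consulted when the results are finally read.
	for i, v := range out {
		if bitIsSet(outValidity, i) {
			fmt.Println(i, v)
		} else {
			fmt.Println(i, "null")
		}
	}
}
```

The inner loop stays branch-free over a contiguous slice, and one bitmap byte covers eight values, which is the reason the null check is unlikely to cost a cache miss per element.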
