For binary layouts it’s recommended to use Value(i), since it will properly 
handle offset arrays. The binary layouts also offer a ValueBytes method:
https://pkg.go.dev/github.com/apache/arrow/go/[email protected]/array#Binary.ValueBytes
It returns the entire slice of values, and you can iterate the slice returned 
from Offsets() to get each individual value. Each value i would be 
valuebytes[offset[i] : offset[i+1]] for both the string and binary arrays.
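
The slicing rule above can be sketched in plain Go. This is only an illustration of the layout, not the Arrow API: valueAt, valueBytes and offsets are hypothetical names standing in for the concatenated value buffer and its offsets, and the sketch assumes the array has not been sliced (offsets start at the first element).

```go
package main

import "fmt"

// valueAt returns element i of a variable-length binary layout.
// valueBytes holds all values back to back; offsets has one more
// entry than there are elements, so element i spans
// valueBytes[offsets[i]:offsets[i+1]].
// Hypothetical helper for illustration, not part of the Arrow library.
func valueAt(valueBytes []byte, offsets []int32, i int) []byte {
	return valueBytes[offsets[i]:offsets[i+1]]
}

func main() {
	// Four elements: "foo", "bar", "" (empty) and "baz!".
	valueBytes := []byte("foobarbaz!")
	offsets := []int32{0, 3, 6, 6, 10}

	for i := 0; i < len(offsets)-1; i++ {
		fmt.Printf("%d: %q\n", i, valueAt(valueBytes, offsets, i))
	}
}
```

An empty element simply has two equal neighbouring offsets, so no special case is needed in the loop.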

As far as cache misses for checking IsNull go, it depends on what you’re doing. 
Remember that it’s consulting a bitmap rather than a flat boolean array, so it 
wouldn’t be a cache miss on every iteration. If you’re performing computations 
or operations on the values, then you don’t need to check IsNull at all: you 
can handle the bitmap bytes and the values as two separate cases for the 
operation and then just combine the result bitmap with the result value array.
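
As a rough sketch of that split, here is an element-wise add over two value slices with the validity bitmaps handled separately. It uses plain slices rather than Arrow arrays; addInt32, andBitmaps and bitIsSet are hypothetical helpers, and the sketch assumes both inputs have the same length, a non-nil bitmap, and the Arrow bit order (least-significant bit first, set bit means valid).

```go
package main

import "fmt"

// bitIsSet reports whether slot i is marked valid in an LSB-first bitmap.
func bitIsSet(bitmap []byte, i int) bool {
	return bitmap[i/8]&(byte(1)<<uint(i%8)) != 0
}

// addInt32 adds the values slot by slot with no null checks in the loop.
// Slots that are null simply produce a value nobody will read.
func addInt32(a, b []int32) []int32 {
	out := make([]int32, len(a))
	for i := range a {
		out[i] = a[i] + b[i]
	}
	return out
}

// andBitmaps builds the result validity bitmap: a slot is valid only
// if it is valid in both inputs. It works a whole byte (8 slots) at a time.
func andBitmaps(a, b []byte) []byte {
	out := make([]byte, len(a))
	for i := range a {
		out[i] = a[i] & b[i]
	}
	return out
}

func main() {
	a := []int32{1, 2, 3, 4}
	b := []int32{10, 20, 30, 40}
	validA := []byte{0x0B} // bits 0, 1, 3 set: slot 2 is null
	validB := []byte{0x0E} // bits 1, 2, 3 set: slot 0 is null

	sum := addInt32(a, b)
	valid := andBitmaps(validA, validB)

	for i := range sum {
		if bitIsSet(valid, i) {
			fmt.Printf("%d: %d\n", i, sum[i])
		} else {
			fmt.Printf("%d: null\n", i)
		}
	}
}
```

The value loop stays branch-free, and the bitmap work touches one byte per eight slots.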

If you are having issues achieving the level of performance you want, we can 
take a look and see where the bottleneck is.

--Matt

From: James Van Alstine <[email protected]>
Sent: Friday, November 5, 2021 9:31 PM
To: [email protected]
Subject: Re: [Go] Efficiently loop through values

Got it. Would the same go for non-primitive layouts like binary layout?

Wouldn’t checking IsNull on each iteration cause a cache miss on each iteration?

On Fri, Nov 5, 2021 at 6:23 PM Matthew Topol <[email protected]> wrote:
Hey James, all of the primitive Array types that store their data as a 
contiguous array have a function which can return that array. For example, if 
you have an *array.Date32 you can use the Date32Values() method as shown here: 
https://pkg.go.dev/github.com/apache/arrow/go/[email protected]/array#Date32.Date32Values
 The same would be true for all of the other primitive types such as the int 
and uint types.

You would still have to consult the validity bitmap, via the IsNull method, in 
order to tell whether a particular index is null. In most cases the overhead 
from calling Value(i) instead of just iterating over a slice is negligible.
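
To make the pattern concrete, here is a minimal sketch of looping over a contiguous values slice while consulting a validity bitmap. It models the layout with plain slices instead of calling the Arrow library: sumValid and isValid are hypothetical helpers, and the bitmap is assumed to be non-nil and least-significant-bit first with a set bit meaning "valid".

```go
package main

import "fmt"

// isValid reports whether slot i is non-null in an LSB-first validity bitmap.
// Eight consecutive slots share one byte, so consecutive checks mostly
// hit memory that was just read.
func isValid(bitmap []byte, i int) bool {
	return bitmap[i/8]&(byte(1)<<uint(i%8)) != 0
}

// sumValid walks the contiguous values slice, skipping null slots.
// It returns the sum of the valid values and the number of nulls seen.
func sumValid(values []int32, bitmap []byte) (sum int64, nulls int) {
	for i, v := range values {
		if isValid(bitmap, i) {
			sum += int64(v)
		} else {
			nulls++
		}
	}
	return sum, nulls
}

func main() {
	values := []int32{5, 0, 7, 9}
	bitmap := []byte{0x0D} // bits 0, 2, 3 set: slot 1 is null

	sum, nulls := sumValid(values, bitmap)
	fmt.Println("sum:", sum, "nulls:", nulls)
}
```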


From: James Van Alstine <[email protected]>
Sent: Friday, November 5, 2021 8:31 PM
To: [email protected]
Subject: [Go] Efficiently loop through values

What is the most efficient way to loop through the values in an array? It seems 
like it would be most efficient if I could get a contiguous array of values to 
loop through, but as far as I know the array interface only exposes the ith 
value via Value(i). Is there a different way to loop over the values?
