What's the right way to convert Arrow arrays to numpy arrays in the presence of nulls?
The first thing I reach for is array.to_numpy(zero_copy_only=False). But this has some behaviors that I find a little undesirable. For numeric data (or at least int64 and float64), nulls are converted to floating point NaNs and the resulting numpy array is recast from integer to floating point. For example:

>>> a = pa.array([1, 2, 3, None, 5])
>>> a
<pyarrow.lib.Int64Array object at 0x111b970a0>
[
  1,
  2,
  3,
  null,
  5
]
>>> a.to_numpy(False)
array([ 1.,  2.,  3., nan,  5.])

This can be problematic: *actual* floating point NaNs get mixed in with nulls, which is lossy:

>>> pa.array([1., 2., float("nan"), None]).to_numpy(False)
array([ 1.,  2., nan, nan])

Boolean arrays get converted into 'object'-dtyped numpy arrays containing True, False, and None, which is a little undesirable as well.

One tool numpy has for dealing with nullable data is masked arrays (https://numpy.org/doc/stable/reference/maskedarray.html), which work somewhat like Arrow arrays' validity bitmaps. I was thinking of writing some code that generates a numpy masked array from an Arrow array, but I'd need to get at the validity bitmap itself, and it doesn't seem to be accessible through any pyarrow APIs. Am I missing it? Or am I thinking about this wrong, and there's some other way to pull nullable data out of Arrow and into numpy?

Thanks,
Spencer
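P.S. For concreteness, here's a rough sketch of the kind of helper I have in mind. Since I can't find a way to read the validity bitmap directly, it approximates the mask with is_null() and fills nulls with a sentinel (0 here, which assumes numeric data) before converting, so the dtype is preserved:

import numpy as np
import pyarrow as pa
import pyarrow.compute as pc

def to_masked_array(arr):
    # Build the mask from is_null() rather than the raw validity
    # bitmap, since I don't see a way to get at the bitmap itself.
    mask = arr.is_null().to_numpy(zero_copy_only=False)
    # Fill nulls with a sentinel (0, assuming numeric data) so
    # to_numpy() doesn't recast e.g. int64 to float64.
    data = pc.fill_null(arr, 0).to_numpy(zero_copy_only=False)
    return np.ma.masked_array(data, mask=mask)

>>> to_masked_array(pa.array([1, 2, 3, None, 5]))
masked_array(data=[1, 2, 3, --, 5],
             mask=[False, False, False, True, False],
       fill_value=999999)

That keeps the int64 dtype and keeps real NaNs distinct from nulls, but going through is_null() and fill_null() copies the data, which is part of why I'd rather reach the bitmap directly.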