Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
conbench-apache-arrow[bot] commented on PR #45818: URL: https://github.com/apache/arrow/pull/45818#issuecomment-2960060497 After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 30cc5dd788d275dcec9ea55779803291663a863f. There were 66 benchmark results with an error: - Commit Run on `arm64-t4g-2xlarge-linux` at [2025-06-10 11:09:58Z](https://conbench.ursa.dev/compare/runs/d005e2804a304137b01835798f0c7629...9fba708752704f26bb6fe00502db6de2/) - [`tpch` (R) with engine=arrow, format=parquet, language=R, memory_map=False, query_id=TPCH-07, scale_factor=1](https://conbench.ursa.dev/benchmark-results/06848253058074cd8000520972696c88) - [`tpch` (R) with engine=arrow, format=native, language=R, memory_map=False, query_id=TPCH-09, scale_factor=1](https://conbench.ursa.dev/benchmark-results/0684825e5100701180001e6a54679a53) - and 64 more (see the report linked below) There were no benchmark performance regressions. 🎉 The [full Conbench report](https://github.com/apache/arrow/runs/43830375765) has more details. It also includes information about 8 possible false positives for unstable benchmarks that are known to sometimes produce them. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
pitrou merged PR #45818: URL: https://github.com/apache/arrow/pull/45818 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
pitrou commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r213501 ## python/pyarrow/tests/test_scalars.py: ## @@ -238,13 +249,16 @@ def test_numerics(): assert repr(s) == "" assert str(s) == "1.5" assert s.as_py() == 1.5 +assert float(s) == 1.5 +assert int(s) == 1 # float16 s = pa.scalar(0.5, type='float16') assert isinstance(s, pa.HalfFloatScalar) assert repr(s) == "" assert str(s) == "0.5" assert s.as_py() == 0.5 +assert int(s) == 0 Review Comment: How about also testing `float(s)`? I suppose that should work? ```suggestion assert float(s) == 1.5 assert int(s) == 0 ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
thisisnic commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2135728487 ## python/pyarrow/tests/test_scalars.py: ## @@ -238,13 +249,16 @@ def test_numerics(): assert repr(s) == "" assert str(s) == "1.5" assert s.as_py() == 1.5 +assert float(s) == 1.5 +assert int(s) == 1 # float16 -s = pa.scalar(0.5, type='float16') +s = pa.scalar(np.float16(0.5), type='float16') Review Comment: I did a dodgy rebase, but I think I should have fixed this now -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
pitrou commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2135691901 ## python/pyarrow/tests/test_scalars.py: ## @@ -238,13 +249,16 @@ def test_numerics(): assert repr(s) == "" assert str(s) == "1.5" assert s.as_py() == 1.5 +assert float(s) == 1.5 +assert int(s) == 1 # float16 -s = pa.scalar(0.5, type='float16') +s = pa.scalar(np.float16(0.5), type='float16') Review Comment: Actually, NumPy is not needed for this anymore (thanks to the latest changes on git main): ```suggestion s = pa.scalar(0.5, type='float16') ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
thisisnic commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2135399546 ## python/pyarrow/scalar.pxi: ## @@ -219,6 +220,8 @@ cdef class BooleanScalar(Scalar): cdef CBooleanScalar* sp = self.wrapped.get() return sp.value if sp.is_valid else None +def __bool__(self): +return self.as_py() Review Comment: Thanks, updated now! ## python/pyarrow/tests/test_scalars.py: ## @@ -219,6 +220,9 @@ def test_bool(): assert true.as_py() is True assert false.as_py() is False +assert bool(true) is True +assert bool(false) is False Review Comment: Great idea, also added -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
pitrou commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2106939044 ## python/pyarrow/scalar.pxi: ## @@ -219,6 +220,8 @@ cdef class BooleanScalar(Scalar): cdef CBooleanScalar* sp = self.wrapped.get() return sp.value if sp.is_valid else None +def __bool__(self): +return self.as_py() Review Comment: I missed this, but we should be careful if the scalar is not valid, as `None` isn't a valid return value for `__bool__`: ```python >>> class B: ...: def __bool__(self): return None ...: >>> bool(B()) ... TypeError: __bool__ should return bool, returned NoneType ``` We could probably circumvent this by writing: ```python def __bool__(self): return self.as_py() or False ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
pitrou commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2106945684 ## python/pyarrow/scalar.pxi: ## @@ -219,6 +220,8 @@ cdef class BooleanScalar(Scalar): cdef CBooleanScalar* sp = self.wrapped.get() return sp.value if sp.is_valid else None +def __bool__(self): +return self.as_py() Review Comment: This is important because people expect most Python values to be usable in a boolean context, so `__bool__` should not fail. For example: ```python >>> bool(None) False ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
pitrou commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2106943105 ## python/pyarrow/tests/test_scalars.py: ## @@ -219,6 +220,9 @@ def test_bool(): assert true.as_py() is True assert false.as_py() is False +assert bool(true) is True +assert bool(false) is False Review Comment: Perhaps also test the case where the boolean isn't valid? Something like: ```suggestion assert bool(false) is False null = pa.scalar(None, type=pa.bool_()) assert isinstance(null, pa.BooleanScalar) # other tests to be added ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
thisisnic commented on PR #45818: URL: https://github.com/apache/arrow/pull/45818#issuecomment-2908937737 @pitrou Thanks for your feedback; any more changes needed here? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
thisisnic commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2102992832 ## python/pyarrow/scalar.pxi: ## @@ -238,6 +241,9 @@ cdef class UInt8Scalar(Scalar): cdef CUInt8Scalar* sp = self.wrapped.get() return sp.value if sp.is_valid else None +def __int__(self): +return self.as_py() Review Comment: Ah, fantastic, thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
pitrou commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2102421005 ## python/pyarrow/scalar.pxi: ## @@ -238,6 +241,9 @@ cdef class UInt8Scalar(Scalar): cdef CUInt8Scalar* sp = self.wrapped.get() return sp.value if sp.is_valid else None +def __int__(self): +return self.as_py() Review Comment: Hmm, you may want to report a bug for this. Here, however, you don't need to use Arrow casting. You can instead just write something like: ```python def __int__(self): return int(float(self)) ``` and it will return an arbitrary precision Python int. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
thisisnic commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2102181686 ## python/pyarrow/scalar.pxi: ## @@ -238,6 +241,9 @@ cdef class UInt8Scalar(Scalar): cdef CUInt8Scalar* sp = self.wrapped.get() return sp.value if sp.is_valid else None +def __int__(self): +return self.as_py() Review Comment: I'm a bit unsure how to approach this for floats. I was going to put a cast in the `__int__` method and cast halffloats to in32 and floats to int64 (to account for the higher possible max value). But I'm not sure how to do this for doubles. I considered just casting to int64 and erroring on overflow, but if I set options to allow truncation, it then disables the option to error on integer overflow. Is there another way, or are we best perhaps not defining the `__int__` method for floats, and letting people do the casts manually themselves if they want to do this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
thisisnic commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2102188501 ## python/pyarrow/scalar.pxi: ## @@ -238,6 +241,9 @@ cdef class UInt8Scalar(Scalar): cdef CUInt8Scalar* sp = self.wrapped.get() return sp.value if sp.is_valid else None +def __int__(self): +return self.as_py() Review Comment: e.g. ``` >>> pc.cast(pa.scalar(1e308), options = pc.CastOptions(target_type = pa.int32(), allow_float_truncate = True, allow_int_overflow = False)) ``` Is this a bug? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
thisisnic commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2102181686 ## python/pyarrow/scalar.pxi: ## @@ -238,6 +241,9 @@ cdef class UInt8Scalar(Scalar): cdef CUInt8Scalar* sp = self.wrapped.get() return sp.value if sp.is_valid else None +def __int__(self): +return self.as_py() Review Comment: I'm a bit unsure how to approach this for floats. I was going to put a cast in the `__int__` method and cast halffloats to int32 and floats to int64 (to account for the higher possible max value). But I'm not sure how to do this for doubles. I considered just casting to int64 and erroring on overflow, but if I set options to allow truncation, it then disables the option to error on integer overflow. Is there another way, or are we best perhaps not defining the `__int__` method for floats, and letting people do the casts manually themselves if they want to do this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
thisisnic commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2102103376 ## python/pyarrow/scalar.pxi: ## @@ -238,6 +241,9 @@ cdef class UInt8Scalar(Scalar): cdef CUInt8Scalar* sp = self.wrapped.get() return sp.value if sp.is_valid else None +def __int__(self): +return self.as_py() Review Comment: OK, will add that in too! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
pitrou commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2102086624 ## python/pyarrow/scalar.pxi: ## @@ -238,6 +241,9 @@ cdef class UInt8Scalar(Scalar): cdef CUInt8Scalar* sp = self.wrapped.get() return sp.value if sp.is_valid else None +def __int__(self): +return self.as_py() Review Comment: (note you should add `__int__` to floats but not `__index__`, because the conversion is not exact in that case; this is because we don't want `[1, 2, 3][0.5]` to work, for example) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
thisisnic commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2102080439 ## python/pyarrow/scalar.pxi: ## @@ -238,6 +241,9 @@ cdef class UInt8Scalar(Scalar): cdef CUInt8Scalar* sp = self.wrapped.get() return sp.value if sp.is_valid else None +def __int__(self): +return self.as_py() Review Comment: It looks like we get int for free by adding index if I'm reading it right, so I think I'll just need to rename int to index? I'll push a change soon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
pitrou commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2102086624 ## python/pyarrow/scalar.pxi: ## @@ -238,6 +241,9 @@ cdef class UInt8Scalar(Scalar): cdef CUInt8Scalar* sp = self.wrapped.get() return sp.value if sp.is_valid else None +def __int__(self): +return self.as_py() Review Comment: (note you should add `__int__` to floats but not `__index__`, because the conversion is not exact in that case) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
thisisnic commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2102080439 ## python/pyarrow/scalar.pxi: ## @@ -238,6 +241,9 @@ cdef class UInt8Scalar(Scalar): cdef CUInt8Scalar* sp = self.wrapped.get() return sp.value if sp.is_valid else None +def __int__(self): +return self.as_py() Review Comment: It looks like we get int for free by adding index if I'm reading it right, so I think I'll just need to rename int to index and then add it (plus tests) to the floats too? I'll push a change soon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
pitrou commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2102076201 ## python/pyarrow/scalar.pxi: ## @@ -238,6 +241,9 @@ cdef class UInt8Scalar(Scalar): cdef CUInt8Scalar* sp = self.wrapped.get() return sp.value if sp.is_valid else None +def __int__(self): +return self.as_py() Review Comment: I think that `__index__` is sufficient in all cases, since it will be used if `__int__` doesn't exist (the reverse is not true). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
AlenkaF commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2102054780 ## python/pyarrow/scalar.pxi: ## @@ -238,6 +241,9 @@ cdef class UInt8Scalar(Scalar): cdef CUInt8Scalar* sp = self.wrapped.get() return sp.value if sp.is_valid else None +def __int__(self): +return self.as_py() Review Comment: We can add both, right? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
pitrou commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2101915118 ## python/pyarrow/scalar.pxi: ## @@ -238,6 +241,9 @@ cdef class UInt8Scalar(Scalar): cdef CUInt8Scalar* sp = self.wrapped.get() return sp.value if sp.is_valid else None +def __int__(self): +return self.as_py() Review Comment: Sorry for noticing this only now, but `__index__` would actually be better as it supports more situations than `__int__` (`__index__` is for objects that convert _losslessly_ to an integer, which is the case here, while `__int__` can be defined on e.g. floats): ```python >>> class Int: ...: def __int__(self): return 42 ...: >>> class Index: ...: def __index__(self): return 42 ...: >>> int(Int()) 42 >>> int(Index()) 42 >>> hex(Int()) Traceback (most recent call last): Cell In[9], line 1 hex(Int()) TypeError: 'Int' object cannot be interpreted as an integer >>> hex(Index()) '0x2a' >>> import math >>> math.factorial(Int()) Traceback (most recent call last): Cell In[4], line 1 math.factorial(Int()) TypeError: 'Int' object cannot be interpreted as an integer >>> math.factorial(Index()) 14050061177528798985431426062445115699363840 ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
raulcd commented on PR #45818: URL: https://github.com/apache/arrow/pull/45818#issuecomment-2898253306 The CI failures are unrelated, the tracking issue is: - https://github.com/apache/arrow/issues/46516 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
AlenkaF commented on PR #45818: URL: https://github.com/apache/arrow/pull/45818#issuecomment-2898252117 Stand corrected - the failures are seen elsewhere too, for example: https://github.com/apache/arrow/actions/runs/15155834161. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
AlenkaF commented on PR #45818: URL: https://github.com/apache/arrow/pull/45818#issuecomment-2898242028 Hm, haven't seen this failures elsewhere. Will look into it! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
thisisnic commented on PR #45818: URL: https://github.com/apache/arrow/pull/45818#issuecomment-2897307341 Implemented the latest round of suggestions. The CI failures look unrelated to this PR? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
thisisnic commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2099471880 ## python/pyarrow/tests/test_scalars.py: ## @@ -558,14 +564,29 @@ def test_binary(value, ty, scalar_typ): assert s != b'x' buf = s.as_buffer() -assert isinstance(buf, pa.Buffer) -assert buf.to_pybytes() == value + +if value is None: +with pytest.raises(ValueError): Review Comment: Done! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
AlenkaF commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2092590285 ## python/pyarrow/scalar.pxi: ## @@ -1118,6 +1170,16 @@ cdef class MapScalar(ListScalar): result_dict[key] = value return result_dict +def keys(self): Review Comment: Oh yes, of course. Maybe we will get back to this after https://github.com/apache/arrow/pull/45818#discussion_r2090475778 is resolved. Thanks for trying! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
thisisnic commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2087295883 ## python/pyarrow/scalar.pxi: ## @@ -1064,13 +1129,24 @@ cdef class MapScalar(ListScalar): def __getitem__(self, i): """ -Return the value at the given index. +Return the value at the given index or key. """ + arr = self.values if arr is None: -raise IndexError(i) -dct = arr[_normalize_index(i, len(arr))] -return (dct[self.type.key_field.name], dct[self.type.item_field.name]) +raise IndexError(i) if isinstance(i, int) else KeyError(i) + +key_field = self.type.key_field.name +item_field = self.type.item_field.name + +if isinstance(i, int): +dct = arr[_normalize_index(i, len(arr))] +return (dct[key_field], dct[item_field]) +else: +for dct in arr: +if dct[key_field].as_py() == i: +return dct[item_field] +raise KeyError(i) Review Comment: Done! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
thisisnic commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2075887882 ## python/pyarrow/scalar.pxi: ## @@ -847,6 +882,33 @@ cdef class BinaryScalar(Scalar): buffer = self.as_buffer() return None if buffer is None else buffer.to_pybytes() +def __bytes__(self): +return (self.as_py()) + +def __getbuffer__(self, cp.Py_buffer* buffer, int flags): +cdef Buffer buf = self.as_buffer() + +if buf.buffer.get().is_mutable(): Review Comment: Cheers, done! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
thisisnic commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2075886115 ## python/pyarrow/tests/test_scalars.py: ## @@ -577,6 +607,13 @@ def test_binary(value, ty, scalar_typ): assert memview.strides == (1,) +def test_binary_null(): +s = pa.scalar(None, type=pa.binary()) +assert s.as_py() is None +with pytest.raises(TypeError): +memoryview(s) Review Comment: Added in! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
thisisnic commented on PR #45818: URL: https://github.com/apache/arrow/pull/45818#issuecomment-2854706953 Thanks for the reviews! It sounds like for now, the simplest way forward is for me to implement the suggestions around the tests and the change to the `__getbuffer__()` method, and then wait until we've got clarity about how we want to handle MapScalars before changing how we approach the mapping/sequence stuff. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
AlenkaF commented on PR #45818: URL: https://github.com/apache/arrow/pull/45818#issuecomment-2850326008 > Note this is only useful if those classes implement the full Mapping/Sequence API. If you want to benefit from the mixin methods to avoid re-implementing them, you should inherit from the `Mapping` or `Sequence` class instead of using `register`. Agree, inheriting would be much simpler - I just wasn't sure we want to inherit from multiple classes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
pitrou commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2068919703 ## python/pyarrow/scalar.pxi: ## @@ -1064,13 +1126,26 @@ cdef class MapScalar(ListScalar): def __getitem__(self, i): """ -Return the value at the given index. +Return the value at the given index or key. """ + arr = self.values if arr is None: -raise IndexError(i) -dct = arr[_normalize_index(i, len(arr))] -return (dct[self.type.key_field.name], dct[self.type.item_field.name]) +raise IndexError(i) if isinstance(i, int) else KeyError(i) + +key_field = self.type.key_field.name +item_field = self.type.item_field.name + +if isinstance(i, int): +if arr is None: +raise IndexError(i) +dct = arr[_normalize_index(i, len(arr))] +return (dct[key_field], dct[item_field]) +else: +for dct in arr: +if dct[key_field].as_py() == i: +return dct[item_field] Review Comment: Given that looping over all keys is not efficient, I wonder if it's really a good idea to provide this after all. @jorisvandenbossche @kszucs Thoughts? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
pitrou commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2068909794 ## python/pyarrow/scalar.pxi: ## @@ -221,6 +221,8 @@ cdef class BooleanScalar(Scalar): cdef CBooleanScalar* sp = self.wrapped.get() return sp.value if sp.is_valid else None +def __bool__(self): +return (self.as_py()) Review Comment: Style nit: parentheses are generally not needed after `return` ```suggestion def __bool__(self): return self.as_py() ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
pitrou commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2068900808 ## python/pyarrow/scalar.pxi: ## @@ -847,6 +882,33 @@ cdef class BinaryScalar(Scalar): buffer = self.as_buffer() return None if buffer is None else buffer.to_pybytes() +def __bytes__(self): +return (self.as_py()) + +def __getbuffer__(self, cp.Py_buffer* buffer, int flags): +cdef Buffer buf = self.as_buffer() + +if buf.buffer.get().is_mutable(): Review Comment: I don't think you have to do all this by hand. `self.as_buffer()` already gives you a [Python buffer-compliant](https://docs.python.org/3/c-api/buffer.html) object, so you can build on that. Something like this: ```cython def __getbuffer__(self, cp.Py_buffer* view, int flags): buf = self.as_buffer() if buf is None: raise ValueError("Cannot export buffer from null Arrow Scalar") return cp.PyObject_GetBuffer(buf, view, flags) ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
pitrou commented on PR #45818: URL: https://github.com/apache/arrow/pull/45818#issuecomment-2842310641 > That would involve registering `pa.MapScalar` and `pa.ListScalar` as virtual subclasses with `Mapping.register(...)` and `Sequence.register(...)`, respectively, and possibly implementing a few missing magic methods. Note this is only useful if those classes implement the full Mapping/Sequence API. If you want to benefit from the mixin methods to avoid re-implementing them, you should inherit from the `Mapping` or `Sequence` class instead of using `register`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
AlenkaF commented on PR #45818: URL: https://github.com/apache/arrow/pull/45818#issuecomment-2838316512 @thisisnic sorry to complicate things a bit — especially since I proposed those test helpers in the first place — but after looking at them again, I’m leaning toward doing the necessary work to support `isinstance(obj, Mapping)` and `isinstance(obj, Sequence)` checks using `collections.abc`. That would involve registering `pa.MapScalar` and `pa.ListScalar` as virtual subclasses with `Mapping.register(...)` and `Sequence.register(...)`, respectively, and possibly implementing a few missing magic methods. Here’s the relevant Python docs I was reading: https://docs.python.org/3/library/collections.abc.html A couple of notes from that page: > Existing classes and built-in classes can be registered as “virtual subclasses” of the ABCs. Those classes should define the full API including all of the abstract methods and all of the mixin methods. This lets users rely on [issubclass()](https://docs.python.org/3/library/functions.html#issubclass) or [isinstance()](https://docs.python.org/3/library/functions.html#isinstance) tests to determine whether the full interface is supported. > knowing that a class supplies __getitem__, __len__, and __iter__ is insufficient for distinguishing a [Sequence](https://docs.python.org/3/library/collections.abc.html#collections.abc.Sequence) from a [Mapping](https://docs.python.org/3/library/collections.abc.html#collections.abc.Mapping). Let me know what you think! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
AlenkaF commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2065785844 ## python/pyarrow/scalar.pxi: ## @@ -847,6 +882,33 @@ cdef class BinaryScalar(Scalar): buffer = self.as_buffer() return None if buffer is None else buffer.to_pybytes() +def __bytes__(self): +return (self.as_py()) + +def __getbuffer__(self, cp.Py_buffer* buffer, int flags): +cdef Buffer buf = self.as_buffer() + +if buf.buffer.get().is_mutable(): +buffer.readonly = 0 +else: +if flags & cp.PyBUF_WRITABLE: +raise BufferError("Writable buffer requested but Arrow " + "buffer was not mutable") +buffer.readonly = 1 +buffer.buf = buf.buffer.get().data() +buffer.len = buf.size +if buffer.buf == NULL: +assert buffer.len == 0 +buffer.buf = cp.PyBytes_AS_STRING(b"") Review Comment: I was looking into whether it's possible to construct a scalar with a `NULL` address, and I couldn’t find a way to do it—so either it’s not possible or it’s just very difficult :) Given that, I think your existing test for `None` scalar is appropriate. It would also be good to add a test for an empty string (`b''` and/or `''`). With those in place, we should be able to remove this part of the code: ```suggestion ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
AlenkaF commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2063595348 ## python/pyarrow/tests/test_scalars.py: ## @@ -577,6 +607,13 @@ def test_binary(value, ty, scalar_typ): assert memview.strides == (1,) +def test_binary_null(): +s = pa.scalar(None, type=pa.binary()) +assert s.as_py() is None +with pytest.raises(TypeError): +memoryview(s) Review Comment: This part can be moved in the `test_binary` with type being parametrised with `ty` parameter. What about an empty string scalar `pa.scalar("", type=pa.binary())`? Can we add this to the test also? I think in this case it should hold: ```python assert buffer.len == 0 assert memview == "" ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
AlenkaF commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2063595348 ## python/pyarrow/tests/test_scalars.py: ## @@ -577,6 +607,13 @@ def test_binary(value, ty, scalar_typ): assert memview.strides == (1,) +def test_binary_null(): +s = pa.scalar(None, type=pa.binary()) +assert s.as_py() is None +with pytest.raises(TypeError): +memoryview(s) Review Comment: This part can be moved in the `test_binary` with type being parametrised with `ty` parameter. What about an empty string scalar `pa.scalar("", type=pa.binary())`? Can we add this to the test also? I think this should hold: ```python assert buffer.len == 0 assert memview == "" ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
AlenkaF commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2063573529 ## python/pyarrow/scalar.pxi: ## @@ -1064,13 +1129,24 @@ cdef class MapScalar(ListScalar): def __getitem__(self, i): """ -Return the value at the given index. +Return the value at the given index or key. """ + arr = self.values if arr is None: -raise IndexError(i) -dct = arr[_normalize_index(i, len(arr))] -return (dct[self.type.key_field.name], dct[self.type.item_field.name]) +raise IndexError(i) if isinstance(i, int) else KeyError(i) + +key_field = self.type.key_field.name +item_field = self.type.item_field.name + +if isinstance(i, int): +dct = arr[_normalize_index(i, len(arr))] +return (dct[key_field], dct[item_field]) +else: +for dct in arr: +if dct[key_field].as_py() == i: +return dct[item_field] +raise KeyError(i) Review Comment: This is what I would suggest: ```suggestion if isinstance(i, (str, bytes)): i = self.keys().index(i) dct = arr[_normalize_index(i, len(arr))] return (dct[key_field], dct[item_field]) ``` There will be two changes in behaviour though. The key lookup will return a key and value tuple and not only a value which is inline with the index lookup and I think it is valid. The error in case the key is not present (like "fake_key") will be raised by `index(i)` but with `ValueError`. That can be changed though. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
AlenkaF commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2063547272 ## python/pyarrow/scalar.pxi: ## @@ -1064,13 +1126,26 @@ cdef class MapScalar(ListScalar): def __getitem__(self, i): """ -Return the value at the given index. +Return the value at the given index or key. """ + arr = self.values if arr is None: -raise IndexError(i) -dct = arr[_normalize_index(i, len(arr))] -return (dct[self.type.key_field.name], dct[self.type.item_field.name]) +raise IndexError(i) if isinstance(i, int) else KeyError(i) + +key_field = self.type.key_field.name +item_field = self.type.item_field.name + +if isinstance(i, int): +if arr is None: +raise IndexError(i) +dct = arr[_normalize_index(i, len(arr))] +return (dct[key_field], dct[item_field]) +else: +for dct in arr: +if dct[key_field].as_py() == i: +return dct[item_field] +raise KeyError(i) Review Comment: Never mind, got it! :) I would then suggest a bit of a code shift - will create a separate suggestion comment. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
AlenkaF commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2063420030 ## python/pyarrow/scalar.pxi: ## @@ -1064,13 +1126,26 @@ cdef class MapScalar(ListScalar): def __getitem__(self, i): """ -Return the value at the given index. +Return the value at the given index or key. """ + arr = self.values if arr is None: -raise IndexError(i) -dct = arr[_normalize_index(i, len(arr))] -return (dct[self.type.key_field.name], dct[self.type.item_field.name]) +raise IndexError(i) if isinstance(i, int) else KeyError(i) + +key_field = self.type.key_field.name +item_field = self.type.item_field.name + +if isinstance(i, int): +if arr is None: +raise IndexError(i) +dct = arr[_normalize_index(i, len(arr))] +return (dct[key_field], dct[item_field]) +else: +for dct in arr: +if dct[key_field].as_py() == i: +return dct[item_field] +raise KeyError(i) Review Comment: Can you share the failure here? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
thisisnic commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2063335310 ## python/pyarrow/scalar.pxi: ## @@ -1064,13 +1126,26 @@ cdef class MapScalar(ListScalar): def __getitem__(self, i): """ -Return the value at the given index. +Return the value at the given index or key. """ + arr = self.values if arr is None: -raise IndexError(i) -dct = arr[_normalize_index(i, len(arr))] -return (dct[self.type.key_field.name], dct[self.type.item_field.name]) +raise IndexError(i) if isinstance(i, int) else KeyError(i) + +key_field = self.type.key_field.name +item_field = self.type.item_field.name + +if isinstance(i, int): +if arr is None: +raise IndexError(i) +dct = arr[_normalize_index(i, len(arr))] +return (dct[key_field], dct[item_field]) +else: +for dct in arr: +if dct[key_field].as_py() == i: +return dct[item_field] +raise KeyError(i) Review Comment: I think we need to keep this one in case there are no items matching the specified key? Otherwise we get a new test failure. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
thisisnic commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2063141577 ## python/pyarrow/scalar.pxi: ## @@ -847,6 +882,33 @@ cdef class BinaryScalar(Scalar): buffer = self.as_buffer() return None if buffer is None else buffer.to_pybytes() +def __bytes__(self): +return (self.as_py()) + +def __getbuffer__(self, cp.Py_buffer* buffer, int flags): +cdef Buffer buf = self.as_buffer() + +if buf.buffer.get().is_mutable(): +buffer.readonly = 0 +else: +if flags & cp.PyBUF_WRITABLE: +raise BufferError("Writable buffer requested but Arrow " + "buffer was not mutable") +buffer.readonly = 1 +buffer.buf = buf.buffer.get().data() +buffer.len = buf.size +if buffer.buf == NULL: +assert buffer.len == 0 +buffer.buf = cp.PyBytes_AS_STRING(b"") Review Comment: (This is testing for a NULL one so buffer length should be zero and it'll be an empty string) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
AlenkaF commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2062918249 ## python/pyarrow/scalar.pxi: ## @@ -1064,13 +1126,26 @@ cdef class MapScalar(ListScalar): def __getitem__(self, i): """ -Return the value at the given index. +Return the value at the given index or key. """ + arr = self.values if arr is None: -raise IndexError(i) -dct = arr[_normalize_index(i, len(arr))] -return (dct[self.type.key_field.name], dct[self.type.item_field.name]) +raise IndexError(i) if isinstance(i, int) else KeyError(i) + +key_field = self.type.key_field.name +item_field = self.type.item_field.name + +if isinstance(i, int): +if arr is None: +raise IndexError(i) +dct = arr[_normalize_index(i, len(arr))] +return (dct[key_field], dct[item_field]) +else: +for dct in arr: +if dct[key_field].as_py() == i: +return dct[item_field] +raise KeyError(i) Review Comment: This can also be removed, same as above. ## python/pyarrow/scalar.pxi: ## @@ -1064,13 +1126,26 @@ cdef class MapScalar(ListScalar): def __getitem__(self, i): """ -Return the value at the given index. +Return the value at the given index or key. """ + arr = self.values if arr is None: -raise IndexError(i) -dct = arr[_normalize_index(i, len(arr))] -return (dct[self.type.key_field.name], dct[self.type.item_field.name]) +raise IndexError(i) if isinstance(i, int) else KeyError(i) + +key_field = self.type.key_field.name +item_field = self.type.item_field.name + +if isinstance(i, int): +if arr is None: +raise IndexError(i) Review Comment: I think this is not needed here anymore as it is already checked on line 1070/1133. ## python/pyarrow/scalar.pxi: ## @@ -847,6 +882,33 @@ cdef class BinaryScalar(Scalar): buffer = self.as_buffer() return None if buffer is None else buffer.to_pybytes() +def __bytes__(self): +return (self.as_py()) + +def __getbuffer__(self, cp.Py_buffer* buffer, int flags): +cdef Buffer buf = self.as_buffer() + +if buf.buffer.get().is_mutable(): +buffer.readonly = 0 +else: +if flags & cp.PyBUF_WRITABLE: +raise BufferError("Writable buffer requested but Arrow " + "buffer was not mutable") +buffer.readonly = 1 +buffer.buf = buf.buffer.get().data() +buffer.len = buf.size +if buffer.buf == NULL: +assert buffer.len == 0 +buffer.buf = cp.PyBytes_AS_STRING(b"") Review Comment: Could we test this part also, maybe with something like `pa.scalar(None, type=ty)`? ## docs/source/python/compute.rst: ## @@ -63,6 +63,21 @@ Below are a few simple examples:: >>> pc.multiply(x, y) +If you are using a compute function which returns more than one value, results +will be returned as a StructScalar. You can extract the individual values by Review Comment: ```suggestion will be returned as a ``StructScalar``. You can extract the individual values by ``` ## docs/source/python/compute.rst: ## @@ -63,6 +63,21 @@ Below are a few simple examples:: >>> pc.multiply(x, y) +If you are using a compute function which returns more than one value, results Review Comment: ❤️ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
pitrou commented on PR #45818: URL: https://github.com/apache/arrow/pull/45818#issuecomment-2823498269 > I tried updating it to return values instead of keys, but this introduces a load of breaking changes in the tests as we're now fundamentally changing how StructScalars work. Ah, my bad. Thanks for trying it out :) Then we should certainly leave it as-is. We can document the use of the `values` method then, perhaps: ```python >>> a, b = pc.min_max([1, 2,3 ]).values() >>> a >>> b ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
thisisnic commented on PR #45818: URL: https://github.com/apache/arrow/pull/45818#issuecomment-2823427616 @pitrou - I've implemented most of the changes suggested in your original issue, but the one I'm stuck on is the example you gave: ``` >>> a, b = pc.min_max([1,2,3]) >>> a # expecting 1 'min' >>> b # expecting 3 'max' ``` This happens because the compute function returns a StructScalar and in the `__iter__()` method for StructScalars, we return keys and not values. This seems analagous to how Python dictionaries work, e.g. ``` >>> a, b = {'min': 1, 'max': 3} >>> a 'min' >>> b 'max' ``` I tried updating it to return values instead of keys, but this introduces a load of breaking changes in the tests as we're now fundamentally changing how StructScalars work. ``` 1. FAILED pyarrow/tests/test_scalars.py::test_basics[builtin_pickle-value35-None-StructScalar] - TypeError: Expected integer or string index 2. FAILED pyarrow/tests/test_scalars.py::test_map[builtin_pickle] - ValueError: Converting to Python dictionary is not supported when duplicate field names are present 3. FAILED pyarrow/tests/test_scalars.py::test_struct - AssertionError: assert [2, 3.5] == ['x', 'y'] 4. FAILED pyarrow/tests/test_scalars.py::test_struct_duplicate_fields - AssertionError: assert [1, 2.0, 3] == ['x', 'y', 'x'] 5. FAILED pyarrow/tests/test_scalars.py::test_nested_map_types_with_maps_as_pydicts - TypeError: Expected integer or string index ``` It feels like this could have significant user impact. Is it desirable to make such an in-depth change to how StructScalars work, or do you reckon we should just document that users should do something like use `items()` to get these values, like this? ``` >>> a, b = pc.min_max([1,2,3]).items() >>> a ('min', ) >>> b ('max', ) ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
AlenkaF commented on PR #45818: URL: https://github.com/apache/arrow/pull/45818#issuecomment-2815955619 Hm, interesting. I think you're right in how you're interpreting the current behavior and what would be needed. Though looking at [the original PR that implemented this](https://github.com/apache/arrow/pull/7519), it seems like the current `__iter__` behavior was intentional. That said, I’d expect `list(s)` to behave a bit like `s.to_py()`, but maybe returning a list of tuples rather than a dict. Meaning a breaking change would be needed. Might be worth discussing the expected semantics here—seems like a good topic to bring up with Raul next week. I'll make sure we get that scheduled! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
thisisnic commented on PR #45818: URL: https://github.com/apache/arrow/pull/45818#issuecomment-2815412835 I'm a bit stuck as to how we can have StructScalar objects do all of: * implement Mapping * implement Sequences * implement unpacking From what I've read, to implement unpacking, we need the `__iter__()` method for `StructScalar` to yield values not keys, but then if we do this, we break some of our existing tests like `list(s) == ['x', 'y']`. I just want to check that this breaking change is desirable? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
AlenkaF commented on PR #45818: URL: https://github.com/apache/arrow/pull/45818#issuecomment-2814747110 The PR is coming along nicely—great progress! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
thisisnic commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2049405787 ## python/pyarrow/tests/test_scalars.py: ## @@ -555,17 +560,23 @@ def test_binary(value, ty, scalar_typ): assert str(s) == str(value) assert repr(value) in repr(s) assert s.as_py() == value +assert bytes(s) == value assert s != b'x' buf = s.as_buffer() assert isinstance(buf, pa.Buffer) assert buf.to_pybytes() == value +memview = memoryview(s) +assert isinstance(memview, memoryview) Review Comment: Updated! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
thisisnic commented on PR #45818: URL: https://github.com/apache/arrow/pull/45818#issuecomment-2813650208 OK, still got unpacking to implement, but this is getting there. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
AlenkaF commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2044537069 ## python/pyarrow/scalar.pxi: ## @@ -887,7 +887,29 @@ cdef class BinaryScalar(Scalar): def __getbuffer__(self, cp.Py_buffer* buffer, int flags): cdef Buffer buf = self.as_buffer() -PyObject_GetBuffer(buf, buffer, flags) + +if buf.buffer.get().is_mutable(): +buffer.readonly = 0 +else: +if flags & cp.PyBUF_WRITABLE: +raise BufferError("Writable buffer requested but Arrow " + "buffer was not mutable") +buffer.readonly = 1 +buffer.buf = buf.buffer.get().data() +buffer.len = buf.size +if buffer.buf == NULL: +# ARROW-16048: Ensure we don't export a NULL address. Review Comment: ```suggestion ``` We can remove this comment. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
AlenkaF commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2044547631 ## python/pyarrow/tests/test_scalars.py: ## @@ -555,17 +560,23 @@ def test_binary(value, ty, scalar_typ): assert str(s) == str(value) assert repr(value) in repr(s) assert s.as_py() == value +assert bytes(s) == value assert s != b'x' buf = s.as_buffer() assert isinstance(buf, pa.Buffer) assert buf.to_pybytes() == value +memview = memoryview(s) +assert isinstance(memview, memoryview) Review Comment: ```suggestion ``` This line can be removed and some additional assertions can be added, like here: https://github.com/apache/arrow/blob/60b5ab9ee0bf070f03cf5c92fe0add0257543dfd/python/pyarrow/tests/test_tensor.py#L217-L226 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
thisisnic commented on PR #45818: URL: https://github.com/apache/arrow/pull/45818#issuecomment-2805017289 I've marked this as draft as I've still a few things left to do before this is finished (not all of the below are fully tested + I want to follow up on your comment on Python unpacking above) Nic TODO: * list-like scalars could implement [Sequence](https://docs.python.org/3.13/library/collections.abc.html#collections.abc.Sequence) * map scalars could implement [Mapping](https://docs.python.org/3.13/library/collections.abc.html#collections.abc.Mapping) * struct scalars could implement both Sequence and Mapping (?) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
AlenkaF commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2034608323 ## python/pyarrow/scalar.pxi: ## @@ -847,6 +882,12 @@ cdef class BinaryScalar(Scalar): buffer = self.as_buffer() return None if buffer is None else buffer.to_pybytes() +def __bytes__(self): +return (self.as_py()) + +def __buffer__(self): +return(memoryview(self.as_buffer())) Review Comment: This here will be a bit more tricky. We need to go to a lower level and pass the pointer to the actual buffer in memory, as well as some metadata. Also the method will need to be `__getbuffer__ `. I think we should be able to mostly reuse what is defined here: https://github.com/apache/arrow/blob/60b5ab9ee0bf070f03cf5c92fe0add0257543dfd/python/pyarrow/io.pxi#L1561-L1584 with the buffer being `self.as_buffer()`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
AlenkaF commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2020199333 ## python/pyarrow/scalar.pxi: ## @@ -846,6 +879,9 @@ cdef class BinaryScalar(Scalar): """ buffer = self.as_buffer() return None if buffer is None else buffer.to_pybytes() + Review Comment: I think the linter is failing due to empty spaces here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
thisisnic commented on PR #45818: URL: https://github.com/apache/arrow/pull/45818#issuecomment-2768606465 Hey @AlenkaF , I think this is ready for another look! I've implemented the simple ones, and some of the more complex ones already exist. I was unsure about how to approach "binary scalars implement perhaps the buffer protocol". I was also unsure how to approach the code examples with min_max: ``` >>> a, b = pc.min_max([1,2,3]) >>> a # expecting 1 'min' >>> b # expecting 3 'max' ``` This happens as it returns a StructScalar, which I'm assuming is analagous to a dictionary, and this behaviour feels like expected behaviour from that perspective? I did try to see if I could implement `values()`, `keys()` and `items()`, but other tests broke, and I couldn't find a way of changing this without introducing a breaking change here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
AlenkaF commented on PR #45818: URL: https://github.com/apache/arrow/pull/45818#issuecomment-2769006066 > Hey @AlenkaF , I think this is ready for another look! The changes look good to me 👍 > I've implemented the simple ones, and some of the more complex ones already exist. Which ones do you mean? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
AlenkaF commented on PR #45818: URL: https://github.com/apache/arrow/pull/45818#issuecomment-276939 Thanks, will have a look shortly! > I was unsure about how to approach "binary scalars implement perhaps the buffer protocol". You might want to have a look at: - the [buffer protocol implementation in PyArrow Buffer class](https://github.com/apache/arrow/blob/60b5ab9ee0bf070f03cf5c92fe0add0257543dfd/python/pyarrow/io.pxi#L1561-L1584), - and the Cython docs: - https://cython.readthedocs.io/en/latest/src/userguide/buffer.html - https://docs.python.org/3/c-api/buffer.html You can start with thinking about what to test. For testing use `memoryview` which exposes the buffer protocol, like in [this existing test](https://github.com/apache/arrow/blob/60b5ab9ee0bf070f03cf5c92fe0add0257543dfd/python/pyarrow/tests/test_tensor.py#L204-L226). > I was also unsure how to approach the code examples with min_max: The code example from the issue description is using **Python unpacking** which works for any iterator and that isn't implemented for a struct. See a good SO thread: https://stackoverflow.com/a/37837754. This will be solved when the struct scalar class will implement Sequence. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
thisisnic commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2021264078 ## python/pyarrow/scalar.pxi: ## @@ -240,6 +240,10 @@ cdef class UInt8Scalar(Scalar): cdef CUInt8Scalar* sp = self.wrapped.get() return sp.value if sp.is_valid else None +def __int__(self): + +return(self.as_py()) + Review Comment: OK, let's leave as-is then, no need for premature optimisation! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
AlenkaF commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2020199574 ## python/pyarrow/scalar.pxi: ## @@ -846,6 +879,9 @@ cdef class BinaryScalar(Scalar): """ buffer = self.as_buffer() return None if buffer is None else buffer.to_pybytes() + Review Comment: ```suggestion return None if buffer is None else buffer.to_pybytes() ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
AlenkaF commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2020199574 ## python/pyarrow/scalar.pxi: ## @@ -846,6 +879,9 @@ cdef class BinaryScalar(Scalar): """ buffer = self.as_buffer() return None if buffer is None else buffer.to_pybytes() + Review Comment: ```suggestion return None if buffer is None else buffer.to_pybytes() ``` This might be better then the previous suggestion. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
AlenkaF commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2020199333 ## python/pyarrow/scalar.pxi: ## @@ -846,6 +879,9 @@ cdef class BinaryScalar(Scalar): """ buffer = self.as_buffer() return None if buffer is None else buffer.to_pybytes() + Review Comment: ```suggestion ``` I think the linter is failing due to empty spaces here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
AlenkaF commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r2009726204 ## python/pyarrow/scalar.pxi: ## @@ -240,6 +240,10 @@ cdef class UInt8Scalar(Scalar): cdef CUInt8Scalar* sp = self.wrapped.get() return sp.value if sp.is_valid else None +def __int__(self): + +return(self.as_py()) + Review Comment: I would be nice to have a Scalar subclass from which all existing int-based classes would inherit from, like in [BaseExtensionType(DataType)](https://github.com/apache/arrow/blob/40c5300edc6ccc0ea4a9131722628ac6203ce806/python/pyarrow/types.pxi#L1644). _But_ we would need to do some gymnastic to move `as_py` to this newly created, lets say `BaseIntScalar` class. That, I think, would make it more of an effort in the end. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]
thisisnic commented on code in PR #45818: URL: https://github.com/apache/arrow/pull/45818#discussion_r1998747389 ## python/pyarrow/scalar.pxi: ## @@ -240,6 +240,10 @@ cdef class UInt8Scalar(Scalar): cdef CUInt8Scalar* sp = self.wrapped.get() return sp.value if sp.is_valid else None +def __int__(self): + +return(self.as_py()) + Review Comment: Is all this copying and pasting OK? I could define a new scalar integer subclass which defines this once and then have the existing int-based subclasses be subclasses of that, but I wasn't sure if that would be overengineering things? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org