Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-06-10 Thread via GitHub


conbench-apache-arrow[bot] commented on PR #45818:
URL: https://github.com/apache/arrow/pull/45818#issuecomment-2960060497

   After merging your PR, Conbench analyzed the 4 benchmarking runs that have 
been run so far on merge-commit 30cc5dd788d275dcec9ea55779803291663a863f.
   
   There were 66 benchmark results with an error:
   
   - Commit Run on `arm64-t4g-2xlarge-linux` at [2025-06-10 
11:09:58Z](https://conbench.ursa.dev/compare/runs/d005e2804a304137b01835798f0c7629...9fba708752704f26bb6fe00502db6de2/)
 - [`tpch` (R) with engine=arrow, format=parquet, language=R, 
memory_map=False, query_id=TPCH-07, 
scale_factor=1](https://conbench.ursa.dev/benchmark-results/06848253058074cd8000520972696c88)
 - [`tpch` (R) with engine=arrow, format=native, language=R, 
memory_map=False, query_id=TPCH-09, 
scale_factor=1](https://conbench.ursa.dev/benchmark-results/0684825e5100701180001e6a54679a53)
   - and 64 more (see the report linked below)
   
   There were no benchmark performance regressions. 🎉
   
   The [full Conbench report](https://github.com/apache/arrow/runs/43830375765) 
has more details. It also includes information about 8 possible false positives 
for unstable benchmarks that are known to sometimes produce them.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-06-10 Thread via GitHub


pitrou merged PR #45818:
URL: https://github.com/apache/arrow/pull/45818


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-06-09 Thread via GitHub


pitrou commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r213501


##
python/pyarrow/tests/test_scalars.py:
##
@@ -238,13 +249,16 @@ def test_numerics():
 assert repr(s) == ""
 assert str(s) == "1.5"
 assert s.as_py() == 1.5
+assert float(s) == 1.5
+assert int(s) == 1
 
 # float16
 s = pa.scalar(0.5, type='float16')
 assert isinstance(s, pa.HalfFloatScalar)
 assert repr(s) == ""
 assert str(s) == "0.5"
 assert s.as_py() == 0.5
+assert int(s) == 0

Review Comment:
   How about also testing `float(s)`? I suppose that should work?
   ```suggestion
   assert float(s) == 1.5
   assert int(s) == 0
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-06-09 Thread via GitHub


thisisnic commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2135728487


##
python/pyarrow/tests/test_scalars.py:
##
@@ -238,13 +249,16 @@ def test_numerics():
 assert repr(s) == ""
 assert str(s) == "1.5"
 assert s.as_py() == 1.5
+assert float(s) == 1.5
+assert int(s) == 1
 
 # float16
-s = pa.scalar(0.5, type='float16')
+s = pa.scalar(np.float16(0.5), type='float16')

Review Comment:
   I did a dodgy rebase, but I think I should have fixed this now



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-06-09 Thread via GitHub


pitrou commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2135691901


##
python/pyarrow/tests/test_scalars.py:
##
@@ -238,13 +249,16 @@ def test_numerics():
 assert repr(s) == ""
 assert str(s) == "1.5"
 assert s.as_py() == 1.5
+assert float(s) == 1.5
+assert int(s) == 1
 
 # float16
-s = pa.scalar(0.5, type='float16')
+s = pa.scalar(np.float16(0.5), type='float16')

Review Comment:
   Actually, NumPy is not needed for this anymore (thanks to the latest changes 
on git main):
   ```suggestion
   s = pa.scalar(0.5, type='float16')
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-06-09 Thread via GitHub


thisisnic commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2135399546


##
python/pyarrow/scalar.pxi:
##
@@ -219,6 +220,8 @@ cdef class BooleanScalar(Scalar):
 cdef CBooleanScalar* sp =  self.wrapped.get()
 return sp.value if sp.is_valid else None
 
+def __bool__(self):
+return self.as_py()

Review Comment:
   Thanks, updated now!



##
python/pyarrow/tests/test_scalars.py:
##
@@ -219,6 +220,9 @@ def test_bool():
 assert true.as_py() is True
 assert false.as_py() is False
 
+assert bool(true) is True
+assert bool(false) is False

Review Comment:
   Great idea, also added



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-05-26 Thread via GitHub


pitrou commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2106939044


##
python/pyarrow/scalar.pxi:
##
@@ -219,6 +220,8 @@ cdef class BooleanScalar(Scalar):
 cdef CBooleanScalar* sp =  self.wrapped.get()
 return sp.value if sp.is_valid else None
 
+def __bool__(self):
+return self.as_py()

Review Comment:
   I missed this, but we should be careful if the scalar is not valid, as 
`None` isn't a valid return value for `__bool__`:
   ```python
   >>> class B:
   ...: def __bool__(self): return None
   ...: 
   >>> bool(B())
 ...
   TypeError: __bool__ should return bool, returned NoneType
   
   ```
   
   We could probably circumvent this by writing:
   ```python
   def __bool__(self):
   return self.as_py() or False
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-05-26 Thread via GitHub


pitrou commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2106945684


##
python/pyarrow/scalar.pxi:
##
@@ -219,6 +220,8 @@ cdef class BooleanScalar(Scalar):
 cdef CBooleanScalar* sp =  self.wrapped.get()
 return sp.value if sp.is_valid else None
 
+def __bool__(self):
+return self.as_py()

Review Comment:
   This is important because people expect most Python values to be usable in a 
boolean context, so `__bool__` should not fail. For example:
   ```python
   >>> bool(None)
   False
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-05-26 Thread via GitHub


pitrou commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2106943105


##
python/pyarrow/tests/test_scalars.py:
##
@@ -219,6 +220,9 @@ def test_bool():
 assert true.as_py() is True
 assert false.as_py() is False
 
+assert bool(true) is True
+assert bool(false) is False

Review Comment:
   Perhaps also test the case where the boolean isn't valid?
   Something like:
   ```suggestion
   assert bool(false) is False
   
   null = pa.scalar(None, type=pa.bool_())
   assert isinstance(null, pa.BooleanScalar)
   # other tests to be added
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-05-26 Thread via GitHub


thisisnic commented on PR #45818:
URL: https://github.com/apache/arrow/pull/45818#issuecomment-2908937737

   @pitrou Thanks for your feedback; any more changes needed here?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-05-22 Thread via GitHub


thisisnic commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2102992832


##
python/pyarrow/scalar.pxi:
##
@@ -238,6 +241,9 @@ cdef class UInt8Scalar(Scalar):
 cdef CUInt8Scalar* sp =  self.wrapped.get()
 return sp.value if sp.is_valid else None
 
+def __int__(self):
+return self.as_py()

Review Comment:
   Ah, fantastic, thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-05-22 Thread via GitHub


pitrou commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2102421005


##
python/pyarrow/scalar.pxi:
##
@@ -238,6 +241,9 @@ cdef class UInt8Scalar(Scalar):
 cdef CUInt8Scalar* sp =  self.wrapped.get()
 return sp.value if sp.is_valid else None
 
+def __int__(self):
+return self.as_py()

Review Comment:
   Hmm, you may want to report a bug for this.
   
   Here, however, you don't need to use Arrow casting. You can instead just 
write something like:
   ```python
   def __int__(self):
   return int(float(self))
   ```
   and it will return an arbitrary precision Python int.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-05-22 Thread via GitHub


thisisnic commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2102181686


##
python/pyarrow/scalar.pxi:
##
@@ -238,6 +241,9 @@ cdef class UInt8Scalar(Scalar):
 cdef CUInt8Scalar* sp =  self.wrapped.get()
 return sp.value if sp.is_valid else None
 
+def __int__(self):
+return self.as_py()

Review Comment:
   I'm a bit unsure how to approach this for floats.
   
   I was going to put a cast in the `__int__` method and cast halffloats to 
in32 and floats to int64 (to account for the higher possible max value).  But 
I'm not sure how to do this for doubles.  I considered just casting to int64 
and erroring on overflow, but if I set options to allow truncation, it then 
disables the option to error on integer overflow.
   
   Is there another way, or are we best perhaps not defining the `__int__` 
method for floats, and letting people do the casts manually themselves if they 
want to do this?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-05-22 Thread via GitHub


thisisnic commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2102188501


##
python/pyarrow/scalar.pxi:
##
@@ -238,6 +241,9 @@ cdef class UInt8Scalar(Scalar):
 cdef CUInt8Scalar* sp =  self.wrapped.get()
 return sp.value if sp.is_valid else None
 
+def __int__(self):
+return self.as_py()

Review Comment:
   e.g. 
   
   ```
   >>> pc.cast(pa.scalar(1e308),  options = pc.CastOptions(target_type = 
pa.int32(), allow_float_truncate = True, allow_int_overflow = False))
   
   ```
   
   Is this a bug?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-05-22 Thread via GitHub


thisisnic commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2102181686


##
python/pyarrow/scalar.pxi:
##
@@ -238,6 +241,9 @@ cdef class UInt8Scalar(Scalar):
 cdef CUInt8Scalar* sp =  self.wrapped.get()
 return sp.value if sp.is_valid else None
 
+def __int__(self):
+return self.as_py()

Review Comment:
   I'm a bit unsure how to approach this for floats.
   
   I was going to put a cast in the `__int__` method and cast halffloats to 
int32 and floats to int64 (to account for the higher possible max value).  But 
I'm not sure how to do this for doubles.  I considered just casting to int64 
and erroring on overflow, but if I set options to allow truncation, it then 
disables the option to error on integer overflow.
   
   Is there another way, or are we best perhaps not defining the `__int__` 
method for floats, and letting people do the casts manually themselves if they 
want to do this?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-05-22 Thread via GitHub


thisisnic commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2102103376


##
python/pyarrow/scalar.pxi:
##
@@ -238,6 +241,9 @@ cdef class UInt8Scalar(Scalar):
 cdef CUInt8Scalar* sp =  self.wrapped.get()
 return sp.value if sp.is_valid else None
 
+def __int__(self):
+return self.as_py()

Review Comment:
   OK, will add that in too!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-05-22 Thread via GitHub


pitrou commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2102086624


##
python/pyarrow/scalar.pxi:
##
@@ -238,6 +241,9 @@ cdef class UInt8Scalar(Scalar):
 cdef CUInt8Scalar* sp =  self.wrapped.get()
 return sp.value if sp.is_valid else None
 
+def __int__(self):
+return self.as_py()

Review Comment:
   (note you should add `__int__` to floats but not `__index__`, because the 
conversion is not exact in that case; this is because we don't want `[1, 2, 
3][0.5]` to work, for example)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-05-22 Thread via GitHub


thisisnic commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2102080439


##
python/pyarrow/scalar.pxi:
##
@@ -238,6 +241,9 @@ cdef class UInt8Scalar(Scalar):
 cdef CUInt8Scalar* sp =  self.wrapped.get()
 return sp.value if sp.is_valid else None
 
+def __int__(self):
+return self.as_py()

Review Comment:
   It looks like we get int for free by adding index if I'm reading it right, 
so I think I'll just need to rename int to index? I'll push a change soon



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-05-22 Thread via GitHub


pitrou commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2102086624


##
python/pyarrow/scalar.pxi:
##
@@ -238,6 +241,9 @@ cdef class UInt8Scalar(Scalar):
 cdef CUInt8Scalar* sp =  self.wrapped.get()
 return sp.value if sp.is_valid else None
 
+def __int__(self):
+return self.as_py()

Review Comment:
   (note you should add `__int__` to floats but not `__index__`, because the 
conversion is not exact in that case)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-05-22 Thread via GitHub


thisisnic commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2102080439


##
python/pyarrow/scalar.pxi:
##
@@ -238,6 +241,9 @@ cdef class UInt8Scalar(Scalar):
 cdef CUInt8Scalar* sp =  self.wrapped.get()
 return sp.value if sp.is_valid else None
 
+def __int__(self):
+return self.as_py()

Review Comment:
   It looks like we get int for free by adding index if I'm reading it right, 
so I think I'll just need to rename int to index and then add it (plus tests) 
to the floats too? I'll push a change soon



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-05-22 Thread via GitHub


pitrou commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2102076201


##
python/pyarrow/scalar.pxi:
##
@@ -238,6 +241,9 @@ cdef class UInt8Scalar(Scalar):
 cdef CUInt8Scalar* sp =  self.wrapped.get()
 return sp.value if sp.is_valid else None
 
+def __int__(self):
+return self.as_py()

Review Comment:
   I think that `__index__` is sufficient in all cases, since it will be used 
if `__int__` doesn't exist (the reverse is not true).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-05-22 Thread via GitHub


AlenkaF commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2102054780


##
python/pyarrow/scalar.pxi:
##
@@ -238,6 +241,9 @@ cdef class UInt8Scalar(Scalar):
 cdef CUInt8Scalar* sp =  self.wrapped.get()
 return sp.value if sp.is_valid else None
 
+def __int__(self):
+return self.as_py()

Review Comment:
   We can add both, right?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-05-22 Thread via GitHub


pitrou commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2101915118


##
python/pyarrow/scalar.pxi:
##
@@ -238,6 +241,9 @@ cdef class UInt8Scalar(Scalar):
 cdef CUInt8Scalar* sp =  self.wrapped.get()
 return sp.value if sp.is_valid else None
 
+def __int__(self):
+return self.as_py()

Review Comment:
   Sorry for noticing this only now, but `__index__` would actually be better 
as it supports more situations than `__int__` (`__index__` is for objects that 
convert _losslessly_ to an integer, which is the case here, while `__int__` can 
be defined on e.g. floats):
   ```python
   >>> class Int:
   ...: def __int__(self): return 42
   ...: 
   >>> class Index:
   ...: def __index__(self): return 42
   ...: 
   >>> int(Int())
   42
   >>> int(Index())
   42
   >>> hex(Int())
   Traceback (most recent call last):
 Cell In[9], line 1
   hex(Int())
   TypeError: 'Int' object cannot be interpreted as an integer
   >>> hex(Index())
   '0x2a'
   
   >>> import math
   >>> math.factorial(Int())
   Traceback (most recent call last):
 Cell In[4], line 1
   math.factorial(Int())
   TypeError: 'Int' object cannot be interpreted as an integer
   
   >>> math.factorial(Index())
   14050061177528798985431426062445115699363840
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-05-21 Thread via GitHub


raulcd commented on PR #45818:
URL: https://github.com/apache/arrow/pull/45818#issuecomment-2898253306

   The CI failures are unrelated, the tracking issue is:
   - https://github.com/apache/arrow/issues/46516


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-05-21 Thread via GitHub


AlenkaF commented on PR #45818:
URL: https://github.com/apache/arrow/pull/45818#issuecomment-2898252117

   Stand corrected - the failures are seen elsewhere too, for example: 
https://github.com/apache/arrow/actions/runs/15155834161.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-05-21 Thread via GitHub


AlenkaF commented on PR #45818:
URL: https://github.com/apache/arrow/pull/45818#issuecomment-2898242028

   Hm, haven't seen this failures elsewhere. Will look into it!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-05-21 Thread via GitHub


thisisnic commented on PR #45818:
URL: https://github.com/apache/arrow/pull/45818#issuecomment-2897307341

   Implemented the latest round of suggestions.  The CI failures look unrelated 
to this PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-05-20 Thread via GitHub


thisisnic commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2099471880


##
python/pyarrow/tests/test_scalars.py:
##
@@ -558,14 +564,29 @@ def test_binary(value, ty, scalar_typ):
 assert s != b'x'
 
 buf = s.as_buffer()
-assert isinstance(buf, pa.Buffer)
-assert buf.to_pybytes() == value
+
+if value is None:
+with pytest.raises(ValueError):

Review Comment:
   Done!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-05-16 Thread via GitHub


AlenkaF commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2092590285


##
python/pyarrow/scalar.pxi:
##
@@ -1118,6 +1170,16 @@ cdef class MapScalar(ListScalar):
 result_dict[key] = value
 return result_dict
 
+def keys(self):

Review Comment:
   Oh yes, of course.
   Maybe we will get back to this after 
https://github.com/apache/arrow/pull/45818#discussion_r2090475778 is resolved. 
Thanks for trying!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-05-13 Thread via GitHub


thisisnic commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2087295883


##
python/pyarrow/scalar.pxi:
##
@@ -1064,13 +1129,24 @@ cdef class MapScalar(ListScalar):
 
 def __getitem__(self, i):
 """
-Return the value at the given index.
+Return the value at the given index or key.
 """
+
 arr = self.values
 if arr is None:
-raise IndexError(i)
-dct = arr[_normalize_index(i, len(arr))]
-return (dct[self.type.key_field.name], dct[self.type.item_field.name])
+raise IndexError(i) if isinstance(i, int) else KeyError(i)
+
+key_field = self.type.key_field.name
+item_field = self.type.item_field.name
+
+if isinstance(i, int):
+dct = arr[_normalize_index(i, len(arr))]
+return (dct[key_field], dct[item_field])
+else:
+for dct in arr:
+if dct[key_field].as_py() == i:
+return dct[item_field]
+raise KeyError(i)

Review Comment:
   Done!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-05-06 Thread via GitHub


thisisnic commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2075887882


##
python/pyarrow/scalar.pxi:
##
@@ -847,6 +882,33 @@ cdef class BinaryScalar(Scalar):
 buffer = self.as_buffer()
 return None if buffer is None else buffer.to_pybytes()
 
+def __bytes__(self):
+return (self.as_py())
+
+def __getbuffer__(self, cp.Py_buffer* buffer, int flags):
+cdef Buffer buf = self.as_buffer()
+
+if buf.buffer.get().is_mutable():

Review Comment:
   Cheers, done!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-05-06 Thread via GitHub


thisisnic commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2075886115


##
python/pyarrow/tests/test_scalars.py:
##
@@ -577,6 +607,13 @@ def test_binary(value, ty, scalar_typ):
 assert memview.strides == (1,)
 
 
+def test_binary_null():
+s = pa.scalar(None, type=pa.binary())
+assert s.as_py() is None
+with pytest.raises(TypeError):
+memoryview(s)

Review Comment:
   Added in!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-05-06 Thread via GitHub


thisisnic commented on PR #45818:
URL: https://github.com/apache/arrow/pull/45818#issuecomment-2854706953

   Thanks for the reviews!  It sounds like for now, the simplest way forward is 
for me to implement the suggestions around the tests and the change to the 
`__getbuffer__()` method, and then wait until we've got clarity about how we 
want to handle MapScalars before changing how we approach the mapping/sequence 
stuff.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-05-05 Thread via GitHub


AlenkaF commented on PR #45818:
URL: https://github.com/apache/arrow/pull/45818#issuecomment-2850326008

   > Note this is only useful if those classes implement the full 
Mapping/Sequence API. If you want to benefit from the mixin methods to avoid 
re-implementing them, you should inherit from the `Mapping` or `Sequence` class 
instead of using `register`.
   
   Agree, inheriting would be much simpler - I just wasn't sure we want to 
inherit from multiple classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-04-30 Thread via GitHub


pitrou commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2068919703


##
python/pyarrow/scalar.pxi:
##
@@ -1064,13 +1126,26 @@ cdef class MapScalar(ListScalar):
 
 def __getitem__(self, i):
 """
-Return the value at the given index.
+Return the value at the given index or key.
 """
+
 arr = self.values
 if arr is None:
-raise IndexError(i)
-dct = arr[_normalize_index(i, len(arr))]
-return (dct[self.type.key_field.name], dct[self.type.item_field.name])
+raise IndexError(i) if isinstance(i, int) else KeyError(i)
+
+key_field = self.type.key_field.name
+item_field = self.type.item_field.name
+
+if isinstance(i, int):
+if arr is None:
+raise IndexError(i)
+dct = arr[_normalize_index(i, len(arr))]
+return (dct[key_field], dct[item_field])
+else:
+for dct in arr:
+if dct[key_field].as_py() == i:
+return dct[item_field]

Review Comment:
   Given that looping over all keys is not efficient, I wonder if it's really a 
good idea to provide this after all.
   @jorisvandenbossche @kszucs  Thoughts?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-04-30 Thread via GitHub


pitrou commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2068909794


##
python/pyarrow/scalar.pxi:
##
@@ -221,6 +221,8 @@ cdef class BooleanScalar(Scalar):
 cdef CBooleanScalar* sp =  self.wrapped.get()
 return sp.value if sp.is_valid else None
 
+def __bool__(self):
+return (self.as_py())

Review Comment:
   Style nit: parentheses are generally not needed after `return`
   ```suggestion
   def __bool__(self):
   return self.as_py()
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-04-30 Thread via GitHub


pitrou commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2068900808


##
python/pyarrow/scalar.pxi:
##
@@ -847,6 +882,33 @@ cdef class BinaryScalar(Scalar):
 buffer = self.as_buffer()
 return None if buffer is None else buffer.to_pybytes()
 
+def __bytes__(self):
+return (self.as_py())
+
+def __getbuffer__(self, cp.Py_buffer* buffer, int flags):
+cdef Buffer buf = self.as_buffer()
+
+if buf.buffer.get().is_mutable():

Review Comment:
   I don't think you have to do all this by hand. `self.as_buffer()` already 
gives you a [Python 
buffer-compliant](https://docs.python.org/3/c-api/buffer.html) object, so you 
can build on that.
   
   Something like this:
   ```cython
   def __getbuffer__(self, cp.Py_buffer* view, int flags):
   buf = self.as_buffer()
   if buf is None:
   raise ValueError("Cannot export buffer from null Arrow Scalar")
   return cp.PyObject_GetBuffer(buf, view, flags)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-04-30 Thread via GitHub


pitrou commented on PR #45818:
URL: https://github.com/apache/arrow/pull/45818#issuecomment-2842310641

   > That would involve registering `pa.MapScalar` and `pa.ListScalar` as 
virtual subclasses with `Mapping.register(...)` and `Sequence.register(...)`, 
respectively, and possibly implementing a few missing magic methods.
   
   Note this is only useful if those classes implement the full 
Mapping/Sequence API. If you want to benefit from the mixin methods to avoid 
re-implementing them, you should inherit from the `Mapping` or `Sequence` class 
instead of using `register`.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-04-29 Thread via GitHub


AlenkaF commented on PR #45818:
URL: https://github.com/apache/arrow/pull/45818#issuecomment-2838316512

   @thisisnic sorry to complicate things a bit — especially since I proposed 
those test helpers in the first place — but after looking at them again, I’m 
leaning toward doing the necessary work to support `isinstance(obj, Mapping)` 
and `isinstance(obj, Sequence)` checks using `collections.abc`.
   
   That would involve registering `pa.MapScalar` and `pa.ListScalar` as virtual 
subclasses with `Mapping.register(...)` and `Sequence.register(...)`, 
respectively, and possibly implementing a few missing magic methods.
   
   Here’s the relevant Python docs I was reading: 
https://docs.python.org/3/library/collections.abc.html
   
   A couple of notes from that page:
   
   > Existing classes and built-in classes can be registered as “virtual 
subclasses” of the ABCs. Those classes should define the full API including all 
of the abstract methods and all of the mixin methods. This lets users rely on 
[issubclass()](https://docs.python.org/3/library/functions.html#issubclass) or 
[isinstance()](https://docs.python.org/3/library/functions.html#isinstance) 
tests to determine whether the full interface is supported.
   
   > knowing that a class supplies __getitem__, __len__, and __iter__ is 
insufficient for distinguishing a 
[Sequence](https://docs.python.org/3/library/collections.abc.html#collections.abc.Sequence)
 from a 
[Mapping](https://docs.python.org/3/library/collections.abc.html#collections.abc.Mapping).
   
   Let me know what you think!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-04-29 Thread via GitHub


AlenkaF commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2065785844


##
python/pyarrow/scalar.pxi:
##
@@ -847,6 +882,33 @@ cdef class BinaryScalar(Scalar):
 buffer = self.as_buffer()
 return None if buffer is None else buffer.to_pybytes()
 
+def __bytes__(self):
+return (self.as_py())
+
+def __getbuffer__(self, cp.Py_buffer* buffer, int flags):
+cdef Buffer buf = self.as_buffer()
+
+if buf.buffer.get().is_mutable():
+buffer.readonly = 0
+else:
+if flags & cp.PyBUF_WRITABLE:
+raise BufferError("Writable buffer requested but Arrow "
+  "buffer was not mutable")
+buffer.readonly = 1
+buffer.buf = buf.buffer.get().data()
+buffer.len = buf.size
+if buffer.buf == NULL:
+assert buffer.len == 0
+buffer.buf = cp.PyBytes_AS_STRING(b"")

Review Comment:
   I was looking into whether it's possible to construct a scalar with a `NULL` 
address, and I couldn’t find a way to do it—so either it’s not possible or it’s 
just very difficult :)
   
   Given that, I think your existing test for `None` scalar is appropriate. It 
would also be good to add a test for an empty string (`b''` and/or `''`).
   
   With those in place, we should be able to remove this part of the code:
   
   ```suggestion
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-04-29 Thread via GitHub


AlenkaF commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2063595348


##
python/pyarrow/tests/test_scalars.py:
##
@@ -577,6 +607,13 @@ def test_binary(value, ty, scalar_typ):
 assert memview.strides == (1,)
 
 
+def test_binary_null():
+s = pa.scalar(None, type=pa.binary())
+assert s.as_py() is None
+with pytest.raises(TypeError):
+memoryview(s)

Review Comment:
   This part can be moved in the `test_binary` with type being parametrised 
with `ty` parameter.
   
   What about an empty string scalar `pa.scalar("", type=pa.binary())`? Can we 
add this to the test also?
   I think in this case it should hold:
   
   ```python 
   assert buffer.len == 0
   assert memview == ""
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-04-28 Thread via GitHub


AlenkaF commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2063595348


##
python/pyarrow/tests/test_scalars.py:
##
@@ -577,6 +607,13 @@ def test_binary(value, ty, scalar_typ):
 assert memview.strides == (1,)
 
 
+def test_binary_null():
+s = pa.scalar(None, type=pa.binary())
+assert s.as_py() is None
+with pytest.raises(TypeError):
+memoryview(s)

Review Comment:
   This part can be moved in the `test_binary` with type being parametrised 
with `ty` parameter.
   
   What about an empty string scalar `pa.scalar("", type=pa.binary())`? Can we 
add this to the test also?
   I think this should hold:
   
   ```python 
   assert buffer.len == 0
   assert memview == ""
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-04-28 Thread via GitHub


AlenkaF commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2063573529


##
python/pyarrow/scalar.pxi:
##
@@ -1064,13 +1129,24 @@ cdef class MapScalar(ListScalar):
 
 def __getitem__(self, i):
 """
-Return the value at the given index.
+Return the value at the given index or key.
 """
+
 arr = self.values
 if arr is None:
-raise IndexError(i)
-dct = arr[_normalize_index(i, len(arr))]
-return (dct[self.type.key_field.name], dct[self.type.item_field.name])
+raise IndexError(i) if isinstance(i, int) else KeyError(i)
+
+key_field = self.type.key_field.name
+item_field = self.type.item_field.name
+
+if isinstance(i, int):
+dct = arr[_normalize_index(i, len(arr))]
+return (dct[key_field], dct[item_field])
+else:
+for dct in arr:
+if dct[key_field].as_py() == i:
+return dct[item_field]
+raise KeyError(i)

Review Comment:
   This is what I would suggest:
   
   ```suggestion
   if isinstance(i, (str, bytes)):
   i = self.keys().index(i)
   
   dct = arr[_normalize_index(i, len(arr))]
   return (dct[key_field], dct[item_field])
   ```
   
   There will be two changes in behaviour though. The key lookup will return a 
key and value tuple and not only a value which is inline with the index lookup 
and I think it is valid. The error in case the key is not present (like 
"fake_key") will be raised by `index(i)` but with `ValueError`. That can be 
changed though.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-04-28 Thread via GitHub


AlenkaF commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2063547272


##
python/pyarrow/scalar.pxi:
##
@@ -1064,13 +1126,26 @@ cdef class MapScalar(ListScalar):
 
 def __getitem__(self, i):
 """
-Return the value at the given index.
+Return the value at the given index or key.
 """
+
 arr = self.values
 if arr is None:
-raise IndexError(i)
-dct = arr[_normalize_index(i, len(arr))]
-return (dct[self.type.key_field.name], dct[self.type.item_field.name])
+raise IndexError(i) if isinstance(i, int) else KeyError(i)
+
+key_field = self.type.key_field.name
+item_field = self.type.item_field.name
+
+if isinstance(i, int):
+if arr is None:
+raise IndexError(i)
+dct = arr[_normalize_index(i, len(arr))]
+return (dct[key_field], dct[item_field])
+else:
+for dct in arr:
+if dct[key_field].as_py() == i:
+return dct[item_field]
+raise KeyError(i)

Review Comment:
   Never mind, got it! :)
   I would then suggest a bit of a code shift - will create a separate 
suggestion comment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-04-28 Thread via GitHub


AlenkaF commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2063420030


##
python/pyarrow/scalar.pxi:
##
@@ -1064,13 +1126,26 @@ cdef class MapScalar(ListScalar):
 
 def __getitem__(self, i):
 """
-Return the value at the given index.
+Return the value at the given index or key.
 """
+
 arr = self.values
 if arr is None:
-raise IndexError(i)
-dct = arr[_normalize_index(i, len(arr))]
-return (dct[self.type.key_field.name], dct[self.type.item_field.name])
+raise IndexError(i) if isinstance(i, int) else KeyError(i)
+
+key_field = self.type.key_field.name
+item_field = self.type.item_field.name
+
+if isinstance(i, int):
+if arr is None:
+raise IndexError(i)
+dct = arr[_normalize_index(i, len(arr))]
+return (dct[key_field], dct[item_field])
+else:
+for dct in arr:
+if dct[key_field].as_py() == i:
+return dct[item_field]
+raise KeyError(i)

Review Comment:
   Can you share the failure here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-04-28 Thread via GitHub


thisisnic commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2063335310


##
python/pyarrow/scalar.pxi:
##
@@ -1064,13 +1126,26 @@ cdef class MapScalar(ListScalar):
 
 def __getitem__(self, i):
 """
-Return the value at the given index.
+Return the value at the given index or key.
 """
+
 arr = self.values
 if arr is None:
-raise IndexError(i)
-dct = arr[_normalize_index(i, len(arr))]
-return (dct[self.type.key_field.name], dct[self.type.item_field.name])
+raise IndexError(i) if isinstance(i, int) else KeyError(i)
+
+key_field = self.type.key_field.name
+item_field = self.type.item_field.name
+
+if isinstance(i, int):
+if arr is None:
+raise IndexError(i)
+dct = arr[_normalize_index(i, len(arr))]
+return (dct[key_field], dct[item_field])
+else:
+for dct in arr:
+if dct[key_field].as_py() == i:
+return dct[item_field]
+raise KeyError(i)

Review Comment:
   I think we need to keep this one in case there are no items matching the 
specified key?  Otherwise we get a new test failure.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-04-28 Thread via GitHub


thisisnic commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2063141577


##
python/pyarrow/scalar.pxi:
##
@@ -847,6 +882,33 @@ cdef class BinaryScalar(Scalar):
 buffer = self.as_buffer()
 return None if buffer is None else buffer.to_pybytes()
 
+def __bytes__(self):
+return (self.as_py())
+
+def __getbuffer__(self, cp.Py_buffer* buffer, int flags):
+cdef Buffer buf = self.as_buffer()
+
+if buf.buffer.get().is_mutable():
+buffer.readonly = 0
+else:
+if flags & cp.PyBUF_WRITABLE:
+raise BufferError("Writable buffer requested but Arrow "
+  "buffer was not mutable")
+buffer.readonly = 1
+buffer.buf = buf.buffer.get().data()
+buffer.len = buf.size
+if buffer.buf == NULL:
+assert buffer.len == 0
+buffer.buf = cp.PyBytes_AS_STRING(b"")

Review Comment:
   (This is testing for a NULL one so buffer length should be zero and it'll be 
an empty string)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-04-28 Thread via GitHub


AlenkaF commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2062918249


##
python/pyarrow/scalar.pxi:
##
@@ -1064,13 +1126,26 @@ cdef class MapScalar(ListScalar):
 
 def __getitem__(self, i):
 """
-Return the value at the given index.
+Return the value at the given index or key.
 """
+
 arr = self.values
 if arr is None:
-raise IndexError(i)
-dct = arr[_normalize_index(i, len(arr))]
-return (dct[self.type.key_field.name], dct[self.type.item_field.name])
+raise IndexError(i) if isinstance(i, int) else KeyError(i)
+
+key_field = self.type.key_field.name
+item_field = self.type.item_field.name
+
+if isinstance(i, int):
+if arr is None:
+raise IndexError(i)
+dct = arr[_normalize_index(i, len(arr))]
+return (dct[key_field], dct[item_field])
+else:
+for dct in arr:
+if dct[key_field].as_py() == i:
+return dct[item_field]
+raise KeyError(i)

Review Comment:
   This can also be removed, same as above.



##
python/pyarrow/scalar.pxi:
##
@@ -1064,13 +1126,26 @@ cdef class MapScalar(ListScalar):
 
 def __getitem__(self, i):
 """
-Return the value at the given index.
+Return the value at the given index or key.
 """
+
 arr = self.values
 if arr is None:
-raise IndexError(i)
-dct = arr[_normalize_index(i, len(arr))]
-return (dct[self.type.key_field.name], dct[self.type.item_field.name])
+raise IndexError(i) if isinstance(i, int) else KeyError(i)
+
+key_field = self.type.key_field.name
+item_field = self.type.item_field.name
+
+if isinstance(i, int):
+if arr is None:
+raise IndexError(i)

Review Comment:
   I think this is not needed here anymore as it is already checked on line 
1070/1133.



##
python/pyarrow/scalar.pxi:
##
@@ -847,6 +882,33 @@ cdef class BinaryScalar(Scalar):
 buffer = self.as_buffer()
 return None if buffer is None else buffer.to_pybytes()
 
+def __bytes__(self):
+return (self.as_py())
+
+def __getbuffer__(self, cp.Py_buffer* buffer, int flags):
+cdef Buffer buf = self.as_buffer()
+
+if buf.buffer.get().is_mutable():
+buffer.readonly = 0
+else:
+if flags & cp.PyBUF_WRITABLE:
+raise BufferError("Writable buffer requested but Arrow "
+  "buffer was not mutable")
+buffer.readonly = 1
+buffer.buf = buf.buffer.get().data()
+buffer.len = buf.size
+if buffer.buf == NULL:
+assert buffer.len == 0
+buffer.buf = cp.PyBytes_AS_STRING(b"")

Review Comment:
   Could we test this part also, maybe with something like `pa.scalar(None, 
type=ty)`?



##
docs/source/python/compute.rst:
##
@@ -63,6 +63,21 @@ Below are a few simple examples::
>>> pc.multiply(x, y)

 
+If you are using a compute function which returns more than one value, results
+will be returned as a StructScalar.  You can extract the individual values by

Review Comment:
   ```suggestion
   will be returned as a ``StructScalar``.  You can extract the individual 
values by
   ```



##
docs/source/python/compute.rst:
##
@@ -63,6 +63,21 @@ Below are a few simple examples::
>>> pc.multiply(x, y)

 
+If you are using a compute function which returns more than one value, results

Review Comment:
   ❤️ 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-04-23 Thread via GitHub


pitrou commented on PR #45818:
URL: https://github.com/apache/arrow/pull/45818#issuecomment-2823498269

   > I tried updating it to return values instead of keys, but this introduces 
a load of breaking changes in the tests as we're now fundamentally changing how 
StructScalars work.
   
   Ah, my bad. Thanks for trying it out :) Then we should certainly leave it 
as-is.
   
   We can document the use of the `values` method then, perhaps:
   ```python
   >>> a, b = pc.min_max([1, 2,3 ]).values()
   >>> a
   
   >>> b
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-04-23 Thread via GitHub


thisisnic commented on PR #45818:
URL: https://github.com/apache/arrow/pull/45818#issuecomment-2823427616

   @pitrou - I've implemented most of the changes suggested in your original 
issue, but the one I'm stuck on is the example you gave:
   
   ```
   >>> a, b = pc.min_max([1,2,3])
   >>> a  # expecting 1
   'min'
   >>> b  # expecting 3
   'max'
   ```
   
   This happens because the compute function returns a StructScalar and in the 
`__iter__()` method for StructScalars, we return keys and not values.  This 
seems analagous to how Python dictionaries work, e.g.
   
   ```
   >>> a, b = {'min': 1, 'max': 3}
   >>> a
   'min'
   >>> b
   'max'
   ```
   
   I tried updating it to return values instead of keys, but this introduces a 
load of breaking changes in the tests as we're now fundamentally changing how 
StructScalars work.
   
   ```
   1. FAILED 
pyarrow/tests/test_scalars.py::test_basics[builtin_pickle-value35-None-StructScalar]
 - TypeError: Expected integer or string index
   2. FAILED pyarrow/tests/test_scalars.py::test_map[builtin_pickle] - 
ValueError: Converting to Python dictionary is not supported when duplicate 
field names are present
   3. FAILED pyarrow/tests/test_scalars.py::test_struct - AssertionError: 
assert [2, 3.5] == ['x', 'y']
   4. FAILED pyarrow/tests/test_scalars.py::test_struct_duplicate_fields - 
AssertionError: assert [1, 2.0, 3] == ['x', 'y', 'x']
   5. FAILED 
pyarrow/tests/test_scalars.py::test_nested_map_types_with_maps_as_pydicts - 
TypeError: Expected integer or string index
   ```
   
   It feels like this could have significant user impact.  Is it desirable to 
make such an in-depth change to how StructScalars work, or do you reckon we 
should just document that users should do something like use `items()` to get 
these values, like this?
   
   ```
   >>> a, b = pc.min_max([1,2,3]).items()
   >>> a 
   ('min', )
   >>> b 
   ('max', )
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-04-18 Thread via GitHub


AlenkaF commented on PR #45818:
URL: https://github.com/apache/arrow/pull/45818#issuecomment-2815955619

   Hm, interesting. I think you're right in how you're interpreting the current 
behavior and what would be needed. Though looking at [the original PR that 
implemented this](https://github.com/apache/arrow/pull/7519), it seems like the 
current `__iter__` behavior was intentional.
   
   That said, I’d expect `list(s)` to behave a bit like `s.to_py()`, but maybe 
returning a list of tuples rather than a dict. Meaning a breaking change would 
be needed. Might be worth discussing the expected semantics here—seems like a 
good topic to bring up with Raul next week. I'll make sure we get that 
scheduled!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-04-18 Thread via GitHub


thisisnic commented on PR #45818:
URL: https://github.com/apache/arrow/pull/45818#issuecomment-2815412835

   I'm a bit stuck as to how we can have StructScalar objects do all of:
   * implement Mapping
   * implement Sequences
   * implement unpacking
   
   From what I've read, to implement unpacking, we need the `__iter__()` method 
for `StructScalar` to yield values not keys, but then if we do this, we break 
some of our existing tests like `list(s) == ['x', 'y']`.  I just want to check 
that this breaking change is desirable?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-04-18 Thread via GitHub


AlenkaF commented on PR #45818:
URL: https://github.com/apache/arrow/pull/45818#issuecomment-2814747110

   The PR is coming along nicely—great progress!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-04-17 Thread via GitHub


thisisnic commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2049405787


##
python/pyarrow/tests/test_scalars.py:
##
@@ -555,17 +560,23 @@ def test_binary(value, ty, scalar_typ):
 assert str(s) == str(value)
 assert repr(value) in repr(s)
 assert s.as_py() == value
+assert bytes(s) == value
 assert s != b'x'
 
 buf = s.as_buffer()
 assert isinstance(buf, pa.Buffer)
 assert buf.to_pybytes() == value
 
+memview = memoryview(s)
+assert isinstance(memview, memoryview)

Review Comment:
   Updated!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-04-17 Thread via GitHub


thisisnic commented on PR #45818:
URL: https://github.com/apache/arrow/pull/45818#issuecomment-2813650208

   OK, still got unpacking to implement, but this is getting there.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-04-15 Thread via GitHub


AlenkaF commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2044537069


##
python/pyarrow/scalar.pxi:
##
@@ -887,7 +887,29 @@ cdef class BinaryScalar(Scalar):
 
 def __getbuffer__(self, cp.Py_buffer* buffer, int flags):
 cdef Buffer buf = self.as_buffer()
-PyObject_GetBuffer(buf, buffer, flags)
+
+if buf.buffer.get().is_mutable():
+buffer.readonly = 0
+else:
+if flags & cp.PyBUF_WRITABLE:
+raise BufferError("Writable buffer requested but Arrow "
+  "buffer was not mutable")
+buffer.readonly = 1
+buffer.buf = buf.buffer.get().data()
+buffer.len = buf.size
+if buffer.buf == NULL:
+# ARROW-16048: Ensure we don't export a NULL address.

Review Comment:
   ```suggestion
   ```
   
   We can remove this comment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-04-15 Thread via GitHub


AlenkaF commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2044547631


##
python/pyarrow/tests/test_scalars.py:
##
@@ -555,17 +560,23 @@ def test_binary(value, ty, scalar_typ):
 assert str(s) == str(value)
 assert repr(value) in repr(s)
 assert s.as_py() == value
+assert bytes(s) == value
 assert s != b'x'
 
 buf = s.as_buffer()
 assert isinstance(buf, pa.Buffer)
 assert buf.to_pybytes() == value
 
+memview = memoryview(s)
+assert isinstance(memview, memoryview)

Review Comment:
   ```suggestion
   ```
   
   This line can be removed and some additional assertions can be added, like 
here:
   
   
https://github.com/apache/arrow/blob/60b5ab9ee0bf070f03cf5c92fe0add0257543dfd/python/pyarrow/tests/test_tensor.py#L217-L226



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-04-15 Thread via GitHub


thisisnic commented on PR #45818:
URL: https://github.com/apache/arrow/pull/45818#issuecomment-2805017289

   I've marked this as draft as I've still a few things left to do before this 
is finished (not all of the below are fully tested + I want to follow up on 
your comment on Python unpacking above)
   
   Nic TODO:
   * list-like scalars could implement 
[Sequence](https://docs.python.org/3.13/library/collections.abc.html#collections.abc.Sequence)
   * map scalars could implement 
[Mapping](https://docs.python.org/3.13/library/collections.abc.html#collections.abc.Mapping)
   * struct scalars could implement both Sequence and Mapping (?)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-04-08 Thread via GitHub


AlenkaF commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2034608323


##
python/pyarrow/scalar.pxi:
##
@@ -847,6 +882,12 @@ cdef class BinaryScalar(Scalar):
 buffer = self.as_buffer()
 return None if buffer is None else buffer.to_pybytes()
 
+def __bytes__(self):
+return (self.as_py())
+
+def __buffer__(self):
+return(memoryview(self.as_buffer()))

Review Comment:
   This here will be a bit more tricky. We need to go to a lower level and pass 
the pointer to the actual buffer in memory, as well as some metadata. Also the 
method will need to be `__getbuffer__ `. I think we should be able to mostly 
reuse what is defined here: 
https://github.com/apache/arrow/blob/60b5ab9ee0bf070f03cf5c92fe0add0257543dfd/python/pyarrow/io.pxi#L1561-L1584
 with the buffer being `self.as_buffer()`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-04-05 Thread via GitHub


AlenkaF commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2020199333


##
python/pyarrow/scalar.pxi:
##
@@ -846,6 +879,9 @@ cdef class BinaryScalar(Scalar):
 """
 buffer = self.as_buffer()
 return None if buffer is None else buffer.to_pybytes()
+

Review Comment:
   I think the linter is failing due to empty spaces here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-04-01 Thread via GitHub


thisisnic commented on PR #45818:
URL: https://github.com/apache/arrow/pull/45818#issuecomment-2768606465

   Hey @AlenkaF , I think this is ready for another look!
   
   I've implemented the simple ones, and some of the more complex ones already 
exist.
   
   I was unsure about how to approach "binary scalars implement perhaps the 
buffer protocol".
   
   I was also unsure how to approach the code examples with min_max:
   
   ```
   >>> a, b = pc.min_max([1,2,3])
   >>> a  # expecting 1
   'min'
   >>> b  # expecting 3
   'max'
   ```
   
   This happens as it returns a StructScalar, which I'm assuming is analagous 
to a dictionary, and this behaviour feels like expected behaviour from that 
perspective?  I did try to see if I could implement `values()`, `keys()` and 
`items()`, but other tests broke, and I couldn't find a way of changing this 
without introducing a breaking change here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-04-01 Thread via GitHub


AlenkaF commented on PR #45818:
URL: https://github.com/apache/arrow/pull/45818#issuecomment-2769006066

   > Hey @AlenkaF , I think this is ready for another look!
   
   The changes look good to me 👍 
   
   > I've implemented the simple ones, and some of the more complex ones 
already exist.
   
   Which ones do you mean?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-04-01 Thread via GitHub


AlenkaF commented on PR #45818:
URL: https://github.com/apache/arrow/pull/45818#issuecomment-276939

   Thanks, will have a look shortly!
   
   > I was unsure about how to approach "binary scalars implement perhaps the 
buffer protocol".
   
   You might want to have a look at:
   
   - the [buffer protocol implementation in PyArrow Buffer 
class](https://github.com/apache/arrow/blob/60b5ab9ee0bf070f03cf5c92fe0add0257543dfd/python/pyarrow/io.pxi#L1561-L1584),
   - and the Cython docs:
 - https://cython.readthedocs.io/en/latest/src/userguide/buffer.html
 - https://docs.python.org/3/c-api/buffer.html
   
   You can start with thinking about what to test. For testing use `memoryview` 
which exposes the buffer protocol, like in [this existing 
test](https://github.com/apache/arrow/blob/60b5ab9ee0bf070f03cf5c92fe0add0257543dfd/python/pyarrow/tests/test_tensor.py#L204-L226).
   
   > I was also unsure how to approach the code examples with min_max:
   
   The code example from the issue description is using **Python unpacking** 
which works for any iterator and that isn't implemented for a struct. See a 
good SO thread: https://stackoverflow.com/a/37837754. This will be solved when 
the struct scalar class will implement Sequence.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-03-31 Thread via GitHub


thisisnic commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2021264078


##
python/pyarrow/scalar.pxi:
##
@@ -240,6 +240,10 @@ cdef class UInt8Scalar(Scalar):
 cdef CUInt8Scalar* sp =  self.wrapped.get()
 return sp.value if sp.is_valid else None
 
+def __int__(self):
+
+return(self.as_py())
+

Review Comment:
   OK, let's leave as-is then, no need for premature optimisation!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-03-30 Thread via GitHub


AlenkaF commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2020199574


##
python/pyarrow/scalar.pxi:
##
@@ -846,6 +879,9 @@ cdef class BinaryScalar(Scalar):
 """
 buffer = self.as_buffer()
 return None if buffer is None else buffer.to_pybytes()
+

Review Comment:
   ```suggestion
   return None if buffer is None else buffer.to_pybytes()
   
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-03-30 Thread via GitHub


AlenkaF commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2020199574


##
python/pyarrow/scalar.pxi:
##
@@ -846,6 +879,9 @@ cdef class BinaryScalar(Scalar):
 """
 buffer = self.as_buffer()
 return None if buffer is None else buffer.to_pybytes()
+

Review Comment:
   ```suggestion
   return None if buffer is None else buffer.to_pybytes()
   
   ```
   This might be better then the previous suggestion.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-03-30 Thread via GitHub


AlenkaF commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2020199333


##
python/pyarrow/scalar.pxi:
##
@@ -846,6 +879,9 @@ cdef class BinaryScalar(Scalar):
 """
 buffer = self.as_buffer()
 return None if buffer is None else buffer.to_pybytes()
+

Review Comment:
   ```suggestion
   
   ```
   I think the linter is failing due to empty spaces here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-03-24 Thread via GitHub


AlenkaF commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r2009726204


##
python/pyarrow/scalar.pxi:
##
@@ -240,6 +240,10 @@ cdef class UInt8Scalar(Scalar):
 cdef CUInt8Scalar* sp =  self.wrapped.get()
 return sp.value if sp.is_valid else None
 
+def __int__(self):
+
+return(self.as_py())
+

Review Comment:
   I would be nice to have a Scalar subclass from which all existing int-based 
classes would inherit from, like in 
[BaseExtensionType(DataType)](https://github.com/apache/arrow/blob/40c5300edc6ccc0ea4a9131722628ac6203ce806/python/pyarrow/types.pxi#L1644).
 _But_ we would need to do some gymnastic to move `as_py` to this newly 
created, lets say `BaseIntScalar` class. That, I think, would make it more of 
an effort in the end.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-45653: [Python] Scalar subclasses should implement Python protocols [arrow]

2025-03-17 Thread via GitHub


thisisnic commented on code in PR #45818:
URL: https://github.com/apache/arrow/pull/45818#discussion_r1998747389


##
python/pyarrow/scalar.pxi:
##
@@ -240,6 +240,10 @@ cdef class UInt8Scalar(Scalar):
 cdef CUInt8Scalar* sp =  self.wrapped.get()
 return sp.value if sp.is_valid else None
 
+def __int__(self):
+
+return(self.as_py())
+

Review Comment:
   Is all this copying and pasting OK?  I could define a new scalar integer 
subclass which defines this once and then have the existing int-based 
subclasses be subclasses of that, but I wasn't sure if that would be 
overengineering things?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org