Re: [PR] GH-45531: [Python] Add the `dim_names` argument to `from_numpy_ndarray` [arrow]
conbench-apache-arrow[bot] commented on PR #46170: URL: https://github.com/apache/arrow/pull/46170#issuecomment-2814272339 After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit cc56a5e78eeede2715f12b6ca61ebea741f38f60. There were no benchmark performance regressions. 🎉 The [full Conbench report](https://github.com/apache/arrow/runs/40761610933) has more details. It also includes information about 6 possible false positives for unstable benchmarks that are known to sometimes produce them. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] GH-45531: [Python] Add the `dim_names` argument to `from_numpy_ndarray` [arrow]
rok merged PR #46170: URL: https://github.com/apache/arrow/pull/46170 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] GH-45531: [Python] Add the `dim_names` argument to `from_numpy_ndarray` [arrow]
yyossy5 commented on PR #46170: URL: https://github.com/apache/arrow/pull/46170#issuecomment-2813471967 Thank you @rok ! > A minor ask: could we add a check like: > > ```python > tensor_array_from_numpy = pa.FixedShapeTensorArray.from_numpy_ndarray(arr) > assert tensor_array_from_numpy.dim_names() is None > ``` You're right, I've added the assertions. https://github.com/apache/arrow/pull/46170/commits/800bb03c0c905487919ac586595aa13b31422df9 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] GH-45531: [Python] Add the `dim_names` argument to `from_numpy_ndarray` [arrow]
rok commented on PR #46170: URL: https://github.com/apache/arrow/pull/46170#issuecomment-2813377286 Thanks for working on this @yyossy5! This looks good. A minor ask: could we add a check like: ```python tensor_array_from_numpy = pa.FixedShapeTensorArray.from_numpy_ndarray(arr) assert tensor_array_from_numpy.dim_names is None ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] GH-45531: [Python] Add the `dim_names` argument to `from_numpy_ndarray` [arrow]
yyossy5 commented on code in PR #46170:
URL: https://github.com/apache/arrow/pull/46170#discussion_r2049144944
##
python/pyarrow/array.pxi:
##
@@ -4655,6 +4657,17 @@ cdef class FixedShapeTensorArray(ExtensionArray):
"Cannot convert 1D array or scalar to fixed shape tensor
array")
if np.prod(obj.shape) == 0:
raise ValueError("Expected a non-empty ndarray")
+if dim_names is not None:
+if not hasattr(dim_names, '__iter__'):
Review Comment:
Thank you @raulcd !
Fixed it in
https://github.com/apache/arrow/pull/46170/commits/38ae0a4534d5bab2352f7a0d1bb89ea7b379e139
and added a test case for generator input in
https://github.com/apache/arrow/pull/46170/commits/ff2ff30f79abce1a7eef32c3f1945805ae6d6687
.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] GH-45531: [Python] Add the `dim_names` argument to `from_numpy_ndarray` [arrow]
raulcd commented on code in PR #46170:
URL: https://github.com/apache/arrow/pull/46170#discussion_r2048956956
##
python/pyarrow/array.pxi:
##
@@ -4655,6 +4657,17 @@ cdef class FixedShapeTensorArray(ExtensionArray):
"Cannot convert 1D array or scalar to fixed shape tensor
array")
if np.prod(obj.shape) == 0:
raise ValueError("Expected a non-empty ndarray")
+if dim_names is not None:
+if not hasattr(dim_names, '__iter__'):
Review Comment:
This will allow us to pass a generator but `len(din_names)` would fail with:
```
>>> g = (x for x in range(2))
>>> hasattr(g, '__iter__')
True
>>> len(g)
Traceback (most recent call last):
File "", line 1, in
TypeError: object of type 'generator' has no len()
```
Should we check for `collections.abc.Sequence` as we do in other places like:
https://github.com/apache/arrow/blob/0ae5c860004293d0b81bbc31198931aeae9bc277/python/pyarrow/feather.py#L259-L261
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] GH-45531: [Python] Add the `dim_names` argument to `from_numpy_ndarray` [arrow]
yyossy5 commented on PR #46170: URL: https://github.com/apache/arrow/pull/46170#issuecomment-2812525229 Thank you for reviewing! @AlenkaF > The document you linked will be updated automatically by the changes in this PR, since that section is generated from the docstrings. Oh, that's nice. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] GH-45531: [Python] Add the `dim_names` argument to `from_numpy_ndarray` [arrow]
AlenkaF commented on PR #46170: URL: https://github.com/apache/arrow/pull/46170#issuecomment-2812296116 > Should the document be revised as well? Thanks for the contribution, @yyossy5! The document you linked will be updated automatically by the changes in this PR, since that section is generated from the docstrings. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] GH-45531: [Python] Add the `dim_names` argument to `from_numpy_ndarray` [arrow]
yyossy5 commented on PR #46170: URL: https://github.com/apache/arrow/pull/46170#issuecomment-2810155991 Should the document be revised as well? https://arrow.apache.org/docs/python/generated/pyarrow.FixedShapeTensorArray.html#pyarrow.FixedShapeTensorArray.from_numpy_ndarray -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
[PR] GH-45531: [Python] Add the `dim_names` argument to from_numpy_ndarray [arrow]
yyossy5 opened a new pull request, #46170: URL: https://github.com/apache/arrow/pull/46170 ### Rationale for this change The `FixedShapeTensorArray.from_numpy_ndarray` method did not pass `dim_names` to the `fixed_shape_tensor` constructor, which resulted in dimension names being lost when converting from a NumPy array. This change ensures that dimension names are properly preserved when constructing a tensor array from a NumPy ndarray. ### What changes are included in this PR? - Added an optional `dim_names` parameter to `FixedShapeTensorArray.from_numpy_ndarray`. - If provided, the `dim_names` are now passed to the `fixed_shape_tensor` constructor. ### Are these changes tested? - Existing tests pass, confirming no regressions to current functionality. - Additional unit tests have been added to verify that `dim_names` are correctly handled when specified. ### Are there any user-facing changes? - The method `FixedShapeTensorArray.from_numpy_ndarray` now accepts an optional `dim_names` argument. - This argument is optional, and the behavior remains unchanged when it is not provided. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
