Re: [PR] GH-38026: [Python] Use provided field types in StructArray.from_arrays [arrow]
AlenkaF commented on code in PR #38334: URL: https://github.com/apache/arrow/pull/38334#discussion_r2043839573 ## python/pyarrow/array.pxi: ## @@ -4220,23 +4220,46 @@ cdef class StructArray(Array): c_mask = c_mask_inverted_from_obj(mask, memory_pool) -arrays = [asarray(x) for x in arrays] -for arr in arrays: -c_array = pyarrow_unwrap_array(arr) -if c_array == nullptr: -raise TypeError(f"Expected Array, got {arr.__class__}") -c_arrays.push_back(c_array) if names is not None: for name in names: c_names.push_back(tobytes(name)) +arrays = [asarray(x) for x in arrays] else: +py_fields = [] for item in fields: if isinstance(item, tuple): py_field = field(*item) else: py_field = item +py_fields.append(py_field) c_fields.push_back(py_field.sp_field) +if len(py_fields) != len(arrays): +raise ValueError("Must pass same number of arrays as fields") +type_enforced_arrays = [] +for ary, field in zip(arrays, fields): +# Verify the data type of arrays which are already +# Arrow Arrays. Coerce the data type of anything else, +# since that other stuff doesn't have a certain type +# we can verify. +if isinstance(ary, (Array, ChunkedArray)): +if ary.type != field.type: +raise ValueError( +f"Field {field.name} is expected to have type {field.type}, but provided data is {ary.type}") +type_enforced_arrays.append(ary) Review Comment: I see this was added to preserve the original behavior in cases where the underlying child array has a different type than the one specified in the field. I agree that it's not entirely clear what the correct behavior should be, but I think we shouldn't change it here. Given that, would it make sense to remove the type check here and rely on the C++ layer to raise an error, as it currently does? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-38026: [Python] Use provided field types in StructArray.from_arrays [arrow]
spenczar commented on PR #38334: URL: https://github.com/apache/arrow/pull/38334#issuecomment-2802156357 @raulcd Can you take another look? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-38026: [Python] Use provided field types in StructArray.from_arrays [arrow]
spenczar commented on PR #38334: URL: https://github.com/apache/arrow/pull/38334#issuecomment-2708082866 I've rebased on current main, just because this was so ancient that I was having trouble getting a dev environment back up. Please take another look, @raulcd . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-38026: [Python] Use provided field types in StructArray.from_arrays [arrow]
spenczar commented on code in PR #38334: URL: https://github.com/apache/arrow/pull/38334#discussion_r1982253727 ## python/pyarrow/tests/test_array.py: ## @@ -710,9 +710,9 @@ def test_struct_from_arrays(): assert arr.to_pylist() == [] # Inconsistent fields -fa2 = pa.field("a", pa.int32()) -with pytest.raises(ValueError, match="int64 vs int32"): -pa.StructArray.from_arrays([a, b, c], fields=[fa2, fb, fc]) Review Comment: Right, I am coercing inputs to match the provided types specified by the user. I don't think it's obvious whether this is right or wrong; there is no specification here. We could require types to be identical (no coercion) if the input values are all Pyarrow Arrays. The implementation will just be a little more complex. Let me know what you prefer. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-38026: [Python] Use provided field types in StructArray.from_arrays [arrow]
raulcd commented on code in PR #38334: URL: https://github.com/apache/arrow/pull/38334#discussion_r1981271781 ## python/pyarrow/array.pxi: ## @@ -3204,23 +3204,34 @@ cdef class StructArray(Array): c_mask = c_mask_inverted_from_obj(mask, memory_pool) -arrays = [asarray(x) for x in arrays] -for arr in arrays: -c_array = pyarrow_unwrap_array(arr) -if c_array == nullptr: -raise TypeError(f"Expected Array, got {arr.__class__}") -c_arrays.push_back(c_array) if names is not None: for name in names: c_names.push_back(tobytes(name)) +arrays = [asarray(x) for x in arrays] else: +py_fields = [] for item in fields: if isinstance(item, tuple): py_field = field(*item) else: py_field = item +py_fields.append(py_field) c_fields.push_back(py_field.sp_field) +# This needs to be enumerate, not zip, in order to make +# sure that there is a field provided for every array. +try: +arrays = [asarray(ary, py_fields[i].type) + for i, ary in enumerate(arrays)] +except IndexError: +raise ValueError("Must pass same number of arrays as fields") Review Comment: It would be nice to get some tests for those exceptions ## python/pyarrow/tests/test_array.py: ## @@ -710,9 +710,9 @@ def test_struct_from_arrays(): assert arr.to_pylist() == [] # Inconsistent fields -fa2 = pa.field("a", pa.int32()) -with pytest.raises(ValueError, match="int64 vs int32"): -pa.StructArray.from_arrays([a, b, c], fields=[fa2, fb, fc]) Review Comment: I have to investigate further but think there's something wrong here. This use case should still raise an error as the underlying `a` child array is int64. I am not entirely sure if we are making a copy or what is happening, as I am not familiar with this area of the codebase, but this should raise an error AFAIU. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-38026: [Python] Use provided field types in StructArray.from_arrays [arrow]
pitrou commented on PR #38334: URL: https://github.com/apache/arrow/pull/38334#issuecomment-2687447111 @raulcd Would you like to review this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] GH-38026: [Python] Use provided field types in StructArray.from_arrays [arrow]
github-actions[bot] commented on PR #38334: URL: https://github.com/apache/arrow/pull/38334#issuecomment-1769341011 :warning: GitHub issue #38026 **has been automatically assigned in GitHub** to PR creator. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org