Re: [PR] GH-38026: [Python] Use provided field types in StructArray.from_arrays [arrow]

2025-04-15 Thread via GitHub


AlenkaF commented on code in PR #38334:
URL: https://github.com/apache/arrow/pull/38334#discussion_r2043839573


##
python/pyarrow/array.pxi:
##
@@ -4220,23 +4220,46 @@ cdef class StructArray(Array):
 
 c_mask = c_mask_inverted_from_obj(mask, memory_pool)
 
-arrays = [asarray(x) for x in arrays]
-for arr in arrays:
-c_array = pyarrow_unwrap_array(arr)
-if c_array == nullptr:
-raise TypeError(f"Expected Array, got {arr.__class__}")
-c_arrays.push_back(c_array)
 if names is not None:
 for name in names:
 c_names.push_back(tobytes(name))
+arrays = [asarray(x) for x in arrays]
 else:
+py_fields = []
 for item in fields:
 if isinstance(item, tuple):
 py_field = field(*item)
 else:
 py_field = item
+py_fields.append(py_field)
 c_fields.push_back(py_field.sp_field)
 
+if len(py_fields) != len(arrays):
+raise ValueError("Must pass same number of arrays as fields")
+type_enforced_arrays = []
+for ary, field in zip(arrays, fields):
+# Verify the data type of arrays which are already
+# Arrow Arrays. Coerce the data type of anything else,
+# since that other stuff doesn't have a certain type
+# we can verify.
+if isinstance(ary, (Array, ChunkedArray)):
+if ary.type != field.type:
+raise ValueError(
+f"Field {field.name} is expected to have type 
{field.type}, but provided data is {ary.type}")
+type_enforced_arrays.append(ary)

Review Comment:
   I see this was added to preserve the original behavior in cases where the 
underlying child array has a different type than the one specified in the 
field. I agree that it's not entirely clear what the correct behavior should 
be, but I think we shouldn't change it here.
   
   Given that, would it make sense to remove the type check here and rely on 
the C++ layer to raise an error, as it currently does?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-38026: [Python] Use provided field types in StructArray.from_arrays [arrow]

2025-04-14 Thread via GitHub


spenczar commented on PR #38334:
URL: https://github.com/apache/arrow/pull/38334#issuecomment-2802156357

   @raulcd Can you take another look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-38026: [Python] Use provided field types in StructArray.from_arrays [arrow]

2025-03-07 Thread via GitHub


spenczar commented on PR #38334:
URL: https://github.com/apache/arrow/pull/38334#issuecomment-2708082866

   I've rebased on current main, just because this was so ancient that I was 
having trouble getting a dev environment back up. Please take another look, 
@raulcd .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-38026: [Python] Use provided field types in StructArray.from_arrays [arrow]

2025-03-05 Thread via GitHub


spenczar commented on code in PR #38334:
URL: https://github.com/apache/arrow/pull/38334#discussion_r1982253727


##
python/pyarrow/tests/test_array.py:
##
@@ -710,9 +710,9 @@ def test_struct_from_arrays():
 assert arr.to_pylist() == []
 
 # Inconsistent fields
-fa2 = pa.field("a", pa.int32())
-with pytest.raises(ValueError, match="int64 vs int32"):
-pa.StructArray.from_arrays([a, b, c], fields=[fa2, fb, fc])

Review Comment:
   Right, I am coercing inputs to match the provided types specified by the 
user. I don't think it's obvious whether this is right or wrong; there is no 
specification here.
   
   We could require types to be identical (no coercion) if the input values are 
all Pyarrow Arrays. The implementation will just be a little more complex. Let 
me know what you prefer. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-38026: [Python] Use provided field types in StructArray.from_arrays [arrow]

2025-03-05 Thread via GitHub


raulcd commented on code in PR #38334:
URL: https://github.com/apache/arrow/pull/38334#discussion_r1981271781


##
python/pyarrow/array.pxi:
##
@@ -3204,23 +3204,34 @@ cdef class StructArray(Array):
 
 c_mask = c_mask_inverted_from_obj(mask, memory_pool)
 
-arrays = [asarray(x) for x in arrays]
-for arr in arrays:
-c_array = pyarrow_unwrap_array(arr)
-if c_array == nullptr:
-raise TypeError(f"Expected Array, got {arr.__class__}")
-c_arrays.push_back(c_array)
 if names is not None:
 for name in names:
 c_names.push_back(tobytes(name))
+arrays = [asarray(x) for x in arrays]
 else:
+py_fields = []
 for item in fields:
 if isinstance(item, tuple):
 py_field = field(*item)
 else:
 py_field = item
+py_fields.append(py_field)
 c_fields.push_back(py_field.sp_field)
 
+# This needs to be enumerate, not zip, in order to make
+# sure that there is a field provided for every array.
+try:
+arrays = [asarray(ary, py_fields[i].type)
+  for i, ary in enumerate(arrays)]
+except IndexError:
+raise ValueError("Must pass same number of arrays as fields")

Review Comment:
   It would be nice to get some tests for those exceptions



##
python/pyarrow/tests/test_array.py:
##
@@ -710,9 +710,9 @@ def test_struct_from_arrays():
 assert arr.to_pylist() == []
 
 # Inconsistent fields
-fa2 = pa.field("a", pa.int32())
-with pytest.raises(ValueError, match="int64 vs int32"):
-pa.StructArray.from_arrays([a, b, c], fields=[fa2, fb, fc])

Review Comment:
   I have to investigate further but think there's something wrong here. This 
use case should still raise an error as the underlying `a` child array is 
int64. I am not entirely sure if we are making a copy or what is happening, as 
I am not familiar with this area of the codebase, but this should raise an 
error AFAIU.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-38026: [Python] Use provided field types in StructArray.from_arrays [arrow]

2025-02-27 Thread via GitHub


pitrou commented on PR #38334:
URL: https://github.com/apache/arrow/pull/38334#issuecomment-2687447111

   @raulcd Would you like to review this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] GH-38026: [Python] Use provided field types in StructArray.from_arrays [arrow]

2023-10-18 Thread via GitHub


github-actions[bot] commented on PR #38334:
URL: https://github.com/apache/arrow/pull/38334#issuecomment-1769341011

   :warning: GitHub issue #38026 **has been automatically assigned in GitHub** 
to PR creator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org