Re: [PR] GH-31387: [C++] Check nullability when validating fields on batches or struct arrays [arrow]
singh1203 commented on code in PR #46129:
URL: https://github.com/apache/arrow/pull/46129#discussion_r2247176852
##
cpp/src/arrow/array/array_test.cc:
##
@@ -100,6 +100,28 @@ void CheckDictionaryNullCount(const
std::shared_ptr& dict_type,
ASSERT_EQ(arr->data()->MayHaveLogicalNulls(),
expected_may_have_logical_nulls);
}
+TEST_F(TestArray, TestValidateFullNullableList) {
+ auto f1 = field("f1", int32(), /*nullable=*/false);
+ auto type = list(f1);
+ auto array = ArrayFromJSON(type, "[[0, 1], null, [2, 5]]");
+ auto array_nested_null = ArrayFromJSON(type, "[[0, 1], [3, 4], [2, null]]");
+
+ ASSERT_RAISES(Invalid, array->ValidateFull());
+ ASSERT_RAISES(Invalid, array_nested_null->ValidateFull());
Review Comment:
@pitrou @lidavidm This unit test is failing, maybe due to the way the array
is formed from `ArrayFromJSON()`. Any guidance will help me a lot.
Thank you
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] GH-31387: [C++] Check nullability when validating fields on batches or struct arrays [arrow]
singh1203 commented on code in PR #46129:
URL: https://github.com/apache/arrow/pull/46129#discussion_r2237433954
##
cpp/src/arrow/array/array_struct_test.cc:
##
@@ -171,6 +171,20 @@ TEST(StructArray, FromFields) {
ASSERT_RAISES(Invalid, res);
}
+TEST(StructArray, ValidateFullNullable) {
+ auto type = struct_({field("a", int32(), /*nullable=*/false),
+ field("b", utf8(), /*nullable=*/false),
+ field("c", list(boolean()), /*nullable=*/false)});
Review Comment:
Sure!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] GH-31387: [C++] Check nullability when validating fields on batches or struct arrays [arrow]
singh1203 commented on code in PR #46129:
URL: https://github.com/apache/arrow/pull/46129#discussion_r223908
##
cpp/src/arrow/array/array_test.cc:
##
@@ -97,6 +97,27 @@ void CheckDictionaryNullCount(const
std::shared_ptr& dict_type,
ASSERT_EQ(arr->data()->MayHaveLogicalNulls(),
expected_may_have_logical_nulls);
}
+TEST_F(TestArray, TestValidateFullNullableList) {
+ auto f1 = field("f1", int32(), /*nullable=*/false);
+ auto type = list(f1);
+ auto array = ArrayFromJSON(type, "[[0, 1], null, [2, 5]]");
+ auto array_nested_null = ArrayFromJSON(type, "[[0, 1], [3, 4], [2, null]]");
+
+ ASSERT_RAISES(Invalid, array->ValidateFull());
+ ASSERT_RAISES(Invalid, array->ValidateFull());
+}
+
+TEST_F(TestArray, TestValidateFullNullableFixedSizeList) {
Review Comment:
Just to clarify, the type is a fixed_size_list of 2 `non-nullable` int32s
`(fixed_size_list(f0, 2))`. Please let me know if you'd prefer a comment or a
more explicit response.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] GH-31387: [C++] Check nullability when validating fields on batches or struct arrays [arrow]
singh1203 commented on code in PR #46129:
URL: https://github.com/apache/arrow/pull/46129#discussion_r2235117300
##
cpp/src/arrow/array/array_struct_test.cc:
##
@@ -171,6 +171,20 @@ TEST(StructArray, FromFields) {
ASSERT_RAISES(Invalid, res);
}
+TEST(StructArray, ValidateFullNullable) {
+ auto type = struct_({field("a", int32(), /*nullable=*/false),
+ field("b", utf8(), /*nullable=*/false),
+ field("c", list(boolean()), /*nullable=*/false)});
+
+ auto struct_arr = ArrayFromJSON(
+ type, R"([1, "a", [null, false]], [null, "bc", []], [2, null, null]])");
+ auto struct_arr_nonull = ArrayFromJSON(
+ type, R"([[1, "a"], [true, false], [6, "bc", []], [2, "bcj", [true,
true]]])");
Review Comment:
Thanks! You were right — the brackets were unbalanced. I’ve corrected the
JSON by wrapping all rows in one outer array.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] GH-31387: [C++] Check nullability when validating fields on batches or struct arrays [arrow]
singh1203 commented on code in PR #46129:
URL: https://github.com/apache/arrow/pull/46129#discussion_r2234849963
##
cpp/src/arrow/array/array_test.cc:
##
@@ -97,6 +97,27 @@ void CheckDictionaryNullCount(const
std::shared_ptr& dict_type,
ASSERT_EQ(arr->data()->MayHaveLogicalNulls(),
expected_may_have_logical_nulls);
}
+TEST_F(TestArray, TestValidateFullNullableList) {
+ auto f1 = field("f1", int32(), /*nullable=*/false);
+ auto type = list(f1);
+ auto array = ArrayFromJSON(type, "[[0, 1], null, [2, 5]]");
+ auto array_nested_null = ArrayFromJSON(type, "[[0, 1], [3, 4], [2, null]]");
+
+ ASSERT_RAISES(Invalid, array->ValidateFull());
+ ASSERT_RAISES(Invalid, array->ValidateFull());
Review Comment:
Resolved!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] GH-31387: [C++] Check nullability when validating fields on batches or struct arrays [arrow]
singh1203 commented on code in PR #46129:
URL: https://github.com/apache/arrow/pull/46129#discussion_r2234823448
##
cpp/src/arrow/array/array_union_test.cc:
##
@@ -70,6 +70,21 @@ TEST(TestUnionArray, TestSliceEquals) {
CheckUnion(batch->column(1));
}
+TEST(TestSparseUnionArray, TestValidateFullNullable) {
+ auto ty = sparse_union({field("ints", int64()), field("strs", utf8(),
false)}, {2, 7});
Review Comment:
Isn't this non-nullable field defined as `field("strs", utf8(), false)`?
Correct me if I'm wrong.
Thank you
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] GH-31387: [C++] Check nullability when validating fields on batches or struct arrays [arrow]
singh1203 commented on code in PR #46129:
URL: https://github.com/apache/arrow/pull/46129#discussion_r2234809919
##
cpp/src/arrow/array/array_union_test.cc:
##
@@ -70,6 +70,21 @@ TEST(TestUnionArray, TestSliceEquals) {
CheckUnion(batch->column(1));
}
+TEST(TestSparseUnionArray, TestValidateFullNullable) {
Review Comment:
Sure.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] GH-31387: [C++] Check nullability when validating fields on batches or struct arrays [arrow]
singh1203 commented on code in PR #46129:
URL: https://github.com/apache/arrow/pull/46129#discussion_r2234808988
##
cpp/src/arrow/table.cc:
##
@@ -187,6 +187,9 @@ class SimpleTable : public Table {
ss << "Column " << i << ": " << st.message();
return st.WithMessage(ss.str());
}
+ if (schema_->field(i)->nullable() && col->null_count() > 0) {
Review Comment:
Sure, yes, now looking at the test failures.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] GH-31387: [C++] Check nullability when validating fields on batches or struct arrays [arrow]
singh1203 commented on code in PR #46129:
URL: https://github.com/apache/arrow/pull/46129#discussion_r2234806258
##
cpp/src/arrow/array/validate.cc:
##
@@ -569,7 +570,8 @@ struct ValidateArrayImpl {
} else {
actual_null_count = 0;
}
-if (actual_null_count != data.null_count) {
+if (data.null_count != kUnknownNullCount &&
Review Comment:
Well this line ensures `actual_null_count == 0` when `nullable == false` .
##
cpp/src/arrow/array/validate.cc:
##
@@ -569,7 +570,8 @@ struct ValidateArrayImpl {
} else {
actual_null_count = 0;
}
-if (actual_null_count != data.null_count) {
+if (data.null_count != kUnknownNullCount &&
Review Comment:
@pitrou Well this line ensures `actual_null_count == 0` when `nullable ==
false` .
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] GH-31387: [C++] Check nullability when validating fields on batches or struct arrays [arrow]
pitrou commented on code in PR #46129:
URL: https://github.com/apache/arrow/pull/46129#discussion_r2073780004
##
cpp/src/arrow/array/array_union_test.cc:
##
@@ -70,6 +70,21 @@ TEST(TestUnionArray, TestSliceEquals) {
CheckUnion(batch->column(1));
}
+TEST(TestSparseUnionArray, TestValidateFullNullable) {
+ auto ty = sparse_union({field("ints", int64()), field("strs", utf8(),
false)}, {2, 7});
Review Comment:
I don't see any non-nullable field here.
##
cpp/src/arrow/array/array_test.cc:
##
@@ -97,6 +97,27 @@ void CheckDictionaryNullCount(const
std::shared_ptr& dict_type,
ASSERT_EQ(arr->data()->MayHaveLogicalNulls(),
expected_may_have_logical_nulls);
}
+TEST_F(TestArray, TestValidateFullNullableList) {
+ auto f1 = field("f1", int32(), /*nullable=*/false);
+ auto type = list(f1);
+ auto array = ArrayFromJSON(type, "[[0, 1], null, [2, 5]]");
+ auto array_nested_null = ArrayFromJSON(type, "[[0, 1], [3, 4], [2, null]]");
+
+ ASSERT_RAISES(Invalid, array->ValidateFull());
+ ASSERT_RAISES(Invalid, array->ValidateFull());
+}
+
+TEST_F(TestArray, TestValidateFullNullableFixedSizeList) {
Review Comment:
I don't see a fixed-size list in this test, did I miss something?
##
cpp/src/arrow/array/array_struct_test.cc:
##
@@ -171,6 +171,20 @@ TEST(StructArray, FromFields) {
ASSERT_RAISES(Invalid, res);
}
+TEST(StructArray, ValidateFullNullable) {
+ auto type = struct_({field("a", int32(), /*nullable=*/false),
+ field("b", utf8(), /*nullable=*/false),
+ field("c", list(boolean()), /*nullable=*/false)});
Review Comment:
We should test more nested situations:
* non-nullable struct inside a list
* non-nullable struct inside a struct
##
cpp/src/arrow/array/validate.cc:
##
@@ -464,8 +465,8 @@ struct ValidateArrayImpl {
return data.buffers[index] != nullptr && data.buffers[index]->address() !=
0;
}
- Status RecurseInto(const ArrayData& related_data) {
-ValidateArrayImpl impl{related_data, full_validation};
+ Status RecurseInto(const ArrayData& related_data, bool nullable = true) {
Review Comment:
I would rather make the argument mandatory, so that we don't forget to pass
it.
```suggestion
Status RecurseInto(const ArrayData& related_data, bool nullable) {
```
##
cpp/src/arrow/array/validate.cc:
##
@@ -558,7 +559,7 @@ struct ValidateArrayImpl {
}
if (full_validation) {
- if (data.null_count != kUnknownNullCount) {
+ if (data.null_count != kUnknownNullCount || !nullable) {
Review Comment:
Ok, but where does it test that `actual_null_count` is 0 if `nullable` is
false?
##
cpp/src/arrow/table.cc:
##
@@ -187,6 +187,9 @@ class SimpleTable : public Table {
ss << "Column " << i << ": " << st.message();
return st.WithMessage(ss.str());
}
+ if (schema_->field(i)->nullable() && col->null_count() > 0) {
Review Comment:
You should check that it's _not_ nullable instead (did you notice the test
failures?).
##
cpp/src/arrow/array/array_struct_test.cc:
##
@@ -171,6 +171,20 @@ TEST(StructArray, FromFields) {
ASSERT_RAISES(Invalid, res);
}
+TEST(StructArray, ValidateFullNullable) {
+ auto type = struct_({field("a", int32(), /*nullable=*/false),
+ field("b", utf8(), /*nullable=*/false),
+ field("c", list(boolean()), /*nullable=*/false)});
+
+ auto struct_arr = ArrayFromJSON(
+ type, R"([1, "a", [null, false]], [null, "bc", []], [2, null, null]])");
+ auto struct_arr_nonull = ArrayFromJSON(
+ type, R"([[1, "a"], [true, false], [6, "bc", []], [2, "bcj", [true,
true]]])");
Review Comment:
Are you sure the JSON is right here? The brackets don't seem balanced.
##
cpp/src/arrow/array/array_union_test.cc:
##
@@ -70,6 +70,21 @@ TEST(TestUnionArray, TestSliceEquals) {
CheckUnion(batch->column(1));
}
+TEST(TestSparseUnionArray, TestValidateFullNullable) {
Review Comment:
Can you also test a dense union?
##
cpp/src/arrow/array/array_test.cc:
##
@@ -97,6 +97,27 @@ void CheckDictionaryNullCount(const
std::shared_ptr& dict_type,
ASSERT_EQ(arr->data()->MayHaveLogicalNulls(),
expected_may_have_logical_nulls);
}
+TEST_F(TestArray, TestValidateFullNullableList) {
+ auto f1 = field("f1", int32(), /*nullable=*/false);
+ auto type = list(f1);
+ auto array = ArrayFromJSON(type, "[[0, 1], null, [2, 5]]");
+ auto array_nested_null = ArrayFromJSON(type, "[[0, 1], [3, 4], [2, null]]");
+
+ ASSERT_RAISES(Invalid, array->ValidateFull());
+ ASSERT_RAISES(Invalid, array->ValidateFull());
Review Comment:
This line is repeated, did you mean something else?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscr
Re: [PR] GH-31387: [C++] Check nullability when validating fields on batches or struct arrays [arrow]
singh1203 commented on PR #46129: URL: https://github.com/apache/arrow/pull/46129#issuecomment-2801401769 cc: @pitrou @lidavidm -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] GH-31387: [C++] Check nullability when validating fields on batches or struct arrays [arrow]
github-actions[bot] commented on PR #46129: URL: https://github.com/apache/arrow/pull/46129#issuecomment-2801394666 :warning: GitHub issue #31387 **has been automatically assigned in GitHub** to PR creator. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
