Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-07-25 Thread via GitHub


WillAyd commented on code in PR #45272:
URL: https://github.com/apache/arrow/pull/45272#discussion_r2231475751


##
cpp/src/arrow/compute/kernels/scalar_compare.cc:
##
@@ -54,6 +55,39 @@ struct NotEqual {
   }
 };
 
+struct ListEqual {
+  template 
+  static T Call(KernelContext*, const Arg0& left, const Arg1& right, Status*) {
+static_assert(std::is_same::value && std::is_same::value, "");
+
+if (left.length != right.length) {
+  return false;
+} else {
+  RangeDataEqualsImpl range_comparer{
+  EqualOptions::Defaults(), false, left, right, 0, 0, 1,

Review Comment:
   After some more debugging I think this is OK, since the scalar unboxing will 
also set the child ArraySpan that is returned to already have the appropriate 
offsets. From that perspective, the ArrayIterator and scalar unboxing work the 
same way.
   
   I've added a comment to clarify, but of course let me know of any feedback



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-07-22 Thread via GitHub


WillAyd commented on code in PR #45272:
URL: https://github.com/apache/arrow/pull/45272#discussion_r787497


##
cpp/src/arrow/compute/kernels/scalar_compare.cc:
##
@@ -54,6 +55,39 @@ struct NotEqual {
   }
 };
 
+struct ListEqual {
+  template 
+  static T Call(KernelContext*, const Arg0& left, const Arg1& right, Status*) {
+static_assert(std::is_same::value && std::is_same::value, "");
+
+if (left.length != right.length) {
+  return false;
+} else {
+  RangeDataEqualsImpl range_comparer{
+  EqualOptions::Defaults(), false, left, right, 0, 0, 1,

Review Comment:
   I'm starting to question if the ArrayIterator approach will really work 
here. The issue with this code in its current state is that it holds an 
invariant the its ArraySpan argument being provided is sliced so that the 
offset is 0; I think that might be hard to read and worry about how 
generalizable it will be.
   
   There's also an issue with the child array properly signaling its validity 
bitmap to `RangeDataEqualsImpl::Compare`, which is configured to read the 
bitmap of the parent. 
   
   Researching a bit more how the other nested types are handled...stay tuned
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-07-18 Thread via GitHub


WillAyd commented on code in PR #45272:
URL: https://github.com/apache/arrow/pull/45272#discussion_r2216113095


##
cpp/src/arrow/compare_internal.h:
##
@@ -0,0 +1,849 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include 
+
+#include "arrow/array/array_dict.h"
+#include "arrow/array/data.h"
+#include "arrow/array/diff.h"
+#include "arrow/compare.h"
+#include "arrow/scalar.h"
+#include "arrow/type_traits.h"
+#include "arrow/util/binary_view_util.h"
+#include "arrow/util/bit_run_reader.h"
+#include "arrow/util/bitmap_ops.h"
+#include "arrow/util/bitmap_reader.h"
+#include "arrow/util/float16.h"
+#include "arrow/util/logging_internal.h"
+#include "arrow/util/memory_internal.h"
+#include "arrow/util/ree_util.h"
+#include "arrow/visit_scalar_inline.h"
+#include "arrow/visit_type_inline.h"
+
+namespace arrow {
+
+using internal::BitmapEquals;
+using internal::BitmapReader;
+using internal::BitmapUInt64Reader;
+using internal::checked_cast;
+using internal::OptionalBitmapEquals;
+using util::Float16;
+
+// TODO also handle HALF_FLOAT NaNs
+
+template 
+struct FloatingEqualityFlags {
+  static constexpr bool approximate = Approximate;
+  static constexpr bool nans_equal = NansEqual;
+  static constexpr bool signed_zeros_equal = SignedZerosEqual;
+};
+
+template 
+struct FloatingEquality {
+  explicit FloatingEquality(const EqualOptions& options)
+  : epsilon(static_cast(options.atol())) {}
+
+  bool operator()(T x, T y) const {
+if (x == y) {
+  return Flags::signed_zeros_equal || (std::signbit(x) == std::signbit(y));
+}
+if (Flags::nans_equal && std::isnan(x) && std::isnan(y)) {
+  return true;
+}
+if (Flags::approximate && (fabs(x - y) <= epsilon)) {
+  return true;
+}
+return false;
+  }
+
+  const T epsilon;
+};
+
+// For half-float equality.
+template 
+struct FloatingEquality {
+  explicit FloatingEquality(const EqualOptions& options)
+  : epsilon(static_cast(options.atol())) {}
+
+  bool operator()(uint16_t x, uint16_t y) const {
+Float16 f_x = Float16::FromBits(x);
+Float16 f_y = Float16::FromBits(y);
+if (x == y) {
+  return Flags::signed_zeros_equal || (f_x.signbit() == f_y.signbit());
+}
+if (Flags::nans_equal && f_x.is_nan() && f_y.is_nan()) {
+  return true;
+}
+if (Flags::approximate && (fabs(f_x.ToFloat() - f_y.ToFloat()) <= 
epsilon)) {
+  return true;
+}
+return false;
+  }
+
+  const float epsilon;
+};
+
+template 
+struct FloatingEqualityDispatcher {
+  const EqualOptions& options;
+  bool floating_approximate;
+  Visitor&& visit;
+
+  template 
+  void DispatchL3() {
+if (options.signed_zeros_equal()) {
+  visit(FloatingEquality>{
+  options});
+} else {
+  visit(FloatingEquality>{
+  options});
+}
+  }
+
+  template 
+  void DispatchL2() {
+if (options.nans_equal()) {
+  DispatchL3();
+} else {
+  DispatchL3();
+}
+  }
+
+  void Dispatch() {
+if (floating_approximate) {
+  DispatchL2();
+} else {
+  DispatchL2();
+}
+  }
+};
+
+// Call `visit(equality_func)` where `equality_func` has the signature 
`bool(T, T)`
+// and returns true if the two values compare equal.
+template 
+void VisitFloatingEquality(const EqualOptions& options, bool 
floating_approximate,
+   Visitor&& visit) {
+  FloatingEqualityDispatcher{options, floating_approximate,
+ std::forward(visit)}
+  .Dispatch();
+}
+
+inline bool IdentityImpliesEqualityNansNotEqual(const DataType& type) {
+  if (type.id() == Type::FLOAT || type.id() == Type::DOUBLE) {
+return false;
+  }
+  for (const auto& child : type.fields()) {
+if (!IdentityImpliesEqualityNansNotEqual(*child->type())) {
+  return false;
+}
+  }
+  return true;
+}
+
+inline bool IdentityImpliesEquality(const DataType& type, const EqualOptions& 
options) {
+  if (options.nans_equal()) {
+return true;
+  }
+  return IdentityImpliesEqualityNansNotEqual(type);
+}
+
+ARROW_EXPORT bool CompareArrayRanges(const ArrayData& left, const ArrayData& 
right,

Review Comment:
   Shouldn't need the ARROW_EXPORT here - only temporarily 

Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-07-17 Thread via GitHub


WillAyd commented on code in PR #45272:
URL: https://github.com/apache/arrow/pull/45272#discussion_r2213670106


##
cpp/src/arrow/compute/kernels/scalar_compare.cc:
##
@@ -54,6 +55,39 @@ struct NotEqual {
   }
 };
 
+struct ListEqual {
+  template 
+  static T Call(KernelContext*, const Arg0& left, const Arg1& right, Status*) {
+static_assert(std::is_same::value && std::is_same::value, "");
+
+if (left.length != right.length) {
+  return false;
+} else {
+  RangeDataEqualsImpl range_comparer{
+  EqualOptions::Defaults(), false, left, right, 0, 0, 1,

Review Comment:
   Great thanks. Still working on my C++ and Arrow skills :-) this has been a 
good challenge on that front



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-07-17 Thread via GitHub


pitrou commented on code in PR #45272:
URL: https://github.com/apache/arrow/pull/45272#discussion_r2213556880


##
cpp/src/arrow/compute/kernels/scalar_compare.cc:
##
@@ -54,6 +55,39 @@ struct NotEqual {
   }
 };
 
+struct ListEqual {
+  template 
+  static T Call(KernelContext*, const Arg0& left, const Arg1& right, Status*) {
+static_assert(std::is_same::value && std::is_same::value, "");
+
+if (left.length != right.length) {
+  return false;
+} else {
+  RangeDataEqualsImpl range_comparer{
+  EqualOptions::Defaults(), false, left, right, 0, 0, 1,

Review Comment:
   I recommend that you read existing code accessing list arrays to convince 
yourself of this :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-07-17 Thread via GitHub


pitrou commented on code in PR #45272:
URL: https://github.com/apache/arrow/pull/45272#discussion_r2213555919


##
cpp/src/arrow/compute/kernels/scalar_compare.cc:
##
@@ -54,6 +55,39 @@ struct NotEqual {
   }
 };
 
+struct ListEqual {
+  template 
+  static T Call(KernelContext*, const Arg0& left, const Arg1& right, Status*) {
+static_assert(std::is_same::value && std::is_same::value, "");
+
+if (left.length != right.length) {
+  return false;
+} else {
+  RangeDataEqualsImpl range_comparer{
+  EqualOptions::Defaults(), false, left, right, 0, 0, 1,

Review Comment:
   The parent offset affects reading into the list *offsets* buffer, but it 
should not be added to the offsets into the child array (otherwise the parent 
offset would be counted twice).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-07-17 Thread via GitHub


WillAyd commented on code in PR #45272:
URL: https://github.com/apache/arrow/pull/45272#discussion_r2213535155


##
cpp/src/arrow/compute/kernels/scalar_compare.cc:
##
@@ -54,6 +55,39 @@ struct NotEqual {
   }
 };
 
+struct ListEqual {
+  template 
+  static T Call(KernelContext*, const Arg0& left, const Arg1& right, Status*) {
+static_assert(std::is_same::value && std::is_same::value, "");
+
+if (left.length != right.length) {
+  return false;
+} else {
+  RangeDataEqualsImpl range_comparer{
+  EqualOptions::Defaults(), false, left, right, 0, 0, 1,

Review Comment:
   I'm doubtful that hard-coding the offsets here at zero is correct, although 
passing left.offset and right.offset was triggering heap overflow.
   
   Is it possible for both the parent and child arrays to have a legitimate 
offset, or does the list type manage that just through its child array?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-07-15 Thread via GitHub


pitrou commented on code in PR #45272:
URL: https://github.com/apache/arrow/pull/45272#discussion_r2207722732


##
cpp/src/arrow/compare_internal.h:
##
@@ -0,0 +1,966 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include 
+
+#include "arrow/array/array_dict.h"
+#include "arrow/array/data.h"
+#include "arrow/array/diff.h"
+#include "arrow/compare.h"
+#include "arrow/scalar.h"
+#include "arrow/type_traits.h"
+#include "arrow/util/binary_view_util.h"
+#include "arrow/util/bit_run_reader.h"
+#include "arrow/util/bitmap_ops.h"
+#include "arrow/util/bitmap_reader.h"
+#include "arrow/util/float16.h"
+#include "arrow/util/logging_internal.h"
+#include "arrow/util/memory_internal.h"
+#include "arrow/util/ree_util.h"
+#include "arrow/visit_scalar_inline.h"
+#include "arrow/visit_type_inline.h"
+
+namespace arrow {
+
+using internal::BitmapEquals;
+using internal::BitmapReader;
+using internal::BitmapUInt64Reader;
+using internal::checked_cast;
+using internal::OptionalBitmapEquals;
+using util::Float16;
+
+// TODO also handle HALF_FLOAT NaNs
+
+template 
+struct FloatingEqualityFlags {
+  static constexpr bool approximate = Approximate;
+  static constexpr bool nans_equal = NansEqual;
+  static constexpr bool signed_zeros_equal = SignedZerosEqual;
+};
+
+template 
+struct FloatingEquality {
+  explicit FloatingEquality(const EqualOptions& options)
+  : epsilon(static_cast(options.atol())) {}
+
+  bool operator()(T x, T y) const {
+if (x == y) {
+  return Flags::signed_zeros_equal || (std::signbit(x) == std::signbit(y));
+}
+if (Flags::nans_equal && std::isnan(x) && std::isnan(y)) {
+  return true;
+}
+if (Flags::approximate && (fabs(x - y) <= epsilon)) {
+  return true;
+}
+return false;
+  }
+
+  const T epsilon;
+};
+
+// For half-float equality.
+template 
+struct FloatingEquality {
+  explicit FloatingEquality(const EqualOptions& options)
+  : epsilon(static_cast(options.atol())) {}
+
+  bool operator()(uint16_t x, uint16_t y) const {
+Float16 f_x = Float16::FromBits(x);
+Float16 f_y = Float16::FromBits(y);
+if (x == y) {
+  return Flags::signed_zeros_equal || (f_x.signbit() == f_y.signbit());
+}
+if (Flags::nans_equal && f_x.is_nan() && f_y.is_nan()) {
+  return true;
+}
+if (Flags::approximate && (fabs(f_x.ToFloat() - f_y.ToFloat()) <= 
epsilon)) {
+  return true;
+}
+return false;
+  }
+
+  const float epsilon;
+};
+
+template 
+struct FloatingEqualityDispatcher {
+  const EqualOptions& options;
+  bool floating_approximate;
+  Visitor&& visit;
+
+  template 
+  void DispatchL3() {
+if (options.signed_zeros_equal()) {
+  visit(FloatingEquality>{
+  options});
+} else {
+  visit(FloatingEquality>{
+  options});
+}
+  }
+
+  template 
+  void DispatchL2() {
+if (options.nans_equal()) {
+  DispatchL3();
+} else {
+  DispatchL3();
+}
+  }
+
+  void Dispatch() {
+if (floating_approximate) {
+  DispatchL2();
+} else {
+  DispatchL2();
+}
+  }
+};
+
+// Call `visit(equality_func)` where `equality_func` has the signature 
`bool(T, T)`
+// and returns true if the two values compare equal.
+template 
+void VisitFloatingEquality(const EqualOptions& options, bool 
floating_approximate,
+   Visitor&& visit) {
+  FloatingEqualityDispatcher{options, floating_approximate,
+ std::forward(visit)}
+  .Dispatch();
+}
+
+inline bool IdentityImpliesEqualityNansNotEqual(const DataType& type) {
+  if (type.id() == Type::FLOAT || type.id() == Type::DOUBLE) {
+return false;
+  }
+  for (const auto& child : type.fields()) {
+if (!IdentityImpliesEqualityNansNotEqual(*child->type())) {
+  return false;
+}
+  }
+  return true;
+}
+
+inline bool IdentityImpliesEquality(const DataType& type, const EqualOptions& 
options) {
+  if (options.nans_equal()) {
+return true;
+  }
+  return IdentityImpliesEqualityNansNotEqual(type);
+}
+
+bool CompareArrayRanges(const ArrayData& left, const ArrayData& right,
+int64_t left_start_idx, int64_t left_end_idx,
+   

Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-07-15 Thread via GitHub


pitrou commented on code in PR #45272:
URL: https://github.com/apache/arrow/pull/45272#discussion_r2207713866


##
cpp/src/arrow/compare_internal.h:
##
@@ -0,0 +1,966 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include 
+
+#include "arrow/array/array_dict.h"
+#include "arrow/array/data.h"
+#include "arrow/array/diff.h"
+#include "arrow/compare.h"
+#include "arrow/scalar.h"
+#include "arrow/type_traits.h"
+#include "arrow/util/binary_view_util.h"
+#include "arrow/util/bit_run_reader.h"
+#include "arrow/util/bitmap_ops.h"
+#include "arrow/util/bitmap_reader.h"
+#include "arrow/util/float16.h"
+#include "arrow/util/logging_internal.h"
+#include "arrow/util/memory_internal.h"
+#include "arrow/util/ree_util.h"
+#include "arrow/visit_scalar_inline.h"
+#include "arrow/visit_type_inline.h"
+
+namespace arrow {
+
+using internal::BitmapEquals;
+using internal::BitmapReader;
+using internal::BitmapUInt64Reader;
+using internal::checked_cast;
+using internal::OptionalBitmapEquals;
+using util::Float16;
+
+// TODO also handle HALF_FLOAT NaNs
+
+template 
+struct FloatingEqualityFlags {
+  static constexpr bool approximate = Approximate;
+  static constexpr bool nans_equal = NansEqual;
+  static constexpr bool signed_zeros_equal = SignedZerosEqual;
+};
+
+template 
+struct FloatingEquality {
+  explicit FloatingEquality(const EqualOptions& options)
+  : epsilon(static_cast(options.atol())) {}
+
+  bool operator()(T x, T y) const {
+if (x == y) {
+  return Flags::signed_zeros_equal || (std::signbit(x) == std::signbit(y));
+}
+if (Flags::nans_equal && std::isnan(x) && std::isnan(y)) {
+  return true;
+}
+if (Flags::approximate && (fabs(x - y) <= epsilon)) {
+  return true;
+}
+return false;
+  }
+
+  const T epsilon;
+};
+
+// For half-float equality.
+template 
+struct FloatingEquality {
+  explicit FloatingEquality(const EqualOptions& options)
+  : epsilon(static_cast(options.atol())) {}
+
+  bool operator()(uint16_t x, uint16_t y) const {
+Float16 f_x = Float16::FromBits(x);
+Float16 f_y = Float16::FromBits(y);
+if (x == y) {
+  return Flags::signed_zeros_equal || (f_x.signbit() == f_y.signbit());
+}
+if (Flags::nans_equal && f_x.is_nan() && f_y.is_nan()) {
+  return true;
+}
+if (Flags::approximate && (fabs(f_x.ToFloat() - f_y.ToFloat()) <= 
epsilon)) {
+  return true;
+}
+return false;
+  }
+
+  const float epsilon;
+};
+
+template 
+struct FloatingEqualityDispatcher {
+  const EqualOptions& options;
+  bool floating_approximate;
+  Visitor&& visit;
+
+  template 
+  void DispatchL3() {
+if (options.signed_zeros_equal()) {
+  visit(FloatingEquality>{
+  options});
+} else {
+  visit(FloatingEquality>{
+  options});
+}
+  }
+
+  template 
+  void DispatchL2() {
+if (options.nans_equal()) {
+  DispatchL3();
+} else {
+  DispatchL3();
+}
+  }
+
+  void Dispatch() {
+if (floating_approximate) {
+  DispatchL2();
+} else {
+  DispatchL2();
+}
+  }
+};
+
+// Call `visit(equality_func)` where `equality_func` has the signature 
`bool(T, T)`
+// and returns true if the two values compare equal.
+template 
+void VisitFloatingEquality(const EqualOptions& options, bool 
floating_approximate,
+   Visitor&& visit) {
+  FloatingEqualityDispatcher{options, floating_approximate,
+ std::forward(visit)}
+  .Dispatch();
+}
+
+inline bool IdentityImpliesEqualityNansNotEqual(const DataType& type) {
+  if (type.id() == Type::FLOAT || type.id() == Type::DOUBLE) {
+return false;
+  }
+  for (const auto& child : type.fields()) {
+if (!IdentityImpliesEqualityNansNotEqual(*child->type())) {
+  return false;
+}
+  }
+  return true;
+}
+
+inline bool IdentityImpliesEquality(const DataType& type, const EqualOptions& 
options) {
+  if (options.nans_equal()) {
+return true;
+  }
+  return IdentityImpliesEqualityNansNotEqual(type);
+}
+
+bool CompareArrayRanges(const ArrayData& left, const ArrayData& right,
+int64_t left_start_idx, int64_t left_end_idx,
+   

Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-07-14 Thread via GitHub


WillAyd commented on code in PR #45272:
URL: https://github.com/apache/arrow/pull/45272#discussion_r2205336452


##
cpp/src/arrow/compare_internal.h:
##
@@ -0,0 +1,966 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include 
+
+#include "arrow/array/array_dict.h"
+#include "arrow/array/data.h"
+#include "arrow/array/diff.h"
+#include "arrow/compare.h"
+#include "arrow/scalar.h"
+#include "arrow/type_traits.h"
+#include "arrow/util/binary_view_util.h"
+#include "arrow/util/bit_run_reader.h"
+#include "arrow/util/bitmap_ops.h"
+#include "arrow/util/bitmap_reader.h"
+#include "arrow/util/float16.h"
+#include "arrow/util/logging_internal.h"
+#include "arrow/util/memory_internal.h"
+#include "arrow/util/ree_util.h"
+#include "arrow/visit_scalar_inline.h"
+#include "arrow/visit_type_inline.h"
+
+namespace arrow {
+
+using internal::BitmapEquals;
+using internal::BitmapReader;
+using internal::BitmapUInt64Reader;
+using internal::checked_cast;
+using internal::OptionalBitmapEquals;
+using util::Float16;
+
+// TODO also handle HALF_FLOAT NaNs
+
+template 
+struct FloatingEqualityFlags {
+  static constexpr bool approximate = Approximate;
+  static constexpr bool nans_equal = NansEqual;
+  static constexpr bool signed_zeros_equal = SignedZerosEqual;
+};
+
+template 
+struct FloatingEquality {
+  explicit FloatingEquality(const EqualOptions& options)
+  : epsilon(static_cast(options.atol())) {}
+
+  bool operator()(T x, T y) const {
+if (x == y) {
+  return Flags::signed_zeros_equal || (std::signbit(x) == std::signbit(y));
+}
+if (Flags::nans_equal && std::isnan(x) && std::isnan(y)) {
+  return true;
+}
+if (Flags::approximate && (fabs(x - y) <= epsilon)) {
+  return true;
+}
+return false;
+  }
+
+  const T epsilon;
+};
+
+// For half-float equality.
+template 
+struct FloatingEquality {
+  explicit FloatingEquality(const EqualOptions& options)
+  : epsilon(static_cast(options.atol())) {}
+
+  bool operator()(uint16_t x, uint16_t y) const {
+Float16 f_x = Float16::FromBits(x);
+Float16 f_y = Float16::FromBits(y);
+if (x == y) {
+  return Flags::signed_zeros_equal || (f_x.signbit() == f_y.signbit());
+}
+if (Flags::nans_equal && f_x.is_nan() && f_y.is_nan()) {
+  return true;
+}
+if (Flags::approximate && (fabs(f_x.ToFloat() - f_y.ToFloat()) <= 
epsilon)) {
+  return true;
+}
+return false;
+  }
+
+  const float epsilon;
+};
+
+template 
+struct FloatingEqualityDispatcher {
+  const EqualOptions& options;
+  bool floating_approximate;
+  Visitor&& visit;
+
+  template 
+  void DispatchL3() {
+if (options.signed_zeros_equal()) {
+  visit(FloatingEquality>{
+  options});
+} else {
+  visit(FloatingEquality>{
+  options});
+}
+  }
+
+  template 
+  void DispatchL2() {
+if (options.nans_equal()) {
+  DispatchL3();
+} else {
+  DispatchL3();
+}
+  }
+
+  void Dispatch() {
+if (floating_approximate) {
+  DispatchL2();
+} else {
+  DispatchL2();
+}
+  }
+};
+
+// Call `visit(equality_func)` where `equality_func` has the signature 
`bool(T, T)`
+// and returns true if the two values compare equal.
+template 
+void VisitFloatingEquality(const EqualOptions& options, bool 
floating_approximate,
+   Visitor&& visit) {
+  FloatingEqualityDispatcher{options, floating_approximate,
+ std::forward(visit)}
+  .Dispatch();
+}
+
+inline bool IdentityImpliesEqualityNansNotEqual(const DataType& type) {
+  if (type.id() == Type::FLOAT || type.id() == Type::DOUBLE) {
+return false;
+  }
+  for (const auto& child : type.fields()) {
+if (!IdentityImpliesEqualityNansNotEqual(*child->type())) {
+  return false;
+}
+  }
+  return true;
+}
+
+inline bool IdentityImpliesEquality(const DataType& type, const EqualOptions& 
options) {
+  if (options.nans_equal()) {
+return true;
+  }
+  return IdentityImpliesEqualityNansNotEqual(type);
+}
+
+bool CompareArrayRanges(const ArrayData& left, const ArrayData& right,
+int64_t left_start_idx, int64_t left_end_idx,
+  

Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-07-14 Thread via GitHub


WillAyd commented on code in PR #45272:
URL: https://github.com/apache/arrow/pull/45272#discussion_r2205336452


##
cpp/src/arrow/compare_internal.h:
##
@@ -0,0 +1,966 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include 
+
+#include "arrow/array/array_dict.h"
+#include "arrow/array/data.h"
+#include "arrow/array/diff.h"
+#include "arrow/compare.h"
+#include "arrow/scalar.h"
+#include "arrow/type_traits.h"
+#include "arrow/util/binary_view_util.h"
+#include "arrow/util/bit_run_reader.h"
+#include "arrow/util/bitmap_ops.h"
+#include "arrow/util/bitmap_reader.h"
+#include "arrow/util/float16.h"
+#include "arrow/util/logging_internal.h"
+#include "arrow/util/memory_internal.h"
+#include "arrow/util/ree_util.h"
+#include "arrow/visit_scalar_inline.h"
+#include "arrow/visit_type_inline.h"
+
+namespace arrow {
+
+using internal::BitmapEquals;
+using internal::BitmapReader;
+using internal::BitmapUInt64Reader;
+using internal::checked_cast;
+using internal::OptionalBitmapEquals;
+using util::Float16;
+
+// TODO also handle HALF_FLOAT NaNs
+
+template 
+struct FloatingEqualityFlags {
+  static constexpr bool approximate = Approximate;
+  static constexpr bool nans_equal = NansEqual;
+  static constexpr bool signed_zeros_equal = SignedZerosEqual;
+};
+
+template 
+struct FloatingEquality {
+  explicit FloatingEquality(const EqualOptions& options)
+  : epsilon(static_cast(options.atol())) {}
+
+  bool operator()(T x, T y) const {
+if (x == y) {
+  return Flags::signed_zeros_equal || (std::signbit(x) == std::signbit(y));
+}
+if (Flags::nans_equal && std::isnan(x) && std::isnan(y)) {
+  return true;
+}
+if (Flags::approximate && (fabs(x - y) <= epsilon)) {
+  return true;
+}
+return false;
+  }
+
+  const T epsilon;
+};
+
+// For half-float equality.
+template 
+struct FloatingEquality {
+  explicit FloatingEquality(const EqualOptions& options)
+  : epsilon(static_cast(options.atol())) {}
+
+  bool operator()(uint16_t x, uint16_t y) const {
+Float16 f_x = Float16::FromBits(x);
+Float16 f_y = Float16::FromBits(y);
+if (x == y) {
+  return Flags::signed_zeros_equal || (f_x.signbit() == f_y.signbit());
+}
+if (Flags::nans_equal && f_x.is_nan() && f_y.is_nan()) {
+  return true;
+}
+if (Flags::approximate && (fabs(f_x.ToFloat() - f_y.ToFloat()) <= 
epsilon)) {
+  return true;
+}
+return false;
+  }
+
+  const float epsilon;
+};
+
+template 
+struct FloatingEqualityDispatcher {
+  const EqualOptions& options;
+  bool floating_approximate;
+  Visitor&& visit;
+
+  template 
+  void DispatchL3() {
+if (options.signed_zeros_equal()) {
+  visit(FloatingEquality>{
+  options});
+} else {
+  visit(FloatingEquality>{
+  options});
+}
+  }
+
+  template 
+  void DispatchL2() {
+if (options.nans_equal()) {
+  DispatchL3();
+} else {
+  DispatchL3();
+}
+  }
+
+  void Dispatch() {
+if (floating_approximate) {
+  DispatchL2();
+} else {
+  DispatchL2();
+}
+  }
+};
+
+// Call `visit(equality_func)` where `equality_func` has the signature 
`bool(T, T)`
+// and returns true if the two values compare equal.
+template 
+void VisitFloatingEquality(const EqualOptions& options, bool 
floating_approximate,
+   Visitor&& visit) {
+  FloatingEqualityDispatcher{options, floating_approximate,
+ std::forward(visit)}
+  .Dispatch();
+}
+
+inline bool IdentityImpliesEqualityNansNotEqual(const DataType& type) {
+  if (type.id() == Type::FLOAT || type.id() == Type::DOUBLE) {
+return false;
+  }
+  for (const auto& child : type.fields()) {
+if (!IdentityImpliesEqualityNansNotEqual(*child->type())) {
+  return false;
+}
+  }
+  return true;
+}
+
+inline bool IdentityImpliesEquality(const DataType& type, const EqualOptions& 
options) {
+  if (options.nans_equal()) {
+return true;
+  }
+  return IdentityImpliesEqualityNansNotEqual(type);
+}
+
+bool CompareArrayRanges(const ArrayData& left, const ArrayData& right,
+int64_t left_start_idx, int64_t left_end_idx,
+  

Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-07-14 Thread via GitHub


WillAyd commented on code in PR #45272:
URL: https://github.com/apache/arrow/pull/45272#discussion_r2205077063


##
cpp/src/arrow/compare_internal.h:
##
@@ -0,0 +1,966 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include 
+
+#include "arrow/array/array_dict.h"
+#include "arrow/array/data.h"
+#include "arrow/array/diff.h"
+#include "arrow/compare.h"
+#include "arrow/scalar.h"
+#include "arrow/type_traits.h"
+#include "arrow/util/binary_view_util.h"
+#include "arrow/util/bit_run_reader.h"
+#include "arrow/util/bitmap_ops.h"
+#include "arrow/util/bitmap_reader.h"
+#include "arrow/util/float16.h"
+#include "arrow/util/logging_internal.h"
+#include "arrow/util/memory_internal.h"
+#include "arrow/util/ree_util.h"
+#include "arrow/visit_scalar_inline.h"
+#include "arrow/visit_type_inline.h"
+
+namespace arrow {
+
+using internal::BitmapEquals;
+using internal::BitmapReader;
+using internal::BitmapUInt64Reader;
+using internal::checked_cast;
+using internal::OptionalBitmapEquals;
+using util::Float16;
+
+// TODO also handle HALF_FLOAT NaNs
+
+template 
+struct FloatingEqualityFlags {
+  static constexpr bool approximate = Approximate;
+  static constexpr bool nans_equal = NansEqual;
+  static constexpr bool signed_zeros_equal = SignedZerosEqual;
+};
+
+template 
+struct FloatingEquality {
+  explicit FloatingEquality(const EqualOptions& options)
+  : epsilon(static_cast(options.atol())) {}
+
+  bool operator()(T x, T y) const {
+if (x == y) {
+  return Flags::signed_zeros_equal || (std::signbit(x) == std::signbit(y));
+}
+if (Flags::nans_equal && std::isnan(x) && std::isnan(y)) {
+  return true;
+}
+if (Flags::approximate && (fabs(x - y) <= epsilon)) {
+  return true;
+}
+return false;
+  }
+
+  const T epsilon;
+};
+
+// For half-float equality.
+template 
+struct FloatingEquality {
+  explicit FloatingEquality(const EqualOptions& options)
+  : epsilon(static_cast(options.atol())) {}
+
+  bool operator()(uint16_t x, uint16_t y) const {
+Float16 f_x = Float16::FromBits(x);
+Float16 f_y = Float16::FromBits(y);
+if (x == y) {
+  return Flags::signed_zeros_equal || (f_x.signbit() == f_y.signbit());
+}
+if (Flags::nans_equal && f_x.is_nan() && f_y.is_nan()) {
+  return true;
+}
+if (Flags::approximate && (fabs(f_x.ToFloat() - f_y.ToFloat()) <= 
epsilon)) {
+  return true;
+}
+return false;
+  }
+
+  const float epsilon;
+};
+
+template 
+struct FloatingEqualityDispatcher {
+  const EqualOptions& options;
+  bool floating_approximate;
+  Visitor&& visit;
+
+  template 
+  void DispatchL3() {
+if (options.signed_zeros_equal()) {
+  visit(FloatingEquality>{
+  options});
+} else {
+  visit(FloatingEquality>{
+  options});
+}
+  }
+
+  template 
+  void DispatchL2() {
+if (options.nans_equal()) {
+  DispatchL3();
+} else {
+  DispatchL3();
+}
+  }
+
+  void Dispatch() {
+if (floating_approximate) {
+  DispatchL2();
+} else {
+  DispatchL2();
+}
+  }
+};
+
+// Call `visit(equality_func)` where `equality_func` has the signature 
`bool(T, T)`
+// and returns true if the two values compare equal.
+template 
+void VisitFloatingEquality(const EqualOptions& options, bool 
floating_approximate,
+   Visitor&& visit) {
+  FloatingEqualityDispatcher{options, floating_approximate,
+ std::forward(visit)}
+  .Dispatch();
+}
+
+inline bool IdentityImpliesEqualityNansNotEqual(const DataType& type) {
+  if (type.id() == Type::FLOAT || type.id() == Type::DOUBLE) {
+return false;
+  }
+  for (const auto& child : type.fields()) {
+if (!IdentityImpliesEqualityNansNotEqual(*child->type())) {
+  return false;
+}
+  }
+  return true;
+}
+
+inline bool IdentityImpliesEquality(const DataType& type, const EqualOptions& 
options) {
+  if (options.nans_equal()) {
+return true;
+  }
+  return IdentityImpliesEqualityNansNotEqual(type);
+}
+
+bool CompareArrayRanges(const ArrayData& left, const ArrayData& right,
+int64_t left_start_idx, int64_t left_end_idx,
+  

Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-07-14 Thread via GitHub


WillAyd commented on code in PR #45272:
URL: https://github.com/apache/arrow/pull/45272#discussion_r2205251653


##
cpp/src/arrow/compare_internal.h:
##
@@ -0,0 +1,966 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include 
+
+#include "arrow/array/array_dict.h"
+#include "arrow/array/data.h"
+#include "arrow/array/diff.h"
+#include "arrow/compare.h"
+#include "arrow/scalar.h"
+#include "arrow/type_traits.h"
+#include "arrow/util/binary_view_util.h"
+#include "arrow/util/bit_run_reader.h"
+#include "arrow/util/bitmap_ops.h"
+#include "arrow/util/bitmap_reader.h"
+#include "arrow/util/float16.h"
+#include "arrow/util/logging_internal.h"
+#include "arrow/util/memory_internal.h"
+#include "arrow/util/ree_util.h"
+#include "arrow/visit_scalar_inline.h"
+#include "arrow/visit_type_inline.h"
+
+namespace arrow {
+
+using internal::BitmapEquals;
+using internal::BitmapReader;
+using internal::BitmapUInt64Reader;
+using internal::checked_cast;
+using internal::OptionalBitmapEquals;
+using util::Float16;
+
+// TODO also handle HALF_FLOAT NaNs
+
+template 
+struct FloatingEqualityFlags {
+  static constexpr bool approximate = Approximate;
+  static constexpr bool nans_equal = NansEqual;
+  static constexpr bool signed_zeros_equal = SignedZerosEqual;
+};
+
+template 
+struct FloatingEquality {
+  explicit FloatingEquality(const EqualOptions& options)
+  : epsilon(static_cast(options.atol())) {}
+
+  bool operator()(T x, T y) const {
+if (x == y) {
+  return Flags::signed_zeros_equal || (std::signbit(x) == std::signbit(y));
+}
+if (Flags::nans_equal && std::isnan(x) && std::isnan(y)) {
+  return true;
+}
+if (Flags::approximate && (fabs(x - y) <= epsilon)) {
+  return true;
+}
+return false;
+  }
+
+  const T epsilon;
+};
+
+// For half-float equality.
+template 
+struct FloatingEquality {
+  explicit FloatingEquality(const EqualOptions& options)
+  : epsilon(static_cast(options.atol())) {}
+
+  bool operator()(uint16_t x, uint16_t y) const {
+Float16 f_x = Float16::FromBits(x);
+Float16 f_y = Float16::FromBits(y);
+if (x == y) {
+  return Flags::signed_zeros_equal || (f_x.signbit() == f_y.signbit());
+}
+if (Flags::nans_equal && f_x.is_nan() && f_y.is_nan()) {
+  return true;
+}
+if (Flags::approximate && (fabs(f_x.ToFloat() - f_y.ToFloat()) <= 
epsilon)) {
+  return true;
+}
+return false;
+  }
+
+  const float epsilon;
+};
+
+template 
+struct FloatingEqualityDispatcher {
+  const EqualOptions& options;
+  bool floating_approximate;
+  Visitor&& visit;
+
+  template 
+  void DispatchL3() {
+if (options.signed_zeros_equal()) {
+  visit(FloatingEquality>{
+  options});
+} else {
+  visit(FloatingEquality>{
+  options});
+}
+  }
+
+  template 
+  void DispatchL2() {
+if (options.nans_equal()) {
+  DispatchL3();
+} else {
+  DispatchL3();
+}
+  }
+
+  void Dispatch() {
+if (floating_approximate) {
+  DispatchL2();
+} else {
+  DispatchL2();
+}
+  }
+};
+
+// Call `visit(equality_func)` where `equality_func` has the signature 
`bool(T, T)`
+// and returns true if the two values compare equal.
+template 
+void VisitFloatingEquality(const EqualOptions& options, bool 
floating_approximate,
+   Visitor&& visit) {
+  FloatingEqualityDispatcher{options, floating_approximate,
+ std::forward(visit)}
+  .Dispatch();
+}
+
+inline bool IdentityImpliesEqualityNansNotEqual(const DataType& type) {
+  if (type.id() == Type::FLOAT || type.id() == Type::DOUBLE) {
+return false;
+  }
+  for (const auto& child : type.fields()) {
+if (!IdentityImpliesEqualityNansNotEqual(*child->type())) {
+  return false;
+}
+  }
+  return true;
+}
+
+inline bool IdentityImpliesEquality(const DataType& type, const EqualOptions& 
options) {
+  if (options.nans_equal()) {
+return true;
+  }
+  return IdentityImpliesEqualityNansNotEqual(type);
+}
+
+bool CompareArrayRanges(const ArrayData& left, const ArrayData& right,
+int64_t left_start_idx, int64_t left_end_idx,
+  

Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-07-14 Thread via GitHub


WillAyd commented on code in PR #45272:
URL: https://github.com/apache/arrow/pull/45272#discussion_r2205251653


##
cpp/src/arrow/compare_internal.h:
##
@@ -0,0 +1,966 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include 
+
+#include "arrow/array/array_dict.h"
+#include "arrow/array/data.h"
+#include "arrow/array/diff.h"
+#include "arrow/compare.h"
+#include "arrow/scalar.h"
+#include "arrow/type_traits.h"
+#include "arrow/util/binary_view_util.h"
+#include "arrow/util/bit_run_reader.h"
+#include "arrow/util/bitmap_ops.h"
+#include "arrow/util/bitmap_reader.h"
+#include "arrow/util/float16.h"
+#include "arrow/util/logging_internal.h"
+#include "arrow/util/memory_internal.h"
+#include "arrow/util/ree_util.h"
+#include "arrow/visit_scalar_inline.h"
+#include "arrow/visit_type_inline.h"
+
+namespace arrow {
+
+using internal::BitmapEquals;
+using internal::BitmapReader;
+using internal::BitmapUInt64Reader;
+using internal::checked_cast;
+using internal::OptionalBitmapEquals;
+using util::Float16;
+
+// TODO also handle HALF_FLOAT NaNs
+
+template 
+struct FloatingEqualityFlags {
+  static constexpr bool approximate = Approximate;
+  static constexpr bool nans_equal = NansEqual;
+  static constexpr bool signed_zeros_equal = SignedZerosEqual;
+};
+
+template 
+struct FloatingEquality {
+  explicit FloatingEquality(const EqualOptions& options)
+  : epsilon(static_cast(options.atol())) {}
+
+  bool operator()(T x, T y) const {
+if (x == y) {
+  return Flags::signed_zeros_equal || (std::signbit(x) == std::signbit(y));
+}
+if (Flags::nans_equal && std::isnan(x) && std::isnan(y)) {
+  return true;
+}
+if (Flags::approximate && (fabs(x - y) <= epsilon)) {
+  return true;
+}
+return false;
+  }
+
+  const T epsilon;
+};
+
+// For half-float equality.
+template 
+struct FloatingEquality {
+  explicit FloatingEquality(const EqualOptions& options)
+  : epsilon(static_cast(options.atol())) {}
+
+  bool operator()(uint16_t x, uint16_t y) const {
+Float16 f_x = Float16::FromBits(x);
+Float16 f_y = Float16::FromBits(y);
+if (x == y) {
+  return Flags::signed_zeros_equal || (f_x.signbit() == f_y.signbit());
+}
+if (Flags::nans_equal && f_x.is_nan() && f_y.is_nan()) {
+  return true;
+}
+if (Flags::approximate && (fabs(f_x.ToFloat() - f_y.ToFloat()) <= 
epsilon)) {
+  return true;
+}
+return false;
+  }
+
+  const float epsilon;
+};
+
+template 
+struct FloatingEqualityDispatcher {
+  const EqualOptions& options;
+  bool floating_approximate;
+  Visitor&& visit;
+
+  template 
+  void DispatchL3() {
+if (options.signed_zeros_equal()) {
+  visit(FloatingEquality>{
+  options});
+} else {
+  visit(FloatingEquality>{
+  options});
+}
+  }
+
+  template 
+  void DispatchL2() {
+if (options.nans_equal()) {
+  DispatchL3();
+} else {
+  DispatchL3();
+}
+  }
+
+  void Dispatch() {
+if (floating_approximate) {
+  DispatchL2();
+} else {
+  DispatchL2();
+}
+  }
+};
+
+// Call `visit(equality_func)` where `equality_func` has the signature 
`bool(T, T)`
+// and returns true if the two values compare equal.
+template 
+void VisitFloatingEquality(const EqualOptions& options, bool 
floating_approximate,
+   Visitor&& visit) {
+  FloatingEqualityDispatcher{options, floating_approximate,
+ std::forward(visit)}
+  .Dispatch();
+}
+
+inline bool IdentityImpliesEqualityNansNotEqual(const DataType& type) {
+  if (type.id() == Type::FLOAT || type.id() == Type::DOUBLE) {
+return false;
+  }
+  for (const auto& child : type.fields()) {
+if (!IdentityImpliesEqualityNansNotEqual(*child->type())) {
+  return false;
+}
+  }
+  return true;
+}
+
+inline bool IdentityImpliesEquality(const DataType& type, const EqualOptions& 
options) {
+  if (options.nans_equal()) {
+return true;
+  }
+  return IdentityImpliesEqualityNansNotEqual(type);
+}
+
+bool CompareArrayRanges(const ArrayData& left, const ArrayData& right,
+int64_t left_start_idx, int64_t left_end_idx,
+  

Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-07-14 Thread via GitHub


WillAyd commented on code in PR #45272:
URL: https://github.com/apache/arrow/pull/45272#discussion_r2205077063


##
cpp/src/arrow/compare_internal.h:
##
@@ -0,0 +1,966 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include 
+
+#include "arrow/array/array_dict.h"
+#include "arrow/array/data.h"
+#include "arrow/array/diff.h"
+#include "arrow/compare.h"
+#include "arrow/scalar.h"
+#include "arrow/type_traits.h"
+#include "arrow/util/binary_view_util.h"
+#include "arrow/util/bit_run_reader.h"
+#include "arrow/util/bitmap_ops.h"
+#include "arrow/util/bitmap_reader.h"
+#include "arrow/util/float16.h"
+#include "arrow/util/logging_internal.h"
+#include "arrow/util/memory_internal.h"
+#include "arrow/util/ree_util.h"
+#include "arrow/visit_scalar_inline.h"
+#include "arrow/visit_type_inline.h"
+
+namespace arrow {
+
+using internal::BitmapEquals;
+using internal::BitmapReader;
+using internal::BitmapUInt64Reader;
+using internal::checked_cast;
+using internal::OptionalBitmapEquals;
+using util::Float16;
+
+// TODO also handle HALF_FLOAT NaNs
+
+template 
+struct FloatingEqualityFlags {
+  static constexpr bool approximate = Approximate;
+  static constexpr bool nans_equal = NansEqual;
+  static constexpr bool signed_zeros_equal = SignedZerosEqual;
+};
+
+template 
+struct FloatingEquality {
+  explicit FloatingEquality(const EqualOptions& options)
+  : epsilon(static_cast(options.atol())) {}
+
+  bool operator()(T x, T y) const {
+if (x == y) {
+  return Flags::signed_zeros_equal || (std::signbit(x) == std::signbit(y));
+}
+if (Flags::nans_equal && std::isnan(x) && std::isnan(y)) {
+  return true;
+}
+if (Flags::approximate && (fabs(x - y) <= epsilon)) {
+  return true;
+}
+return false;
+  }
+
+  const T epsilon;
+};
+
+// For half-float equality.
+template 
+struct FloatingEquality {
+  explicit FloatingEquality(const EqualOptions& options)
+  : epsilon(static_cast(options.atol())) {}
+
+  bool operator()(uint16_t x, uint16_t y) const {
+Float16 f_x = Float16::FromBits(x);
+Float16 f_y = Float16::FromBits(y);
+if (x == y) {
+  return Flags::signed_zeros_equal || (f_x.signbit() == f_y.signbit());
+}
+if (Flags::nans_equal && f_x.is_nan() && f_y.is_nan()) {
+  return true;
+}
+if (Flags::approximate && (fabs(f_x.ToFloat() - f_y.ToFloat()) <= 
epsilon)) {
+  return true;
+}
+return false;
+  }
+
+  const float epsilon;
+};
+
+template 
+struct FloatingEqualityDispatcher {
+  const EqualOptions& options;
+  bool floating_approximate;
+  Visitor&& visit;
+
+  template 
+  void DispatchL3() {
+if (options.signed_zeros_equal()) {
+  visit(FloatingEquality>{
+  options});
+} else {
+  visit(FloatingEquality>{
+  options});
+}
+  }
+
+  template 
+  void DispatchL2() {
+if (options.nans_equal()) {
+  DispatchL3();
+} else {
+  DispatchL3();
+}
+  }
+
+  void Dispatch() {
+if (floating_approximate) {
+  DispatchL2();
+} else {
+  DispatchL2();
+}
+  }
+};
+
+// Call `visit(equality_func)` where `equality_func` has the signature 
`bool(T, T)`
+// and returns true if the two values compare equal.
+template 
+void VisitFloatingEquality(const EqualOptions& options, bool 
floating_approximate,
+   Visitor&& visit) {
+  FloatingEqualityDispatcher{options, floating_approximate,
+ std::forward(visit)}
+  .Dispatch();
+}
+
+inline bool IdentityImpliesEqualityNansNotEqual(const DataType& type) {
+  if (type.id() == Type::FLOAT || type.id() == Type::DOUBLE) {
+return false;
+  }
+  for (const auto& child : type.fields()) {
+if (!IdentityImpliesEqualityNansNotEqual(*child->type())) {
+  return false;
+}
+  }
+  return true;
+}
+
+inline bool IdentityImpliesEquality(const DataType& type, const EqualOptions& 
options) {
+  if (options.nans_equal()) {
+return true;
+  }
+  return IdentityImpliesEqualityNansNotEqual(type);
+}
+
+bool CompareArrayRanges(const ArrayData& left, const ArrayData& right,
+int64_t left_start_idx, int64_t left_end_idx,
+  

Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-07-11 Thread via GitHub


WillAyd commented on PR #45272:
URL: https://github.com/apache/arrow/pull/45272#issuecomment-3063710221

   Still need to figure out the binary view test regressions, but in the 
meantime I am happy to report that performance looks much improved:
   
   ```python
   In [3]: >>> arr1 = pa.array([list("abc"), list("def"), list("xyz")] * 
1_000_000)
  ...: >>> arr2 = pa.array([list("abc"), list("def"), list("xyzz")] * 
1_000_000)
  ...: >>> %timeit pc.equal(arr1, arr2)
   48.2 ms ± 213 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-07-11 Thread via GitHub


WillAyd commented on code in PR #45272:
URL: https://github.com/apache/arrow/pull/45272#discussion_r2201547199


##
cpp/src/arrow/compare_internal.h:
##
@@ -0,0 +1,966 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include 
+
+#include "arrow/array/array_dict.h"
+#include "arrow/array/data.h"
+#include "arrow/array/diff.h"
+#include "arrow/compare.h"
+#include "arrow/scalar.h"
+#include "arrow/type_traits.h"
+#include "arrow/util/binary_view_util.h"
+#include "arrow/util/bit_run_reader.h"
+#include "arrow/util/bitmap_ops.h"
+#include "arrow/util/bitmap_reader.h"
+#include "arrow/util/float16.h"
+#include "arrow/util/logging_internal.h"
+#include "arrow/util/memory_internal.h"
+#include "arrow/util/ree_util.h"
+#include "arrow/visit_scalar_inline.h"
+#include "arrow/visit_type_inline.h"
+
+namespace arrow {
+
+using internal::BitmapEquals;
+using internal::BitmapReader;
+using internal::BitmapUInt64Reader;
+using internal::checked_cast;
+using internal::OptionalBitmapEquals;
+using util::Float16;
+
+// TODO also handle HALF_FLOAT NaNs
+
+template 
+struct FloatingEqualityFlags {
+  static constexpr bool approximate = Approximate;
+  static constexpr bool nans_equal = NansEqual;
+  static constexpr bool signed_zeros_equal = SignedZerosEqual;
+};
+
+template 
+struct FloatingEquality {
+  explicit FloatingEquality(const EqualOptions& options)
+  : epsilon(static_cast(options.atol())) {}
+
+  bool operator()(T x, T y) const {
+if (x == y) {
+  return Flags::signed_zeros_equal || (std::signbit(x) == std::signbit(y));
+}
+if (Flags::nans_equal && std::isnan(x) && std::isnan(y)) {
+  return true;
+}
+if (Flags::approximate && (fabs(x - y) <= epsilon)) {
+  return true;
+}
+return false;
+  }
+
+  const T epsilon;
+};
+
+// For half-float equality.
+template 
+struct FloatingEquality {
+  explicit FloatingEquality(const EqualOptions& options)
+  : epsilon(static_cast(options.atol())) {}
+
+  bool operator()(uint16_t x, uint16_t y) const {
+Float16 f_x = Float16::FromBits(x);
+Float16 f_y = Float16::FromBits(y);
+if (x == y) {
+  return Flags::signed_zeros_equal || (f_x.signbit() == f_y.signbit());
+}
+if (Flags::nans_equal && f_x.is_nan() && f_y.is_nan()) {
+  return true;
+}
+if (Flags::approximate && (fabs(f_x.ToFloat() - f_y.ToFloat()) <= 
epsilon)) {
+  return true;
+}
+return false;
+  }
+
+  const float epsilon;
+};
+
+template 
+struct FloatingEqualityDispatcher {
+  const EqualOptions& options;
+  bool floating_approximate;
+  Visitor&& visit;
+
+  template 
+  void DispatchL3() {
+if (options.signed_zeros_equal()) {
+  visit(FloatingEquality>{
+  options});
+} else {
+  visit(FloatingEquality>{
+  options});
+}
+  }
+
+  template 
+  void DispatchL2() {
+if (options.nans_equal()) {
+  DispatchL3();
+} else {
+  DispatchL3();
+}
+  }
+
+  void Dispatch() {
+if (floating_approximate) {
+  DispatchL2();
+} else {
+  DispatchL2();
+}
+  }
+};
+
+// Call `visit(equality_func)` where `equality_func` has the signature 
`bool(T, T)`
+// and returns true if the two values compare equal.
+template 
+void VisitFloatingEquality(const EqualOptions& options, bool 
floating_approximate,
+   Visitor&& visit) {
+  FloatingEqualityDispatcher{options, floating_approximate,
+ std::forward(visit)}
+  .Dispatch();
+}
+
+inline bool IdentityImpliesEqualityNansNotEqual(const DataType& type) {
+  if (type.id() == Type::FLOAT || type.id() == Type::DOUBLE) {
+return false;
+  }
+  for (const auto& child : type.fields()) {
+if (!IdentityImpliesEqualityNansNotEqual(*child->type())) {
+  return false;
+}
+  }
+  return true;
+}
+
+inline bool IdentityImpliesEquality(const DataType& type, const EqualOptions& 
options) {
+  if (options.nans_equal()) {
+return true;
+  }
+  return IdentityImpliesEqualityNansNotEqual(type);
+}
+
+bool CompareArrayRanges(const ArrayData& left, const ArrayData& right,
+int64_t left_start_idx, int64_t left_end_idx,
+  

Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-07-07 Thread via GitHub


pitrou commented on code in PR #45272:
URL: https://github.com/apache/arrow/pull/45272#discussion_r2190793693


##
cpp/src/arrow/compute/kernels/codegen_internal.h:
##
@@ -349,6 +356,31 @@ struct ArrayIterator> {
   }
 };
 
+template 
+struct ArrayIterator> {
+  using offset_type = typename Type::offset_type;
+
+  const ArraySpan& arr;
+  const offset_type* offsets;
+  offset_type cur_offset;
+  int64_t position;
+
+  explicit ArrayIterator(const ArraySpan& arr)
+  : arr(arr),
+offsets(reinterpret_cast(arr.buffers[1].data) + 
arr.offset),
+cur_offset(offsets[0]),
+position(0) {}
+
+  ArraySpan operator()() {
+offset_type next_offset = offsets[++position];
+const offset_type length = next_offset - cur_offset;
+ArraySpan new_span{arr};
+new_span.SetSlice(cur_offset, length);

Review Comment:
   I think this is a bit confused. This is creating a list ArraySpan with 
`length` list elements in it.
   
   If you want to use `ArrayIterator`, I think it should create an ArraySpan of 
the child array.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-07-07 Thread via GitHub


pitrou commented on code in PR #45272:
URL: https://github.com/apache/arrow/pull/45272#discussion_r2190795023


##
cpp/src/arrow/compute/kernels/codegen_internal.h:
##
@@ -425,6 +457,14 @@ struct UnboxScalar> {
   }
 };
 
+template 
+struct UnboxScalar> {
+  using T = ArraySpan;
+  using ScalarT = typename TypeTraits::ScalarType;
+
+  static T Unbox(const Scalar& val) { return T{checked_cast(val)}; }

Review Comment:
   Similarly, if you want use `UnboxScalar` for list scalars, it should return 
a `ArraySpan` of the embedded value (which is already an array of the list's 
child type).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-07-07 Thread via GitHub


pitrou commented on code in PR #45272:
URL: https://github.com/apache/arrow/pull/45272#discussion_r2190785054


##
cpp/src/arrow/array/data.cc:
##
@@ -495,13 +495,15 @@ void ArraySpan::FillFromScalar(const Scalar& value) {
 if (type_id == Type::LIST) {
   const auto& list_scalar = checked_cast(value);
   this->buffers[1] = OffsetsForScalar(list_scalar.scratch_space_, 
sizeof(int32_t));
+  this->length = list_scalar.value->length();

Review Comment:
   > I think this is causing issues elsewhere for now, but before diving into 
them further I want to check if we have an invariant that the length of any 
scalar is always equal to 1.
   
   I think that's the only thing that makes sense, yes.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-07-07 Thread via GitHub


WillAyd commented on code in PR #45272:
URL: https://github.com/apache/arrow/pull/45272#discussion_r2190747063


##
cpp/src/arrow/array/data.cc:
##
@@ -495,13 +495,15 @@ void ArraySpan::FillFromScalar(const Scalar& value) {
 if (type_id == Type::LIST) {
   const auto& list_scalar = checked_cast(value);
   this->buffers[1] = OffsetsForScalar(list_scalar.scratch_space_, 
sizeof(int32_t));
+  this->length = list_scalar.value->length();

Review Comment:
   I think this is causing issues elsewhere for now, but before diving into 
them further I want to check if we have an invariant that the length of any 
scalar is always equal to 1.
   
   The current RangeDataEqualsImpl checks the length of its arguments; in the 
case of Array versus Array iteration, the ArraySpans that are generated 
properly report the length of each child element. However, when comparing a 
Scalar to an Array, the ArraySpan generated by the scalar reports its length as 
1 (hard coded further up in this function) so that same type of comparison does 
not work. 
   
   Not sure if I should just change the RangeDataEqualsImpl to avoid the length 
check in that case, or if we consider it a but that the length of ListScalar 
elements is always 1



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-07-07 Thread via GitHub


WillAyd commented on code in PR #45272:
URL: https://github.com/apache/arrow/pull/45272#discussion_r2190747063


##
cpp/src/arrow/array/data.cc:
##
@@ -495,13 +495,15 @@ void ArraySpan::FillFromScalar(const Scalar& value) {
 if (type_id == Type::LIST) {
   const auto& list_scalar = checked_cast(value);
   this->buffers[1] = OffsetsForScalar(list_scalar.scratch_space_, 
sizeof(int32_t));
+  this->length = list_scalar.value->length();

Review Comment:
   I think this is causing issues elsewhere for now, but before diving into 
them further I want to check if we have an invariant that the length of any 
scalar is always equal to 1.
   
   The current RangeDataEqualsImpl checks the length of its arguments; in the 
case of Array versus Array iteration, the ArraySpans that are generated 
properly report the length of each child element. However, when comparing a 
Scalar to an Array, the ArraySpan generated by the scalar reports its length as 
1 (hard coded further up in this function) so that same type of comparison does 
not work. 
   
   Not sure if I should just change the RangeDataEqualsImpl to avoid the length 
check in that case, or if we consider it a bug that the length of ListScalar 
elements is always 1



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-07-02 Thread via GitHub


pitrou commented on PR #45272:
URL: https://github.com/apache/arrow/pull/45272#issuecomment-3028351420

   Ideally we shouldn't template the methods to accept both `ArrayData` and 
`ArraySpan`. It would be better (if possible) to migrate `RangeDataEqualsImpl` 
to use `ArraySpan` internally, and let the callers do the `ArrayData` -> 
`ArraySpan` conversion where required.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-07-02 Thread via GitHub


WillAyd commented on PR #45272:
URL: https://github.com/apache/arrow/pull/45272#issuecomment-3028343713

   OK taking a look there. The main difference I noticed is that `ArraySpan` 
exposes its `buffers` class member as an `BufferSpan` array whereas `ArrayData` 
exposes it as a vector of shared pointers to a buffer. Additionally, the 
`ArraySpan` has a `dictionary` class method whereas `ArrayData` has a 
`dictionary` class member that is a shared pointer.
   
   Do you think its worth trying to align the signatures of these elements to 
make templating easier, or should I perhaps use templating to disable the 
`Visit` methods in `RangeDataEqualsImpl` where those are currently being 
accessed? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-07-02 Thread via GitHub


WillAyd commented on PR #45272:
URL: https://github.com/apache/arrow/pull/45272#issuecomment-3027916509

   OK I am still seeing some runtime performance after refactoring the 
RangeDataEqualsImpl class to `compare_internal.h` and creating custom functors 
that call that directly per element, although its still a good deal behind 
primitive performance:
   
   ```python
   >>> arr1 = pa.array([list("abc"), list("def"), list("xyz")] * 1_000_000)
   >>> arr2 = pa.array([list("abc"), list("def"), list("xyzz")] * 1_000_000)
   >>> %timeit pc.equal(arr1, arr2)
   336 ms ± 1.19 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
   ```
   
   I wonder if the ArrayIterator returning a `shared_ptr` is the 
next bottleneck?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-07-02 Thread via GitHub


pitrou commented on PR #45272:
URL: https://github.com/apache/arrow/pull/45272#issuecomment-3027961571

   > I wonder if the ArrayIterator returning a shared_ptr is the 
next bottleneck?
   
   Well, if we want an `ArrayIterator` for list arrays, it should not store a 
`ArrayData` but a `ArraySpan`.
   This implies that `RangeDataEqualsImpl` should be modified to take 
`ArraySpan` as well.
   
   Even better in the future would be to compare entire ranges of non-null list 
values at once. For example, for `[[1, 2, 3], [4, 5], null, [6, 7]]`, compare 
the entire range `[1, 2, 3, 4, 5]` instead of two separate ranges.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-07-02 Thread via GitHub


pitrou commented on code in PR #45272:
URL: https://github.com/apache/arrow/pull/45272#discussion_r2179335918


##
cpp/src/arrow/compute/kernels/scalar_compare.cc:
##
@@ -445,6 +445,14 @@ std::shared_ptr 
MakeCompareFunction(std::string name, FunctionDo
 DCHECK_OK(func->AddKernel({ty, ty}, boolean(), std::move(exec)));
   }
 
+  if constexpr (std::is_same_v || std::is_same_v) {
+for (const auto id : {Type::LIST, Type::LARGE_LIST}) {
+  auto exec = GenerateList(id);

Review Comment:
   I see these comments at the beginning of the class:
   ```c++
   class RangeDataEqualsImpl {
public:
 // PRE-CONDITIONS:
 // - the types are equal
 // - the ranges are in bounds
   ```
   
   We should probably add that the two ranges must have the same length.
   
   For example this is how list comparison is currently implemented, where it 
is first checked that each pair of list elements have the same length:
   ```c++
 template 
 void CompareWithOffsets(int offsets_buffer_index, CompareRanges&& 
compare_ranges) {
   const offset_type* left_offsets =
   left_.GetValues(offsets_buffer_index) + left_start_idx_;
   const offset_type* right_offsets =
   right_.GetValues(offsets_buffer_index) + 
right_start_idx_;
   
   const auto compare_runs = [&](int64_t i, int64_t length) {
 for (int64_t j = i; j < i + length; ++j) {
   if (left_offsets[j + 1] - left_offsets[j] !=
   right_offsets[j + 1] - right_offsets[j]) {
 return false;
   }
 }
 if (!compare_ranges(left_offsets[i], right_offsets[i],
 left_offsets[i + length] - left_offsets[i])) {
   return false;
 }
 return true;
   };
   
   VisitValidRuns(compare_runs);
 }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-07-02 Thread via GitHub


pitrou commented on code in PR #45272:
URL: https://github.com/apache/arrow/pull/45272#discussion_r2179335918


##
cpp/src/arrow/compute/kernels/scalar_compare.cc:
##
@@ -445,6 +445,14 @@ std::shared_ptr 
MakeCompareFunction(std::string name, FunctionDo
 DCHECK_OK(func->AddKernel({ty, ty}, boolean(), std::move(exec)));
   }
 
+  if constexpr (std::is_same_v || std::is_same_v) {
+for (const auto id : {Type::LIST, Type::LARGE_LIST}) {
+  auto exec = GenerateList(id);

Review Comment:
   I see these comments at the beginning of the class:
   ```c++
   class RangeDataEqualsImpl {
public:
 // PRE-CONDITIONS:
 // - the types are equal
 // - the ranges are in bounds
   ```
   
   We should probably add that the two ranges must have the same length.
   
   For example this is how list comparison is currently implemented:
   ```c++
 template 
 void CompareWithOffsets(int offsets_buffer_index, CompareRanges&& 
compare_ranges) {
   const offset_type* left_offsets =
   left_.GetValues(offsets_buffer_index) + left_start_idx_;
   const offset_type* right_offsets =
   right_.GetValues(offsets_buffer_index) + 
right_start_idx_;
   
   const auto compare_runs = [&](int64_t i, int64_t length) {
 for (int64_t j = i; j < i + length; ++j) {
   if (left_offsets[j + 1] - left_offsets[j] !=
   right_offsets[j + 1] - right_offsets[j]) {
 return false;
   }
 }
 if (!compare_ranges(left_offsets[i], right_offsets[i],
 left_offsets[i + length] - left_offsets[i])) {
   return false;
 }
 return true;
   };
   
   VisitValidRuns(compare_runs);
 }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-07-01 Thread via GitHub


WillAyd commented on code in PR #45272:
URL: https://github.com/apache/arrow/pull/45272#discussion_r2178600735


##
cpp/src/arrow/compute/kernels/scalar_compare.cc:
##
@@ -445,6 +445,14 @@ std::shared_ptr 
MakeCompareFunction(std::string name, FunctionDo
 DCHECK_OK(func->AddKernel({ty, ty}, boolean(), std::move(exec)));
   }
 
+  if constexpr (std::is_same_v || std::is_same_v) {
+for (const auto id : {Type::LIST, Type::LARGE_LIST}) {
+  auto exec = GenerateList(id);

Review Comment:
   OK sorry for the long delay in getting back to this. I think I have a 
structure working but I'm unclear on if there's a bug with the 
RangeDataEqualsImpl implementation or if I am misunderstanding its purpose.
   
   As an example, I am trying to compare two list arrays, where the first is:
   
   ```
   [4, 5, 6]
   ```
   
   and the second is:
   
   ```
   [4, 5]
   ```
   
   When calling RangeDataEqualsImpl::Compare with these two values, I am 
getting back `true`. The current Compare implementation looks like:
   
   ```cpp
 bool Compare() {
   // Compare null bitmaps
   if (left_start_idx_ == 0 && right_start_idx_ == 0 && range_length_ == 
left_.length &&
   range_length_ == right_.length) {
 // If we're comparing entire arrays, we can first compare the cached 
null counts
 if (left_.GetNullCount() != right_.GetNullCount()) {
   return false;
 }
   }
   if (!OptionalBitmapEquals(left_.buffers[0], left_.offset + 
left_start_idx_,
 right_.buffers[0], right_.offset + 
right_start_idx_,
 range_length_)) {
 return false;
   }
   // Compare values
   return CompareWithType(*left_.type);
 }
   ```
   
   So the first branch is entirely ignored because the length of the two 
`ArrayData` instances are not equal. Ultimately this falls down to the 
`CompareWithType(*left_.type)` call at the bottom of the method, but the 
`ArrayData->type` member reports back as INT32.
   
   So is this a problem with the implementation of `Compare` where it should be 
short circuiting when the ArrayData lengths are not equal? And/or is the fact 
that the `ArrayData->type` member is coming back as INT32 and not as a List 
element the problem? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-03-08 Thread via GitHub


WillAyd commented on PR #45272:
URL: https://github.com/apache/arrow/pull/45272#issuecomment-2708674958

   Sorry for the slow turnaround - hope this is heading in the right direction. 
Still nowhere close to the performance of more primitive types but about 2-3 
times as fast as the previous implementation
   
   ```python
   >>> arr1 = pa.array([list("abc"), list("def"), list("xyz")] * 1_000_000)
   >>> arr2 = pa.array([list("abc"), list("def"), list("xyzz")] * 1_000_000)
   >>> %timeit pc.equal(arr1, arr2)
   463 ms ± 5.33 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-02-20 Thread via GitHub


pitrou commented on code in PR #45272:
URL: https://github.com/apache/arrow/pull/45272#discussion_r1963635451


##
cpp/src/arrow/compute/kernels/codegen_internal.h:
##
@@ -322,6 +329,26 @@ struct ArrayIterator> {
   }
 };
 
+template 
+struct ArrayIterator> {
+  using T = typename TypeTraits::ScalarType;
+  using ArrayT = typename TypeTraits::ArrayType;
+  using offset_type = typename Type::offset_type;
+
+  const ArraySpan& arr;
+  int64_t position;
+
+  explicit ArrayIterator(const ArraySpan& arr) : arr(arr), position(0) {}
+
+  T operator()() {
+const auto array_ptr = arr.ToArray();

Review Comment:
   The conversion will be expensive regardless of whether you slice before or 
afterwards, so ideally we should avoid the conversion entirely.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-02-20 Thread via GitHub


pitrou commented on code in PR #45272:
URL: https://github.com/apache/arrow/pull/45272#discussion_r1963640068


##
cpp/src/arrow/compute/kernels/codegen_internal.h:
##
@@ -322,6 +329,26 @@ struct ArrayIterator> {
   }
 };
 
+template 
+struct ArrayIterator> {
+  using T = typename TypeTraits::ScalarType;
+  using ArrayT = typename TypeTraits::ArrayType;
+  using offset_type = typename Type::offset_type;
+
+  const ArraySpan& arr;
+  int64_t position;
+
+  explicit ArrayIterator(const ArraySpan& arr) : arr(arr), position(0) {}
+
+  T operator()() {
+const auto array_ptr = arr.ToArray();

Review Comment:
   In other words, we should consider exposing a `ArrayRangeEquals` variant 
that takes `ArraySpan` inputs instead of `Array` (the underlying implementation 
can be converted to `ArraySpan` entirely).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-02-19 Thread via GitHub


WillAyd commented on code in PR #45272:
URL: https://github.com/apache/arrow/pull/45272#discussion_r1962263513


##
cpp/src/arrow/compute/kernels/codegen_internal.h:
##
@@ -390,6 +417,12 @@ struct UnboxScalar> {
   }
 };
 
+template 
+struct UnboxScalar> {

Review Comment:
   I can look at adding that to the compute registry



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-02-19 Thread via GitHub


WillAyd commented on code in PR #45272:
URL: https://github.com/apache/arrow/pull/45272#discussion_r1962264419


##
cpp/src/arrow/compute/kernels/codegen_internal.h:
##
@@ -322,6 +329,26 @@ struct ArrayIterator> {
   }
 };
 
+template 
+struct ArrayIterator> {
+  using T = typename TypeTraits::ScalarType;
+  using ArrayT = typename TypeTraits::ArrayType;
+  using offset_type = typename Type::offset_type;
+
+  const ArraySpan& arr;
+  int64_t position;
+
+  explicit ArrayIterator(const ArraySpan& arr) : arr(arr), position(0) {}
+
+  T operator()() {
+const auto array_ptr = arr.ToArray();

Review Comment:
   Sorry for the late reply - I somehow missed this comment. So the 
expectationn is that the ArraySpan -> Array conversion is expensive here and 
getting a slice first before making that conversion should help with 
performance, right?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-02-06 Thread via GitHub


mapleFU commented on code in PR #45272:
URL: https://github.com/apache/arrow/pull/45272#discussion_r1944402094


##
cpp/src/arrow/compute/kernels/codegen_internal.h:
##
@@ -390,6 +417,12 @@ struct UnboxScalar> {
   }
 };
 
+template 
+struct UnboxScalar> {

Review Comment:
   So fixed_size_list is also declared as scalar, but not being registered in 
compute? ( It's ok to me, just to make sure this)



##
cpp/src/arrow/compute/kernels/codegen_internal.h:
##
@@ -322,6 +329,26 @@ struct ArrayIterator> {
   }
 };
 
+template 
+struct ArrayIterator> {
+  using T = typename TypeTraits::ScalarType;
+  using ArrayT = typename TypeTraits::ArrayType;
+  using offset_type = typename Type::offset_type;
+
+  const ArraySpan& arr;
+  int64_t position;
+
+  explicit ArrayIterator(const ArraySpan& arr) : arr(arr), position(0) {}
+
+  T operator()() {
+const auto array_ptr = arr.ToArray();

Review Comment:
   How about:
   1. Get offset, length
   2. Subslice the value array
   3. Build `ListScalar` / `LargeListScalar` from the child array?
   
   Or materialize the child array, and using the sub array. `arr.ToArray()` 
every call is too expansive?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-02-06 Thread via GitHub


pitrou commented on code in PR #45272:
URL: https://github.com/apache/arrow/pull/45272#discussion_r1944381462


##
cpp/src/arrow/compute/kernels/scalar_compare.cc:
##
@@ -445,6 +445,14 @@ std::shared_ptr 
MakeCompareFunction(std::string name, FunctionDo
 DCHECK_OK(func->AddKernel({ty, ty}, boolean(), std::move(exec)));
   }
 
+  if constexpr (std::is_same_v || std::is_same_v) {
+for (const auto id : {Type::LIST, Type::LARGE_LIST}) {
+  auto exec = GenerateList(id);

Review Comment:
   > So AFAICT the `RangeDataEqualsImpl` returns a scalar bool value, rather 
than an array of booleans like we would need in the result here.
   
   That's right, so it would need to be called once for each list element 
(which is admittedly non optimal, but probably better than using `GetScalar` 
anyway?).
   
   > That class is also private to the `compare.cc` module and doesn't expose 
any suitable entrypoint in `compare.h` that I think would work here.
   
   Well, we could add a suitable entrypoint in `compare_internal.h` if that's 
useful.
   
   Another possible approach would be to leverage the comparison kernel for the 
child type, but that would probably be even more involved. So that's up to how 
much work you want to put into this :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-02-05 Thread via GitHub


WillAyd commented on code in PR #45272:
URL: https://github.com/apache/arrow/pull/45272#discussion_r1943228166


##
cpp/src/arrow/compute/kernels/scalar_compare.cc:
##
@@ -445,6 +445,14 @@ std::shared_ptr 
MakeCompareFunction(std::string name, FunctionDo
 DCHECK_OK(func->AddKernel({ty, ty}, boolean(), std::move(exec)));
   }
 
+  if constexpr (std::is_same_v || std::is_same_v) {
+for (const auto id : {Type::LIST, Type::LARGE_LIST}) {
+  auto exec = GenerateList(id);

Review Comment:
   Thanks again for the guidance and patience here! Trying to wrap my head 
around the structure of the compute modules



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-02-05 Thread via GitHub


WillAyd commented on code in PR #45272:
URL: https://github.com/apache/arrow/pull/45272#discussion_r1943227326


##
cpp/src/arrow/compute/kernels/scalar_compare.cc:
##
@@ -445,6 +445,14 @@ std::shared_ptr 
MakeCompareFunction(std::string name, FunctionDo
 DCHECK_OK(func->AddKernel({ty, ty}, boolean(), std::move(exec)));
   }
 
+  if constexpr (std::is_same_v || std::is_same_v) {
+for (const auto id : {Type::LIST, Type::LARGE_LIST}) {
+  auto exec = GenerateList(id);

Review Comment:
   OK took a closer look at this. So AFAICT the `RangeDataEqualsImpl` returns a 
scalar bool value, rather than an array of booleans like we would need in the 
result here. That class is also private to the `compare.cc` module and doesn't 
expose any suitable entrypoint in `compare.h` that I think would work here.
   
   Are you thinking we should refactor the `RangeDataEqualsImpl` to support 
vector functions and move it to make it accessible to the compute module, or do 
you think we should just create a dedicated class drawing some inspiration from 
it in `scalar_compute.cc`? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-01-29 Thread via GitHub


pitrou commented on code in PR #45272:
URL: https://github.com/apache/arrow/pull/45272#discussion_r1934023851


##
cpp/src/arrow/compute/kernels/scalar_compare.cc:
##
@@ -445,6 +445,14 @@ std::shared_ptr 
MakeCompareFunction(std::string name, FunctionDo
 DCHECK_OK(func->AddKernel({ty, ty}, boolean(), std::move(exec)));
   }
 
+  if constexpr (std::is_same_v || std::is_same_v) {
+for (const auto id : {Type::LIST, Type::LARGE_LIST}) {
+  auto exec = GenerateList(id);

Review Comment:
   > With what you are suggesting, I'm guessing I should be creating a new 
registry function along with that like `RegisterRangeComparison` right?
   
   I think we can avoid that by directly calling into `RangeDataEqualsImpl`.
   
   > Also I'm guessing the RangeDataEqualsImpl is supposed to work when 
comparing two arrays, but not when comparing an array with a scalar
   
   A list scalar's value is actually an array, so that should not necessarily 
be a problem.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-01-28 Thread via GitHub


WillAyd commented on code in PR #45272:
URL: https://github.com/apache/arrow/pull/45272#discussion_r1932841503


##
cpp/src/arrow/compute/kernels/scalar_compare.cc:
##
@@ -445,6 +445,14 @@ std::shared_ptr 
MakeCompareFunction(std::string name, FunctionDo
 DCHECK_OK(func->AddKernel({ty, ty}, boolean(), std::move(exec)));
   }
 
+  if constexpr (std::is_same_v || std::is_same_v) {
+for (const auto id : {Type::LIST, Type::LARGE_LIST}) {
+  auto exec = GenerateList(id);

Review Comment:
   Also I'm guessing the `RangeDataEqualsImpl` is supposed to work when 
comparing two arrays, but not when comparing an array with a scalar
   
   FWIW though I did benchmark the current implementation and it was definitely 
slow. Seemed about 1000x slower than an equivalent comparison using primitive 
types



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-01-28 Thread via GitHub


WillAyd commented on PR #45272:
URL: https://github.com/apache/arrow/pull/45272#issuecomment-2619201417

   The current (rather slow) implementation just does an elementwise compare, 
dispatching to the logical list scalar type. Therefore, since:
   
   ```python
   >>> l1 = pa.scalar([], type=pa.list_(pa.int32()))
   >>> l2 = pa.scalar([], type=pa.list_(pa.int32()))
   >>> l1 == l2
   True
   ```
   
   Wrapping that in an array does not change the behavior:
   
   ```python
   >>> arr1 = pa.array([l1])
   >>> arr2 = pa.array([l2])
   >>> pc.equal(arr1, arr2)
   
   [
 true
   ]
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-01-27 Thread via GitHub


jorisvandenbossche commented on PR #45272:
URL: https://github.com/apache/arrow/pull/45272#issuecomment-2618089661

   I currently have no time to review this in depth, but API-wise one remark: 
right now (for primitive arrays), nulls propagate in an operation like `equal`. 
   So how do they behave for nested typed, i.e. what if there is a null in a 
list element. Does that propagate as well (and does it make the comparison of 
the full list element null), or do we then consider a list element with nulls 
in the same location as equal?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-01-27 Thread via GitHub


WillAyd commented on code in PR #45272:
URL: https://github.com/apache/arrow/pull/45272#discussion_r1930980642


##
cpp/src/arrow/compute/kernels/scalar_compare.cc:
##
@@ -445,6 +445,14 @@ std::shared_ptr 
MakeCompareFunction(std::string name, FunctionDo
 DCHECK_OK(func->AddKernel({ty, ty}, boolean(), std::move(exec)));
   }
 
+  if constexpr (std::is_same_v || std::is_same_v) {
+for (const auto id : {Type::LIST, Type::LARGE_LIST}) {
+  auto exec = GenerateList(id);

Review Comment:
   Thanks for the heads up - I will give that a look. So I see all of the 
functions right now in the compare module are registered via 
`RegisterScalarComparison`. With what you are suggesting, I'm guessing I should 
be creating a new registry function along with that like 
`RegisterRangeComparison` right?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-01-23 Thread via GitHub


pitrou commented on code in PR #45272:
URL: https://github.com/apache/arrow/pull/45272#discussion_r1927417430


##
cpp/src/arrow/compute/kernels/scalar_compare.cc:
##
@@ -445,6 +445,14 @@ std::shared_ptr 
MakeCompareFunction(std::string name, FunctionDo
 DCHECK_OK(func->AddKernel({ty, ty}, boolean(), std::move(exec)));
   }
 
+  if constexpr (std::is_same_v || std::is_same_v) {
+for (const auto id : {Type::LIST, Type::LARGE_LIST}) {
+  auto exec = GenerateList(id);

Review Comment:
   Another approach with perhaps a better performance potential would be to 
leverage the existing `RangeDataEqualsImpl` in `arrow/compare.cc`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-01-23 Thread via GitHub


pitrou commented on code in PR #45272:
URL: https://github.com/apache/arrow/pull/45272#discussion_r1927415730


##
cpp/src/arrow/compute/kernels/codegen_internal.h:
##
@@ -322,6 +329,26 @@ struct ArrayIterator> {
   }
 };
 
+template 
+struct ArrayIterator> {
+  using T = typename TypeTraits::ScalarType;
+  using ArrayT = typename TypeTraits::ArrayType;
+  using offset_type = typename Type::offset_type;
+
+  const ArraySpan& arr;
+  int64_t position;
+
+  explicit ArrayIterator(const ArraySpan& arr) : arr(arr), position(0) {}
+
+  T operator()() {
+const auto array_ptr = arr.ToArray();

Review Comment:
   This is going to be slow, so we probably want to avoid this IMHO.
   
   You may want to run a crude benchmark from Python to check this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-01-23 Thread via GitHub


WillAyd commented on PR #45272:
URL: https://github.com/apache/arrow/pull/45272#issuecomment-2609920963

   @pitrou @jorisvandenbossche would either of you be able to take a look here?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-01-21 Thread via GitHub


WillAyd commented on code in PR #45272:
URL: https://github.com/apache/arrow/pull/45272#discussion_r1924343441


##
cpp/src/arrow/compute/kernels/codegen_internal.h:
##
@@ -322,6 +329,26 @@ struct ArrayIterator> {
   }
 };
 
+template 
+struct ArrayIterator> {
+  using T = typename TypeTraits::ScalarType;
+  using ArrayT = typename TypeTraits::ArrayType;
+  using offset_type = typename Type::offset_type;
+
+  const ArraySpan& arr;
+  int64_t position;
+
+  explicit ArrayIterator(const ArraySpan& arr) : arr(arr), position(0) {}
+
+  T operator()() {
+const auto array_ptr = arr.ToArray();

Review Comment:
   The alternative to calling `ToArray` with the cast would be to implement 
something like `value_slice` on the `ArraySpan` directly, although I'm not sure 
if the `ArraySpan` is supposed to return anything but pointers to primitives 
(as is currently implemeted)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-01-17 Thread via GitHub


WillAyd commented on code in PR #45272:
URL: https://github.com/apache/arrow/pull/45272#discussion_r1920484539


##
cpp/src/arrow/compute/kernels/codegen_internal.h:
##
@@ -322,6 +329,47 @@ struct ArrayIterator> {
   }
 };
 
+template 
+struct ArrayIterator> {
+  using T = typename TypeTraits::ScalarType;
+  using ArrayT = typename TypeTraits::ArrayType;
+  using offset_type = typename Type::offset_type;
+
+  const ArraySpan& arr;
+  const offset_type* offsets;
+  offset_type cur_offset;
+  const ArraySpan& values;
+  const uint8_t* data;
+  int64_t position;
+
+  explicit ArrayIterator(const ArraySpan& arr)
+  : arr(arr),
+offsets(reinterpret_cast(arr.buffers[1].data)),
+cur_offset(offsets[arr.offset]),
+values(arr.child_data[0]),
+position(arr.offset) {}
+
+  T operator()() {
+offset_type next_offset = offsets[++position];
+const auto len = next_offset - cur_offset;
+const auto null_count = values.null_count;
+const std::shared_ptr nulls_buffer =
+null_count > 0 ? *values.buffers[0].owner : nullptr;
+std::vector> bufs = {nulls_buffer, 
*values.buffers[1].owner};
+const auto child_offset = values.offset;
+
+// TODO: do not hard code child type. also need to be aware of 
non-primitive children
+const auto array_data = ArrayData::Make(int32(), len, std::move(bufs), 
null_count,

Review Comment:
   Definitely do not want to hard code the data type here but I'm not sure what 
facilities can help select from the ListArray generically - do these exist 
already or would they need to be built out in this function?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-01-15 Thread via GitHub


WillAyd commented on code in PR #45272:
URL: https://github.com/apache/arrow/pull/45272#discussion_r191648


##
cpp/src/arrow/compute/kernels/codegen_internal.h:
##
@@ -322,6 +329,32 @@ struct ArrayIterator> {
   }
 };
 
+template 
+struct ArrayIterator> {
+  using T = typename TypeTraits::ScalarType;
+  using ArrayT = typename TypeTraits::ArrayType;
+
+  const ArraySpan& arr;
+  const char* data;
+  const int32_t width;
+  int64_t position;
+
+  explicit ArrayIterator(const ArraySpan& arr)
+  : arr(arr),
+data(reinterpret_cast(arr.buffers[1].data)),
+width(arr.type->byte_width()),
+position(arr.offset) {}
+
+  T operator()() {
+// TODO: how cann we avoid the ToArray call

Review Comment:
   I think this is the remaining piece that needs to be fixed. I noticed the 
test failures appear when taking slices of the Arrays, and I think that ends up 
flattening potentially chunked arrays into a single array given this `ToArray` 
call.
   
   Is there some other facility already in the code base to help get a 
ListScalar from an ArraySpan that I may be overlooking? If not, should that be 
implemented here directly, or somewhere within the ArraySpan class? 
   
   AFAICT the ArraySpan class only returns pointers to physical, primitive 
types at the moment



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] GH-45167: [C++] Implement Compute Equals for List Types [arrow]

2025-01-15 Thread via GitHub


github-actions[bot] commented on PR #45272:
URL: https://github.com/apache/arrow/pull/45272#issuecomment-2593412585

   :warning: GitHub issue #45167 **has been automatically assigned in GitHub** 
to PR creator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]