Re: [PR] Feat: support bit_count function [datafusion-comet]

2025-05-30 Thread via GitHub


kazuyukitanimura commented on PR #1602:
URL: 
https://github.com/apache/datafusion-comet/pull/1602#issuecomment-2923351378

   Merged Thank you @kazantsev-maksim @parthchandra @mbutrovich 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Feat: support bit_count function [datafusion-comet]

2025-05-30 Thread via GitHub


kazuyukitanimura merged PR #1602:
URL: https://github.com/apache/datafusion-comet/pull/1602


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Feat: support bit_count function [datafusion-comet]

2025-05-30 Thread via GitHub


kazantsev-maksim commented on PR #1602:
URL: 
https://github.com/apache/datafusion-comet/pull/1602#issuecomment-2922907576

   Thank you for the feedback! @kazuyukitanimura @parthchandra 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Feat: support bit_count function [datafusion-comet]

2025-05-28 Thread via GitHub


parthchandra commented on code in PR #1602:
URL: https://github.com/apache/datafusion-comet/pull/1602#discussion_r2112886589


##
spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala:
##
@@ -99,6 +100,73 @@ class CometExpressionSuite extends CometTestBase with 
AdaptiveSparkPlanHelper {
 }
   }
 
+  test("bitwise_count - min/max values") {
+Seq(false, true).foreach { dictionary =>
+  withSQLConf("parquet.enable.dictionary" -> dictionary.toString) {
+val table = "bitwise_count_test"
+withTable(table) {
+  sql(s"create table $table(col1 long, col2 int, col3 short, col4 
byte) using parquet")
+  sql(s"insert into $table values(, , 17, 7)")
+  sql(
+s"insert into $table values(${Long.MaxValue}, ${Int.MaxValue}, 
${Short.MaxValue}, ${Byte.MaxValue})")
+  sql(
+s"insert into $table values(${Long.MinValue}, ${Int.MinValue}, 
${Short.MinValue}, ${Byte.MinValue})")
+
+  checkSparkAnswerAndOperator(sql(s"SELECT bit_count(col1) FROM 
$table"))
+  checkSparkAnswerAndOperator(sql(s"SELECT bit_count(col2) FROM 
$table"))
+  checkSparkAnswerAndOperator(sql(s"SELECT bit_count(col3) FROM 
$table"))
+  checkSparkAnswerAndOperator(sql(s"SELECT bit_count(col4) FROM 
$table"))
+  checkSparkAnswerAndOperator(sql(s"SELECT bit_count(true) FROM 
$table"))
+  checkSparkAnswerAndOperator(sql(s"SELECT bit_count(false) FROM 
$table"))
+}
+  }
+}
+  }
+
+  test("bitwise_count - random values (spark gen)") {
+withTempDir { dir =>
+  val path = new Path(dir.toURI.toString, "test.parquet")
+  val filename = path.toString
+  val random = new Random(42)
+  withSQLConf(CometConf.COMET_ENABLED.key -> "false") {
+ParquetGenerator.makeParquetFile(
+  random,
+  spark,
+  filename,
+  10,
+  DataGenOptions(
+allowNull = true,
+generateNegativeZero = true,
+generateArray = false,
+generateStruct = false,
+generateMap = false))
+  }
+  val table = spark.read.parquet(filename)
+  val df =
+table.selectExpr("bit_count(c1)", "bit_count(c2)", "bit_count(c3)", 
"bit_count(c4)")
+
+  checkSparkAnswerAndOperator(df)
+}
+  }
+
+  test("bitwise_count - random values (native parquet gen)") {

Review Comment:
   Oh, I must have missed this. This is sufficient, I think. (Thank you!)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Feat: support bit_count function [datafusion-comet]

2025-05-28 Thread via GitHub


kazantsev-maksim commented on code in PR #1602:
URL: https://github.com/apache/datafusion-comet/pull/1602#discussion_r2112325535


##
spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala:
##
@@ -99,6 +100,73 @@ class CometExpressionSuite extends CometTestBase with 
AdaptiveSparkPlanHelper {
 }
   }
 
+  test("bitwise_count - min/max values") {
+Seq(false, true).foreach { dictionary =>
+  withSQLConf("parquet.enable.dictionary" -> dictionary.toString) {
+val table = "bitwise_count_test"
+withTable(table) {
+  sql(s"create table $table(col1 long, col2 int, col3 short, col4 
byte) using parquet")
+  sql(s"insert into $table values(, , 17, 7)")
+  sql(
+s"insert into $table values(${Long.MaxValue}, ${Int.MaxValue}, 
${Short.MaxValue}, ${Byte.MaxValue})")
+  sql(
+s"insert into $table values(${Long.MinValue}, ${Int.MinValue}, 
${Short.MinValue}, ${Byte.MinValue})")
+
+  checkSparkAnswerAndOperator(sql(s"SELECT bit_count(col1) FROM 
$table"))
+  checkSparkAnswerAndOperator(sql(s"SELECT bit_count(col2) FROM 
$table"))
+  checkSparkAnswerAndOperator(sql(s"SELECT bit_count(col3) FROM 
$table"))
+  checkSparkAnswerAndOperator(sql(s"SELECT bit_count(col4) FROM 
$table"))
+  checkSparkAnswerAndOperator(sql(s"SELECT bit_count(true) FROM 
$table"))
+  checkSparkAnswerAndOperator(sql(s"SELECT bit_count(false) FROM 
$table"))
+}
+  }
+}
+  }
+
+  test("bitwise_count - random values (spark gen)") {
+withTempDir { dir =>
+  val path = new Path(dir.toURI.toString, "test.parquet")
+  val filename = path.toString
+  val random = new Random(42)
+  withSQLConf(CometConf.COMET_ENABLED.key -> "false") {
+ParquetGenerator.makeParquetFile(
+  random,
+  spark,
+  filename,
+  10,
+  DataGenOptions(
+allowNull = true,
+generateNegativeZero = true,
+generateArray = false,
+generateStruct = false,
+generateMap = false))
+  }
+  val table = spark.read.parquet(filename)
+  val df =
+table.selectExpr("bit_count(c1)", "bit_count(c2)", "bit_count(c3)", 
"bit_count(c4)")
+
+  checkSparkAnswerAndOperator(df)
+}
+  }
+
+  test("bitwise_count - random values (native parquet gen)") {

Review Comment:
   > `CometTestBase.makeParquetFileAllTypes` has all the integer types you may 
want to test.
   @parthchandra I've already added the test using makeParquetFileAllTypes, or 
does it still need to be improved?



##
spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala:
##
@@ -99,6 +100,73 @@ class CometExpressionSuite extends CometTestBase with 
AdaptiveSparkPlanHelper {
 }
   }
 
+  test("bitwise_count - min/max values") {
+Seq(false, true).foreach { dictionary =>
+  withSQLConf("parquet.enable.dictionary" -> dictionary.toString) {
+val table = "bitwise_count_test"
+withTable(table) {
+  sql(s"create table $table(col1 long, col2 int, col3 short, col4 
byte) using parquet")
+  sql(s"insert into $table values(, , 17, 7)")
+  sql(
+s"insert into $table values(${Long.MaxValue}, ${Int.MaxValue}, 
${Short.MaxValue}, ${Byte.MaxValue})")
+  sql(
+s"insert into $table values(${Long.MinValue}, ${Int.MinValue}, 
${Short.MinValue}, ${Byte.MinValue})")
+
+  checkSparkAnswerAndOperator(sql(s"SELECT bit_count(col1) FROM 
$table"))
+  checkSparkAnswerAndOperator(sql(s"SELECT bit_count(col2) FROM 
$table"))
+  checkSparkAnswerAndOperator(sql(s"SELECT bit_count(col3) FROM 
$table"))
+  checkSparkAnswerAndOperator(sql(s"SELECT bit_count(col4) FROM 
$table"))
+  checkSparkAnswerAndOperator(sql(s"SELECT bit_count(true) FROM 
$table"))
+  checkSparkAnswerAndOperator(sql(s"SELECT bit_count(false) FROM 
$table"))
+}
+  }
+}
+  }
+
+  test("bitwise_count - random values (spark gen)") {
+withTempDir { dir =>
+  val path = new Path(dir.toURI.toString, "test.parquet")
+  val filename = path.toString
+  val random = new Random(42)
+  withSQLConf(CometConf.COMET_ENABLED.key -> "false") {
+ParquetGenerator.makeParquetFile(
+  random,
+  spark,
+  filename,
+  10,
+  DataGenOptions(
+allowNull = true,
+generateNegativeZero = true,
+generateArray = false,
+generateStruct = false,
+generateMap = false))
+  }
+  val table = spark.read.parquet(filename)
+  val df =
+table.selectExpr("bit_count(c1)", "bit_count(c2)", "bit_count(c3)", 
"bit_count(c4)")
+
+  checkSparkAnswerAndOperator(df)
+}
+  }
+
+  test("bitwise_count - random values (native parquet gen)") {

Review Comment:
   > `CometTestBase.makeParquetFileAllTy

Re: [PR] Feat: support bit_count function [datafusion-comet]

2025-05-27 Thread via GitHub


parthchandra commented on code in PR #1602:
URL: https://github.com/apache/datafusion-comet/pull/1602#discussion_r2110587829


##
spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala:
##
@@ -90,6 +90,29 @@ class CometExpressionSuite extends CometTestBase with 
AdaptiveSparkPlanHelper {
 }
   }
 
+  test("bitwise_count") {
+Seq(false, true).foreach { dictionary =>
+  withSQLConf("parquet.enable.dictionary" -> dictionary.toString) {
+val table = "bitwise_count_test"
+withTable(table) {
+  sql(s"create table $table(col1 long, col2 int, col3 short, col4 
byte) using parquet")
+  sql(s"insert into $table values(, , 17, 7)")

Review Comment:
   `CometTestBase.makeParquetFileAllTypes` has all the integer types you may 
want to test.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Feat: support bit_count function [datafusion-comet]

2025-05-23 Thread via GitHub


kazantsev-maksim commented on code in PR #1602:
URL: https://github.com/apache/datafusion-comet/pull/1602#discussion_r2105148849


##
native/spark-expr/src/bitwise_funcs/bitwise_count.rs:
##
@@ -0,0 +1,103 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use arrow::{array::*, datatypes::DataType};
+use datafusion::common::Result;
+use datafusion::{error::DataFusionError, logical_expr::ColumnarValue};
+use std::sync::Arc;
+
+macro_rules! compute_op {
+($OPERAND:expr, $DT:ident) => {{
+let operand = $OPERAND
+.as_any()
+.downcast_ref::<$DT>()
+.expect("compute_op failed to downcast array");

Review Comment:
   fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Feat: support bit_count function [datafusion-comet]

2025-05-07 Thread via GitHub


kazuyukitanimura commented on code in PR #1602:
URL: https://github.com/apache/datafusion-comet/pull/1602#discussion_r2078943165


##
native/spark-expr/src/bitwise_funcs/bitwise_count.rs:
##
@@ -0,0 +1,177 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use arrow::{
+array::*,
+datatypes::{DataType, Schema},
+record_batch::RecordBatch,
+};
+use datafusion::common::Result;
+use datafusion::physical_expr::PhysicalExpr;
+use datafusion::{error::DataFusionError, logical_expr::ColumnarValue};
+use std::fmt::Formatter;
+use std::hash::Hash;
+use std::{any::Any, sync::Arc};
+
+macro_rules! compute_op {
+($OPERAND:expr, $DT:ident) => {{
+let operand = $OPERAND
+.as_any()
+.downcast_ref::<$DT>()
+.expect("compute_op failed to downcast array");
+
+let result: Int32Array = operand
+.iter()
+.map(|x| x.map(|y| bit_count(y.into(
+.collect();
+
+Ok(Arc::new(result))
+}};
+}
+
+/// BitwiseCount expression
+#[derive(Debug, Eq)]
+pub struct BitwiseCountExpr {
+/// Input expression
+arg: Arc,
+}
+
+impl Hash for BitwiseCountExpr {
+fn hash(&self, state: &mut H) {
+self.arg.hash(state);
+}
+}
+
+impl PartialEq for BitwiseCountExpr {
+fn eq(&self, other: &Self) -> bool {
+self.arg.eq(&other.arg)
+}
+}
+
+impl BitwiseCountExpr {
+/// Create new bitwise count expression
+pub fn new(arg: Arc) -> Self {
+Self { arg }
+}
+
+/// Get the input expression
+pub fn arg(&self) -> &Arc {
+&self.arg
+}
+}
+
+impl std::fmt::Display for BitwiseCountExpr {
+fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
+write!(f, "(~ {})", self.arg)
+}
+}
+
+impl PhysicalExpr for BitwiseCountExpr {
+/// Return a reference to Any that can be used for downcasting
+fn as_any(&self) -> &dyn Any {
+self
+}
+
+fn data_type(&self, _: &Schema) -> Result {
+Ok(DataType::Int32)
+}
+
+fn nullable(&self, input_schema: &Schema) -> Result {
+self.arg.nullable(input_schema)
+}
+
+fn evaluate(&self, batch: &RecordBatch) -> Result {
+let arg = self.arg.evaluate(batch)?;
+match arg {
+ColumnarValue::Array(array) => {

Review Comment:
   You will need to have many repeated values in order to create dictionary 
values
   If you use something like 
https://github.com/apache/datafusion-comet/blob/main/spark/src/test/scala/org/apache/spark/sql/CometTestBase.scala#L940
   you should be able to test it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Feat: support bit_count function [datafusion-comet]

2025-05-07 Thread via GitHub


parthchandra commented on code in PR #1602:
URL: https://github.com/apache/datafusion-comet/pull/1602#discussion_r2078437197


##
native/spark-expr/src/bitwise_funcs/bitwise_count.rs:
##
@@ -0,0 +1,103 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use arrow::{array::*, datatypes::DataType};
+use datafusion::common::Result;
+use datafusion::{error::DataFusionError, logical_expr::ColumnarValue};
+use std::sync::Arc;
+
+macro_rules! compute_op {
+($OPERAND:expr, $DT:ident) => {{
+let operand = $OPERAND
+.as_any()
+.downcast_ref::<$DT>()
+.expect("compute_op failed to downcast array");

Review Comment:
   Perhaps report $DT as well in case of failure?



##
spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala:
##
@@ -90,6 +90,29 @@ class CometExpressionSuite extends CometTestBase with 
AdaptiveSparkPlanHelper {
 }
   }
 
+  test("bitwise_count") {
+Seq(false, true).foreach { dictionary =>
+  withSQLConf("parquet.enable.dictionary" -> dictionary.toString) {
+val table = "bitwise_count_test"
+withTable(table) {
+  sql(s"create table $table(col1 long, col2 int, col3 short, col4 
byte) using parquet")
+  sql(s"insert into $table values(, , 17, 7)")

Review Comment:
   Can we also add a test with a Parquet file not created by Spark (see 
`makeParquetFileAllTypes`) and see if this reports correct results with 
unsigned int columns? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Feat: support bit_count function [datafusion-comet]

2025-05-01 Thread via GitHub


kazantsev-maksim commented on PR #1602:
URL: 
https://github.com/apache/datafusion-comet/pull/1602#issuecomment-2844940964

   @mbutrovich I couldn't find a built-in implementation of bit_count in the 
DataFusion project, but i rewrote it using scalarFunc without adding a new 
proto expr.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Feat: support bit_count function [datafusion-comet]

2025-04-30 Thread via GitHub


mbutrovich commented on PR #1602:
URL: 
https://github.com/apache/datafusion-comet/pull/1602#issuecomment-2843603389

   > DataFusion's bit_count has same behavior with Spark 's bit_count function 
Spark
   
   If this is the case, can we delegate to a `ScalarFunc` expression instead of 
creating a new one to maintain? Similar to:
   
https://github.com/apache/datafusion-comet/pull/1700/files#diff-602f1658f4020a9dc8a47f49ac1411735d56d6612aa971751435104e301a9035


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Feat: support bit_count function [datafusion-comet]

2025-04-30 Thread via GitHub


kazuyukitanimura commented on code in PR #1602:
URL: https://github.com/apache/datafusion-comet/pull/1602#discussion_r2069467729


##
native/spark-expr/src/bitwise_funcs/bitwise_count.rs:
##
@@ -0,0 +1,177 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use arrow::{
+array::*,
+datatypes::{DataType, Schema},
+record_batch::RecordBatch,
+};
+use datafusion::common::Result;
+use datafusion::physical_expr::PhysicalExpr;
+use datafusion::{error::DataFusionError, logical_expr::ColumnarValue};
+use std::fmt::Formatter;
+use std::hash::Hash;
+use std::{any::Any, sync::Arc};
+
+macro_rules! compute_op {
+($OPERAND:expr, $DT:ident) => {{
+let operand = $OPERAND
+.as_any()
+.downcast_ref::<$DT>()
+.expect("compute_op failed to downcast array");
+
+let result: Int32Array = operand
+.iter()
+.map(|x| x.map(|y| bit_count(y.into(
+.collect();
+
+Ok(Arc::new(result))
+}};
+}
+
+/// BitwiseCount expression
+#[derive(Debug, Eq)]
+pub struct BitwiseCountExpr {
+/// Input expression
+arg: Arc,
+}
+
+impl Hash for BitwiseCountExpr {
+fn hash(&self, state: &mut H) {
+self.arg.hash(state);
+}
+}
+
+impl PartialEq for BitwiseCountExpr {
+fn eq(&self, other: &Self) -> bool {
+self.arg.eq(&other.arg)
+}
+}
+
+impl BitwiseCountExpr {
+/// Create new bitwise count expression
+pub fn new(arg: Arc) -> Self {
+Self { arg }
+}
+
+/// Get the input expression
+pub fn arg(&self) -> &Arc {
+&self.arg
+}
+}
+
+impl std::fmt::Display for BitwiseCountExpr {
+fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
+write!(f, "(~ {})", self.arg)
+}
+}
+
+impl PhysicalExpr for BitwiseCountExpr {
+/// Return a reference to Any that can be used for downcasting
+fn as_any(&self) -> &dyn Any {
+self
+}
+
+fn data_type(&self, _: &Schema) -> Result {
+Ok(DataType::Int32)
+}
+
+fn nullable(&self, input_schema: &Schema) -> Result {
+self.arg.nullable(input_schema)
+}
+
+fn evaluate(&self, batch: &RecordBatch) -> Result {
+let arg = self.arg.evaluate(batch)?;
+match arg {
+ColumnarValue::Array(array) => {

Review Comment:
   Any possibility that we get a dictionary?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Feat: support bit_count function [datafusion-comet]

2025-04-28 Thread via GitHub


kazantsev-maksim commented on code in PR #1602:
URL: https://github.com/apache/datafusion-comet/pull/1602#discussion_r2064138880


##
spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala:
##
@@ -90,6 +90,29 @@ class CometExpressionSuite extends CometTestBase with 
AdaptiveSparkPlanHelper {
 }
   }
 
+  test("bitwise_count") {
+Seq(false, true).foreach { dictionary =>
+  withSQLConf("parquet.enable.dictionary" -> dictionary.toString) {
+val table = "bitwise_count_test"
+withTable(table) {
+  sql(s"create table $table(col1 long, col2 int, col3 short, col4 
byte) using parquet")
+  sql(s"insert into $table values(, , 17, 7)")

Review Comment:
   Yep. Added tests with random data.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Feat: support bit_count function [datafusion-comet]

2025-04-04 Thread via GitHub


kazuyukitanimura commented on code in PR #1602:
URL: https://github.com/apache/datafusion-comet/pull/1602#discussion_r2025487131


##
spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala:
##
@@ -90,6 +90,29 @@ class CometExpressionSuite extends CometTestBase with 
AdaptiveSparkPlanHelper {
 }
   }
 
+  test("bitwise_count") {
+Seq(false, true).foreach { dictionary =>
+  withSQLConf("parquet.enable.dictionary" -> dictionary.toString) {
+val table = "bitwise_count_test"
+withTable(table) {
+  sql(s"create table $table(col1 long, col2 int, col3 short, col4 
byte) using parquet")
+  sql(s"insert into $table values(, , 17, 7)")

Review Comment:
   Do you mind adding random number cases?



##
native/spark-expr/src/bitwise_funcs/bitwise_count.rs:
##
@@ -0,0 +1,172 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use arrow::{
+array::*,
+datatypes::{DataType, Schema},
+record_batch::RecordBatch,
+};
+use datafusion::common::Result;
+use datafusion::physical_expr::PhysicalExpr;
+use datafusion::{error::DataFusionError, logical_expr::ColumnarValue};
+use std::hash::Hash;
+use std::{any::Any, sync::Arc};
+
+macro_rules! compute_op {
+($OPERAND:expr, $DT:ident, $TY:ty) => {{
+let operand = $OPERAND
+.as_any()
+.downcast_ref::<$DT>()
+.expect("compute_op failed to downcast array");
+
+let result: $DT = operand
+.iter()
+.map(|x| x.map(|y| bit_count(y.into()) as $TY))
+.collect();
+
+Ok(Arc::new(result))
+}};
+}
+
+/// BitwiseCount expression
+#[derive(Debug, Eq)]
+pub struct BitwiseCountExpr {
+/// Input expression
+arg: Arc,
+}
+
+impl Hash for BitwiseCountExpr {
+fn hash(&self, state: &mut H) {
+self.arg.hash(state);
+}
+}
+
+impl PartialEq for BitwiseCountExpr {
+fn eq(&self, other: &Self) -> bool {
+self.arg.eq(&other.arg)
+}
+}
+
+impl BitwiseCountExpr {
+/// Create new bitwise count expression
+pub fn new(arg: Arc) -> Self {
+Self { arg }
+}
+
+/// Get the input expression
+pub fn arg(&self) -> &Arc {
+&self.arg
+}
+}
+
+impl std::fmt::Display for BitwiseCountExpr {
+fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
+write!(f, "(~ {})", self.arg)
+}
+}
+
+impl PhysicalExpr for BitwiseCountExpr {
+/// Return a reference to Any that can be used for downcasting
+fn as_any(&self) -> &dyn Any {
+self
+}
+
+fn data_type(&self, input_schema: &Schema) -> Result {
+self.arg.data_type(input_schema)
+}
+
+fn nullable(&self, input_schema: &Schema) -> Result {
+self.arg.nullable(input_schema)
+}
+
+fn evaluate(&self, batch: &RecordBatch) -> Result {
+let arg = self.arg.evaluate(batch)?;
+match arg {
+ColumnarValue::Array(array) => {
+let result: Result = match array.data_type() {
+DataType::Int8 | DataType::Boolean => compute_op!(array, 
Int8Array, i8),
+DataType::Int16 => compute_op!(array, Int16Array, i16),
+DataType::Int32 => compute_op!(array, Int32Array, i32),
+DataType::Int64 => compute_op!(array, Int64Array, i64),
+_ => Err(DataFusionError::Execution(format!(
+"(- '{:?}') can't be evaluated because the 
expression's type is {:?}, not signed int",
+self,
+array.data_type(),
+))),
+};
+result.map(ColumnarValue::Array)
+}
+ColumnarValue::Scalar(_) => Err(DataFusionError::Internal(
+"shouldn't go to bitwise count scalar path".to_string(),
+)),
+}
+}
+
+fn children(&self) -> Vec<&Arc> {
+vec![&self.arg]
+}
+
+fn with_new_children(
+self: Arc,
+children: Vec>,
+) -> Result> {
+Ok(Arc::new(BitwiseCountExpr::new(Arc::clone(&children[0]
+}
+}
+
+pub fn bitwise_count(arg: Arc) -> Result> {
+Ok(Arc::new(BitwiseC

Re: [PR] Feat: support bit_count function [datafusion-comet]

2025-04-02 Thread via GitHub


kazuyukitanimura commented on code in PR #1602:
URL: https://github.com/apache/datafusion-comet/pull/1602#discussion_r2025565665


##
native/spark-expr/src/bitwise_funcs/bitwise_count.rs:
##
@@ -0,0 +1,172 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use arrow::{
+array::*,
+datatypes::{DataType, Schema},
+record_batch::RecordBatch,
+};
+use datafusion::common::Result;
+use datafusion::physical_expr::PhysicalExpr;
+use datafusion::{error::DataFusionError, logical_expr::ColumnarValue};
+use std::hash::Hash;
+use std::{any::Any, sync::Arc};
+
+macro_rules! compute_op {
+($OPERAND:expr, $DT:ident, $TY:ty) => {{
+let operand = $OPERAND
+.as_any()
+.downcast_ref::<$DT>()
+.expect("compute_op failed to downcast array");
+
+let result: $DT = operand
+.iter()
+.map(|x| x.map(|y| bit_count(y.into()) as $TY))
+.collect();
+
+Ok(Arc::new(result))
+}};
+}
+
+/// BitwiseCount expression
+#[derive(Debug, Eq)]
+pub struct BitwiseCountExpr {
+/// Input expression
+arg: Arc,
+}
+
+impl Hash for BitwiseCountExpr {
+fn hash(&self, state: &mut H) {
+self.arg.hash(state);
+}
+}
+
+impl PartialEq for BitwiseCountExpr {
+fn eq(&self, other: &Self) -> bool {
+self.arg.eq(&other.arg)
+}
+}
+
+impl BitwiseCountExpr {
+/// Create new bitwise count expression
+pub fn new(arg: Arc) -> Self {
+Self { arg }
+}
+
+/// Get the input expression
+pub fn arg(&self) -> &Arc {
+&self.arg
+}
+}
+
+impl std::fmt::Display for BitwiseCountExpr {
+fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
+write!(f, "(~ {})", self.arg)
+}
+}
+
+impl PhysicalExpr for BitwiseCountExpr {
+/// Return a reference to Any that can be used for downcasting
+fn as_any(&self) -> &dyn Any {
+self
+}
+
+fn data_type(&self, input_schema: &Schema) -> Result {
+self.arg.data_type(input_schema)
+}
+
+fn nullable(&self, input_schema: &Schema) -> Result {
+self.arg.nullable(input_schema)
+}
+
+fn evaluate(&self, batch: &RecordBatch) -> Result {
+let arg = self.arg.evaluate(batch)?;
+match arg {
+ColumnarValue::Array(array) => {
+let result: Result = match array.data_type() {
+DataType::Int8 | DataType::Boolean => compute_op!(array, 
Int8Array, i8),
+DataType::Int16 => compute_op!(array, Int16Array, i16),
+DataType::Int32 => compute_op!(array, Int32Array, i32),
+DataType::Int64 => compute_op!(array, Int64Array, i64),
+_ => Err(DataFusionError::Execution(format!(
+"(- '{:?}') can't be evaluated because the 
expression's type is {:?}, not signed int",
+self,
+array.data_type(),
+))),
+};
+result.map(ColumnarValue::Array)
+}
+ColumnarValue::Scalar(_) => Err(DataFusionError::Internal(
+"shouldn't go to bitwise count scalar path".to_string(),
+)),
+}
+}
+
+fn children(&self) -> Vec<&Arc> {
+vec![&self.arg]
+}
+
+fn with_new_children(
+self: Arc,
+children: Vec>,
+) -> Result> {
+Ok(Arc::new(BitwiseCountExpr::new(Arc::clone(&children[0]
+}
+}
+
+pub fn bitwise_count(arg: Arc) -> Result> {
+Ok(Arc::new(BitwiseCountExpr::new(arg)))
+}
+
+// Here’s the equivalent Rust implementation of the bitCount function (similar 
to Apache Spark's bitCount for LongType)
+fn bit_count(i: i64) -> i32 {
+let mut u = i as u64;
+u = u - ((u >> 1) & 0x);

Review Comment:
   Woops please never mind, overflow wouldn't happen



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected].

Re: [PR] Feat: support bit_count function [datafusion-comet]

2025-04-02 Thread via GitHub


codecov-commenter commented on PR #1602:
URL: 
https://github.com/apache/datafusion-comet/pull/1602#issuecomment-2773591428

   ## 
[Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1602?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 Report
   All modified and coverable lines are covered by tests :white_check_mark:
   > Project coverage is 57.13%. Comparing base 
[(`f09f8af`)](https://app.codecov.io/gh/apache/datafusion-comet/commit/f09f8af64c6599255e116a376f4f008f2fd63b43?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 to head 
[(`2e8acf8`)](https://app.codecov.io/gh/apache/datafusion-comet/commit/2e8acf81bdc04dabe12fdbdbaada14477e41e05e?dropdown=coverage&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
   > Report is 120 commits behind head on main.
   
   Additional details and impacted files
   
   
   ```diff
   @@ Coverage Diff  @@
   ##   main#1602  +/-   ##
   
   + Coverage 56.12%   57.13%   +1.00% 
   + Complexity  976  966  -10 
   
 Files   119  124   +5 
 Lines 1174312392 +649 
 Branches   2251 2340  +89 
   
   + Hits   6591 7080 +489 
   - Misses 4012 4117 +105 
   - Partials   1140 1195  +55 
   ```
   
   
   
   [:umbrella: View full report in Codecov by 
Sentry](https://app.codecov.io/gh/apache/datafusion-comet/pull/1602?dropdown=coverage&src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
   
   :loudspeaker: Have feedback on the report? [Share it 
here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
   
:rocket: New features to boost your workflow: 
   
   - :snowflake: [Test 
Analytics](https://docs.codecov.com/docs/test-analytics): Detect flaky tests, 
report on failures, and find test suite problems.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]