Re: [PR] Handle union schema name coercion [datafusion]

2025-05-20 Thread via GitHub


gabotechs commented on code in PR #16064:
URL: https://github.com/apache/datafusion/pull/16064#discussion_r2098183006


##
datafusion/core/src/physical_planner.rs:
##
@@ -2711,6 +2724,47 @@ mod tests {
 
 assert_eq!(col.name(), "metric:avg");
 }
+
+#[tokio::test]
+async fn test_maybe_fix_nested_column_name_with_colon() {
+let schema = Schema::new(vec![Field::new("column", DataType::Int32, 
false)]);
+let schema_ref: SchemaRef = Arc::new(schema);
+
+// Construct the nested expr
+let col_expr = Arc::new(Column::new("column:1", 0)) as Arc;
+let is_not_null_expr = Arc::new(IsNotNullExpr::new(col_expr.clone()));
+
+// Create a binary expression and put the column inside
+let binary_expr = Arc::new(BinaryExpr::new(
+is_not_null_expr.clone(),
+Operator::Or,
+is_not_null_expr.clone(),
+)) as Arc;
+
+let fixed_expr =

Review Comment:
   Also, confirmed that this test fails on `main` but succeeds in this branch



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Handle union schema name coercion [datafusion]

2025-05-20 Thread via GitHub


gabotechs commented on code in PR #16064:
URL: https://github.com/apache/datafusion/pull/16064#discussion_r2098169086


##
datafusion/core/src/physical_planner.rs:
##
@@ -2711,6 +2724,47 @@ mod tests {
 
 assert_eq!(col.name(), "metric:avg");
 }
+
+#[tokio::test]
+async fn test_maybe_fix_nested_column_name_with_colon() {
+let schema = Schema::new(vec![Field::new("column", DataType::Int32, 
false)]);
+let schema_ref: SchemaRef = Arc::new(schema);
+
+// Construct the nested expr
+let col_expr = Arc::new(Column::new("column:1", 0)) as Arc;
+let is_not_null_expr = Arc::new(IsNotNullExpr::new(col_expr.clone()));
+
+// Create a binary expression and put the column inside
+let binary_expr = Arc::new(BinaryExpr::new(
+is_not_null_expr.clone(),
+Operator::Or,
+is_not_null_expr.clone(),
+)) as Arc;
+
+let fixed_expr =

Review Comment:
   nit: just to reinforce a bit the purpose of the test
   ```suggestion
   // Fix the column names ensuring that columns nested under 
expressions are also fixed
   let fixed_expr =
   ```



##
datafusion/core/src/physical_planner.rs:
##
@@ -2067,29 +2069,37 @@ fn maybe_fix_physical_column_name(
 expr: Result>,
 input_physical_schema: &SchemaRef,
 ) -> Result> {
-if let Ok(e) = &expr {

Review Comment:
   nit: with just some minor tweaks we should be able to trim 2 indentation 
levels in the body of the function while saving some lines:
   Less indented version of the function
   
   ```rust
   fn maybe_fix_physical_column_name(
   expr: Result>,
   input_physical_schema: &SchemaRef,
   ) -> Result> {
   let Ok(expr) = expr else { return expr };
   expr.transform_down(|node| {
   let Some(column) = node.as_any().downcast_ref::() else {
   return Ok(Transformed::no(node));
   };
   let idx = column.index();
   let physical_field = input_physical_schema.field(idx);
   let expr_col_name = column.name();
   let physical_name = physical_field.name();
   
   if expr_col_name != physical_name {
   // handle edge cases where the physical_name contains ':'.
   let colon_count = physical_name.matches(':').count();
   let mut splits = expr_col_name.match_indices(':');
   let split_pos = splits.nth(colon_count);
   
   if let Some((i, _)) = split_pos {
   let base_name = &expr_col_name[..i];
   if base_name == physical_name {
   let updated_column = Column::new(physical_name, idx);
   return Ok(Transformed::yes(Arc::new(updated_column)));
   }
   }
   }
   
   // If names already match or fix is not possible, just leave it as 
it is
   Ok(Transformed::no(node))
   })
   .data()
   }
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Handle union schema name coercion [datafusion]

2025-05-19 Thread via GitHub


LiaCastaneda commented on code in PR #16064:
URL: https://github.com/apache/datafusion/pull/16064#discussion_r2095705227


##
datafusion/core/src/physical_planner.rs:
##
@@ -2069,29 +2071,37 @@ fn maybe_fix_physical_column_name(
 expr: Result>,
 input_physical_schema: &SchemaRef,
 ) -> Result> {
-if let Ok(e) = &expr {
-if let Some(column) = e.as_any().downcast_ref::() {
-let physical_field = input_physical_schema.field(column.index());
-let expr_col_name = column.name();
-let physical_name = physical_field.name();
-
-if physical_name != expr_col_name {
-// handle edge cases where the physical_name contains ':'.
-let colon_count = physical_name.matches(':').count();
-let mut splits = expr_col_name.match_indices(':');
-let split_pos = splits.nth(colon_count);
-
-if let Some((idx, _)) = split_pos {
-let base_name = &expr_col_name[..idx];
-if base_name == physical_name {
-let updated_column = Column::new(physical_name, 
column.index());
-return Ok(Arc::new(updated_column));
+expr.and_then(|e| {
+e.transform_down(|node| {

Review Comment:
   Sometimes `Columns` can be inside other type of expressions (so they are not 
on the "top level") , for example:
   ```
   BinaryExpr {
   left: IsNotNull(
   Column(
   Column {
   relation: Some(
   Bare {
   table: "left",
   },
   ),
   name: "people_column",
   },
   ),
   ),
   op: Or,
   right: IsNotNull(
   Column(
   Column {
   relation: Some(
   Bare {
   table: "left",
   },
   ),
   name: "people_column:1",
   },
   ),
   ),
   },
   ```
   
if so [the current 
fix](https://github.com/apache/datafusion/blob/3e30f77f08aa9184029da80c7f7e2ec00999fa44/datafusion/core/src/physical_planner.rs#L2068)
 won't apply, this change handles those cases by using `transform_down` but 
logic remains the same



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Handle union schema name coercion [datafusion]

2025-05-19 Thread via GitHub


LiaCastaneda commented on code in PR #16064:
URL: https://github.com/apache/datafusion/pull/16064#discussion_r2095705227


##
datafusion/core/src/physical_planner.rs:
##
@@ -2069,29 +2071,37 @@ fn maybe_fix_physical_column_name(
 expr: Result>,
 input_physical_schema: &SchemaRef,
 ) -> Result> {
-if let Ok(e) = &expr {
-if let Some(column) = e.as_any().downcast_ref::() {
-let physical_field = input_physical_schema.field(column.index());
-let expr_col_name = column.name();
-let physical_name = physical_field.name();
-
-if physical_name != expr_col_name {
-// handle edge cases where the physical_name contains ':'.
-let colon_count = physical_name.matches(':').count();
-let mut splits = expr_col_name.match_indices(':');
-let split_pos = splits.nth(colon_count);
-
-if let Some((idx, _)) = split_pos {
-let base_name = &expr_col_name[..idx];
-if base_name == physical_name {
-let updated_column = Column::new(physical_name, 
column.index());
-return Ok(Arc::new(updated_column));
+expr.and_then(|e| {
+e.transform_down(|node| {

Review Comment:
   Sometimes `Columns` can be inside other type of expressions (so they are not 
on the "top level") , for example:
   ```
   BinaryExpr {
   left: IsNotNull(
   Column(
   Column {
   relation: Some(
   Bare {
   table: "left",
   },
   ),
   name: "people_column",
   },
   ),
   ),
   op: Or,
   right: IsNotNull(
   Column(
   Column {
   relation: Some(
   Bare {
   table: "left",
   },
   ),
   name: "people_column:1",
   },
   ),
   ),
   },
   ```
   
if so [the current 
fix](https://github.com/apache/datafusion/blob/3e30f77f08aa9184029da80c7f7e2ec00999fa44/datafusion/core/src/physical_planner.rs#L2068)
 won't apply, this change handles those cases by using `transform_down`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Handle union schema name coercion [datafusion]

2025-05-19 Thread via GitHub


LiaCastaneda commented on code in PR #16064:
URL: https://github.com/apache/datafusion/pull/16064#discussion_r2095705227


##
datafusion/core/src/physical_planner.rs:
##
@@ -2069,29 +2071,37 @@ fn maybe_fix_physical_column_name(
 expr: Result>,
 input_physical_schema: &SchemaRef,
 ) -> Result> {
-if let Ok(e) = &expr {
-if let Some(column) = e.as_any().downcast_ref::() {
-let physical_field = input_physical_schema.field(column.index());
-let expr_col_name = column.name();
-let physical_name = physical_field.name();
-
-if physical_name != expr_col_name {
-// handle edge cases where the physical_name contains ':'.
-let colon_count = physical_name.matches(':').count();
-let mut splits = expr_col_name.match_indices(':');
-let split_pos = splits.nth(colon_count);
-
-if let Some((idx, _)) = split_pos {
-let base_name = &expr_col_name[..idx];
-if base_name == physical_name {
-let updated_column = Column::new(physical_name, 
column.index());
-return Ok(Arc::new(updated_column));
+expr.and_then(|e| {
+e.transform_down(|node| {

Review Comment:
   Sometimes `Columns` are inside other type of expression (they are not "top 
level") , for example:
   ```
   BinaryExpr {
   left: IsNotNull(
   Column(
   Column {
   relation: Some(
   Bare {
   table: "left",
   },
   ),
   name: "people_column",
   },
   ),
   ),
   op: Or,
   right: IsNotNull(
   Column(
   Column {
   relation: Some(
   Bare {
   table: "left",
   },
   ),
   name: "people_column:1",
   },
   ),
   ),
   },
   ```
   
if so [the current 
fix](https://github.com/apache/datafusion/blob/3e30f77f08aa9184029da80c7f7e2ec00999fa44/datafusion/core/src/physical_planner.rs#L2068)
 won't apply, this change handles those cases by using `transform_down`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Handle union schema name coercion [datafusion]

2025-05-19 Thread via GitHub


LiaCastaneda commented on code in PR #16064:
URL: https://github.com/apache/datafusion/pull/16064#discussion_r2095705227


##
datafusion/core/src/physical_planner.rs:
##
@@ -2069,29 +2071,37 @@ fn maybe_fix_physical_column_name(
 expr: Result>,
 input_physical_schema: &SchemaRef,
 ) -> Result> {
-if let Ok(e) = &expr {
-if let Some(column) = e.as_any().downcast_ref::() {
-let physical_field = input_physical_schema.field(column.index());
-let expr_col_name = column.name();
-let physical_name = physical_field.name();
-
-if physical_name != expr_col_name {
-// handle edge cases where the physical_name contains ':'.
-let colon_count = physical_name.matches(':').count();
-let mut splits = expr_col_name.match_indices(':');
-let split_pos = splits.nth(colon_count);
-
-if let Some((idx, _)) = split_pos {
-let base_name = &expr_col_name[..idx];
-if base_name == physical_name {
-let updated_column = Column::new(physical_name, 
column.index());
-return Ok(Arc::new(updated_column));
+expr.and_then(|e| {
+e.transform_down(|node| {

Review Comment:
   Sometimes `Columns` are inside other type of expression (they are not "top 
level") , for example:
   ```
   BinaryExpr {
   left: IsNotNull(
   Column(
   Column {
   relation: Some(
   Bare {
   table: "left",
   },
   ),
   name: "people_column",
   },
   ),
   ),
   op: Or,
   right: IsNotNull(
   Column(
   Column {
   relation: Some(
   Bare {
   table: "left",
   },
   ),
   name: "people_column:1",
   },
   ),
   ),
   },
   ```
   
if so [the current 
fix](https://github.com/apache/datafusion/blob/3e30f77f08aa9184029da80c7f7e2ec00999fa44/datafusion/core/src/physical_planner.rs#L2068)
 won't apply, this change handles those cases



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org



Re: [PR] Handle union schema name coercion [datafusion]

2025-05-19 Thread via GitHub


LiaCastaneda commented on code in PR #16064:
URL: https://github.com/apache/datafusion/pull/16064#discussion_r2095138636


##
datafusion/physical-plan/src/union.rs:
##
@@ -532,9 +535,10 @@ fn union_schema(inputs: &[Arc]) -> 
SchemaRef {
 field.with_metadata(metadata)
 })
 .find_or_first(Field::is_nullable)
-// We can unwrap this because if inputs was empty, this 
would've already panic'ed when we
-// indexed into inputs[0].

Review Comment:
   yep



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org