Re: [PR] Handle union schema name coercion [datafusion]
gabotechs commented on code in PR #16064:
URL: https://github.com/apache/datafusion/pull/16064#discussion_r2098183006
##
datafusion/core/src/physical_planner.rs:
##
@@ -2711,6 +2724,47 @@ mod tests {
         assert_eq!(col.name(), "metric:avg");
     }
+
+    #[tokio::test]
+    async fn test_maybe_fix_nested_column_name_with_colon() {
+        let schema = Schema::new(vec![Field::new("column", DataType::Int32, false)]);
+        let schema_ref: SchemaRef = Arc::new(schema);
+
+        // Construct the nested expr
+        let col_expr = Arc::new(Column::new("column:1", 0)) as Arc<dyn PhysicalExpr>;
+        let is_not_null_expr = Arc::new(IsNotNullExpr::new(col_expr.clone()));
+
+        // Create a binary expression and put the column inside
+        let binary_expr = Arc::new(BinaryExpr::new(
+            is_not_null_expr.clone(),
+            Operator::Or,
+            is_not_null_expr.clone(),
+        )) as Arc<dyn PhysicalExpr>;
+
+        let fixed_expr =
Review Comment:
Also, I confirmed that this test fails on `main` but passes on this branch.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Handle union schema name coercion [datafusion]
gabotechs commented on code in PR #16064:
URL: https://github.com/apache/datafusion/pull/16064#discussion_r2098169086
##
datafusion/core/src/physical_planner.rs:
##
@@ -2711,6 +2724,47 @@ mod tests {
         assert_eq!(col.name(), "metric:avg");
     }
+
+    #[tokio::test]
+    async fn test_maybe_fix_nested_column_name_with_colon() {
+        let schema = Schema::new(vec![Field::new("column", DataType::Int32, false)]);
+        let schema_ref: SchemaRef = Arc::new(schema);
+
+        // Construct the nested expr
+        let col_expr = Arc::new(Column::new("column:1", 0)) as Arc<dyn PhysicalExpr>;
+        let is_not_null_expr = Arc::new(IsNotNullExpr::new(col_expr.clone()));
+
+        // Create a binary expression and put the column inside
+        let binary_expr = Arc::new(BinaryExpr::new(
+            is_not_null_expr.clone(),
+            Operator::Or,
+            is_not_null_expr.clone(),
+        )) as Arc<dyn PhysicalExpr>;
+
+        let fixed_expr =
Review Comment:
nit: a comment here would reinforce the purpose of the test:
```suggestion
        // Fix the column names ensuring that columns nested under
        // expressions are also fixed
        let fixed_expr =
```
##
datafusion/core/src/physical_planner.rs:
##
@@ -2067,29 +2069,37 @@ fn maybe_fix_physical_column_name(
     expr: Result<Arc<dyn PhysicalExpr>>,
     input_physical_schema: &SchemaRef,
 ) -> Result<Arc<dyn PhysicalExpr>> {
-    if let Ok(e) = &expr {
Review Comment:
nit: with a few minor tweaks we should be able to trim two indentation
levels from the body of the function and save some lines.
A less indented version of the function:
```rust
fn maybe_fix_physical_column_name(
    expr: Result<Arc<dyn PhysicalExpr>>,
    input_physical_schema: &SchemaRef,
) -> Result<Arc<dyn PhysicalExpr>> {
    let Ok(expr) = expr else { return expr };
    expr.transform_down(|node| {
        let Some(column) = node.as_any().downcast_ref::<Column>() else {
            return Ok(Transformed::no(node));
        };
        let idx = column.index();
        let physical_field = input_physical_schema.field(idx);
        let expr_col_name = column.name();
        let physical_name = physical_field.name();
        if expr_col_name != physical_name {
            // handle edge cases where the physical_name contains ':'.
            let colon_count = physical_name.matches(':').count();
            let mut splits = expr_col_name.match_indices(':');
            let split_pos = splits.nth(colon_count);
            if let Some((i, _)) = split_pos {
                let base_name = &expr_col_name[..i];
                if base_name == physical_name {
                    let updated_column = Column::new(physical_name, idx);
                    return Ok(Transformed::yes(Arc::new(updated_column)));
                }
            }
        }
        // If names already match or fix is not possible, just leave it as it is
        Ok(Transformed::no(node))
    })
    .data()
}
```
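For reference, the name check in the body can also be read in isolation. The sketch below is only an illustration and is not code from the PR: `matching_physical_name` is a hypothetical helper, written against the standard library only, that mirrors the colon-counting logic above (skip as many `:` as the physical name contains, then compare the prefix).

```rust
/// Hypothetical helper (not part of DataFusion): returns the physical name
/// when `expr_name` is `physical_name` followed by a `:suffix`, for example
/// "column:1" vs "column". Returns `None` when no fix applies.
fn matching_physical_name<'a>(expr_name: &str, physical_name: &'a str) -> Option<&'a str> {
    if expr_name == physical_name {
        // Names already match, nothing to fix.
        return None;
    }
    // The physical name may itself contain ':', so skip that many colons
    // before looking for the separator that starts the suffix.
    let colon_count = physical_name.matches(':').count();
    let (split_at, _) = expr_name.match_indices(':').nth(colon_count)?;
    // Only report a match when the prefix is exactly the physical name.
    (&expr_name[..split_at] == physical_name).then_some(physical_name)
}
```

With this helper, `matching_physical_name("metric:avg:1", "metric:avg")` yields `Some("metric:avg")`, while `matching_physical_name("other:1", "column")` yields `None`.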
Re: [PR] Handle union schema name coercion [datafusion]
LiaCastaneda commented on code in PR #16064:
URL: https://github.com/apache/datafusion/pull/16064#discussion_r2095705227
##
datafusion/core/src/physical_planner.rs:
##
@@ -2069,29 +2071,37 @@ fn maybe_fix_physical_column_name(
     expr: Result<Arc<dyn PhysicalExpr>>,
     input_physical_schema: &SchemaRef,
 ) -> Result<Arc<dyn PhysicalExpr>> {
-    if let Ok(e) = &expr {
-        if let Some(column) = e.as_any().downcast_ref::<Column>() {
-            let physical_field = input_physical_schema.field(column.index());
-            let expr_col_name = column.name();
-            let physical_name = physical_field.name();
-
-            if physical_name != expr_col_name {
-                // handle edge cases where the physical_name contains ':'.
-                let colon_count = physical_name.matches(':').count();
-                let mut splits = expr_col_name.match_indices(':');
-                let split_pos = splits.nth(colon_count);
-
-                if let Some((idx, _)) = split_pos {
-                    let base_name = &expr_col_name[..idx];
-                    if base_name == physical_name {
-                        let updated_column = Column::new(physical_name, column.index());
-                        return Ok(Arc::new(updated_column));
+    expr.and_then(|e| {
+        e.transform_down(|node| {
Review Comment:
Sometimes a `Column` is nested inside another type of expression (so it is
not at the "top level"), for example:
```
BinaryExpr {
    left: IsNotNull(
        Column(
            Column {
                relation: Some(
                    Bare {
                        table: "left",
                    },
                ),
                name: "people_column",
            },
        ),
    ),
    op: Or,
    right: IsNotNull(
        Column(
            Column {
                relation: Some(
                    Bare {
                        table: "left",
                    },
                ),
                name: "people_column:1",
            },
        ),
    ),
},
```
In that case [the current
fix](https://github.com/apache/datafusion/blob/3e30f77f08aa9184029da80c7f7e2ec00999fa44/datafusion/core/src/physical_planner.rs#L2068)
won't apply, because it only inspects the top-level expression. This change
handles those cases by using `transform_down`; the renaming logic itself
remains the same.
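To make the nested case concrete, here is a minimal standalone sketch of the idea. It does not use DataFusion: the `Expr` enum and `fix_column_names` are hypothetical stand-ins for `PhysicalExpr` and a `transform_down` traversal, and the name check is a simplified prefix test rather than the exact logic in the PR.

```rust
/// Hypothetical, simplified expression tree (a stand-in for `PhysicalExpr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column { name: String, index: usize },
    IsNotNull(Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
}

/// Walks the whole tree and renames every column whose name is the schema
/// name at its index followed by a ":suffix". `schema` holds the physical
/// field names; an out-of-range index panics in this sketch.
fn fix_column_names(expr: Expr, schema: &[&str]) -> Expr {
    match expr {
        Expr::Column { name, index } => {
            let physical = schema[index];
            // Simplified check: "<physical>:<anything>" gets renamed.
            let needs_fix = name
                .strip_prefix(physical)
                .map_or(false, |rest| rest.starts_with(':'));
            let name = if needs_fix { physical.to_string() } else { name };
            Expr::Column { name, index }
        }
        // Recurse into children so nested columns are visited too.
        Expr::IsNotNull(inner) => {
            Expr::IsNotNull(Box::new(fix_column_names(*inner, schema)))
        }
        Expr::Or(left, right) => Expr::Or(
            Box::new(fix_column_names(*left, schema)),
            Box::new(fix_column_names(*right, schema)),
        ),
    }
}
```

A check that only looks at the root would see an `Or` node and return it unchanged; the recursive walk reaches the `people_column:1` column under `IsNotNull` and renames it.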
Re: [PR] Handle union schema name coercion [datafusion]
LiaCastaneda commented on code in PR #16064:
URL: https://github.com/apache/datafusion/pull/16064#discussion_r2095138636
##
datafusion/physical-plan/src/union.rs:
##
@@ -532,9 +535,10 @@ fn union_schema(inputs: &[Arc<dyn ExecutionPlan>]) -> SchemaRef {
                     field.with_metadata(metadata)
                 })
                 .find_or_first(Field::is_nullable)
-                // We can unwrap this because if inputs was empty, this would've already panic'ed when we
-                // indexed into inputs[0].
Review Comment:
yep
