Re: [PR] Handle union schema name coercion [datafusion]
gabotechs commented on code in PR #16064: URL: https://github.com/apache/datafusion/pull/16064#discussion_r2098183006 ## datafusion/core/src/physical_planner.rs: ## @@ -2711,6 +2724,47 @@ mod tests { assert_eq!(col.name(), "metric:avg"); } + +#[tokio::test] +async fn test_maybe_fix_nested_column_name_with_colon() { +let schema = Schema::new(vec![Field::new("column", DataType::Int32, false)]); +let schema_ref: SchemaRef = Arc::new(schema); + +// Construct the nested expr +let col_expr = Arc::new(Column::new("column:1", 0)) as Arc; +let is_not_null_expr = Arc::new(IsNotNullExpr::new(col_expr.clone())); + +// Create a binary expression and put the column inside +let binary_expr = Arc::new(BinaryExpr::new( +is_not_null_expr.clone(), +Operator::Or, +is_not_null_expr.clone(), +)) as Arc; + +let fixed_expr = Review Comment: Also, confirmed that this test fails on `main` but succeeds in this branch -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Handle union schema name coercion [datafusion]
gabotechs commented on code in PR #16064: URL: https://github.com/apache/datafusion/pull/16064#discussion_r2098169086 ## datafusion/core/src/physical_planner.rs: ## @@ -2711,6 +2724,47 @@ mod tests { assert_eq!(col.name(), "metric:avg"); } + +#[tokio::test] +async fn test_maybe_fix_nested_column_name_with_colon() { +let schema = Schema::new(vec![Field::new("column", DataType::Int32, false)]); +let schema_ref: SchemaRef = Arc::new(schema); + +// Construct the nested expr +let col_expr = Arc::new(Column::new("column:1", 0)) as Arc; +let is_not_null_expr = Arc::new(IsNotNullExpr::new(col_expr.clone())); + +// Create a binary expression and put the column inside +let binary_expr = Arc::new(BinaryExpr::new( +is_not_null_expr.clone(), +Operator::Or, +is_not_null_expr.clone(), +)) as Arc; + +let fixed_expr = Review Comment: nit: just to reinforce a bit the purpose of the test ```suggestion // Fix the column names ensuring that columns nested under expressions are also fixed let fixed_expr = ``` ## datafusion/core/src/physical_planner.rs: ## @@ -2067,29 +2069,37 @@ fn maybe_fix_physical_column_name( expr: Result>, input_physical_schema: &SchemaRef, ) -> Result> { -if let Ok(e) = &expr { Review Comment: nit: with just some minor tweaks we should be able to trim 2 indentation levels in the body of the function while saving some lines: Less indented version of the function ```rust fn maybe_fix_physical_column_name( expr: Result>, input_physical_schema: &SchemaRef, ) -> Result> { let Ok(expr) = expr else { return expr }; expr.transform_down(|node| { let Some(column) = node.as_any().downcast_ref::() else { return Ok(Transformed::no(node)); }; let idx = column.index(); let physical_field = input_physical_schema.field(idx); let expr_col_name = column.name(); let physical_name = physical_field.name(); if expr_col_name != physical_name { // handle edge cases where the physical_name contains ':'. let colon_count = physical_name.matches(':').count(); let mut splits = expr_col_name.match_indices(':'); let split_pos = splits.nth(colon_count); if let Some((i, _)) = split_pos { let base_name = &expr_col_name[..i]; if base_name == physical_name { let updated_column = Column::new(physical_name, idx); return Ok(Transformed::yes(Arc::new(updated_column))); } } } // If names already match or fix is not possible, just leave it as it is Ok(Transformed::no(node)) }) .data() } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Handle union schema name coercion [datafusion]
LiaCastaneda commented on code in PR #16064: URL: https://github.com/apache/datafusion/pull/16064#discussion_r2095705227 ## datafusion/core/src/physical_planner.rs: ## @@ -2069,29 +2071,37 @@ fn maybe_fix_physical_column_name( expr: Result>, input_physical_schema: &SchemaRef, ) -> Result> { -if let Ok(e) = &expr { -if let Some(column) = e.as_any().downcast_ref::() { -let physical_field = input_physical_schema.field(column.index()); -let expr_col_name = column.name(); -let physical_name = physical_field.name(); - -if physical_name != expr_col_name { -// handle edge cases where the physical_name contains ':'. -let colon_count = physical_name.matches(':').count(); -let mut splits = expr_col_name.match_indices(':'); -let split_pos = splits.nth(colon_count); - -if let Some((idx, _)) = split_pos { -let base_name = &expr_col_name[..idx]; -if base_name == physical_name { -let updated_column = Column::new(physical_name, column.index()); -return Ok(Arc::new(updated_column)); +expr.and_then(|e| { +e.transform_down(|node| { Review Comment: Sometimes `Columns` can be inside other type of expressions (so they are not on the "top level") , for example: ``` BinaryExpr { left: IsNotNull( Column( Column { relation: Some( Bare { table: "left", }, ), name: "people_column", }, ), ), op: Or, right: IsNotNull( Column( Column { relation: Some( Bare { table: "left", }, ), name: "people_column:1", }, ), ), }, ``` if so [the current fix](https://github.com/apache/datafusion/blob/3e30f77f08aa9184029da80c7f7e2ec00999fa44/datafusion/core/src/physical_planner.rs#L2068) won't apply, this change handles those cases by using `transform_down` but logic remains the same -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Handle union schema name coercion [datafusion]
LiaCastaneda commented on code in PR #16064: URL: https://github.com/apache/datafusion/pull/16064#discussion_r2095705227 ## datafusion/core/src/physical_planner.rs: ## @@ -2069,29 +2071,37 @@ fn maybe_fix_physical_column_name( expr: Result>, input_physical_schema: &SchemaRef, ) -> Result> { -if let Ok(e) = &expr { -if let Some(column) = e.as_any().downcast_ref::() { -let physical_field = input_physical_schema.field(column.index()); -let expr_col_name = column.name(); -let physical_name = physical_field.name(); - -if physical_name != expr_col_name { -// handle edge cases where the physical_name contains ':'. -let colon_count = physical_name.matches(':').count(); -let mut splits = expr_col_name.match_indices(':'); -let split_pos = splits.nth(colon_count); - -if let Some((idx, _)) = split_pos { -let base_name = &expr_col_name[..idx]; -if base_name == physical_name { -let updated_column = Column::new(physical_name, column.index()); -return Ok(Arc::new(updated_column)); +expr.and_then(|e| { +e.transform_down(|node| { Review Comment: Sometimes `Columns` can be inside other type of expressions (so they are not on the "top level") , for example: ``` BinaryExpr { left: IsNotNull( Column( Column { relation: Some( Bare { table: "left", }, ), name: "people_column", }, ), ), op: Or, right: IsNotNull( Column( Column { relation: Some( Bare { table: "left", }, ), name: "people_column:1", }, ), ), }, ``` if so [the current fix](https://github.com/apache/datafusion/blob/3e30f77f08aa9184029da80c7f7e2ec00999fa44/datafusion/core/src/physical_planner.rs#L2068) won't apply, this change handles those cases by using `transform_down` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Handle union schema name coercion [datafusion]
LiaCastaneda commented on code in PR #16064: URL: https://github.com/apache/datafusion/pull/16064#discussion_r2095705227 ## datafusion/core/src/physical_planner.rs: ## @@ -2069,29 +2071,37 @@ fn maybe_fix_physical_column_name( expr: Result>, input_physical_schema: &SchemaRef, ) -> Result> { -if let Ok(e) = &expr { -if let Some(column) = e.as_any().downcast_ref::() { -let physical_field = input_physical_schema.field(column.index()); -let expr_col_name = column.name(); -let physical_name = physical_field.name(); - -if physical_name != expr_col_name { -// handle edge cases where the physical_name contains ':'. -let colon_count = physical_name.matches(':').count(); -let mut splits = expr_col_name.match_indices(':'); -let split_pos = splits.nth(colon_count); - -if let Some((idx, _)) = split_pos { -let base_name = &expr_col_name[..idx]; -if base_name == physical_name { -let updated_column = Column::new(physical_name, column.index()); -return Ok(Arc::new(updated_column)); +expr.and_then(|e| { +e.transform_down(|node| { Review Comment: Sometimes `Columns` are inside other type of expression (they are not "top level") , for example: ``` BinaryExpr { left: IsNotNull( Column( Column { relation: Some( Bare { table: "left", }, ), name: "people_column", }, ), ), op: Or, right: IsNotNull( Column( Column { relation: Some( Bare { table: "left", }, ), name: "people_column:1", }, ), ), }, ``` if so [the current fix](https://github.com/apache/datafusion/blob/3e30f77f08aa9184029da80c7f7e2ec00999fa44/datafusion/core/src/physical_planner.rs#L2068) won't apply, this change handles those cases by using `transform_down` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Handle union schema name coercion [datafusion]
LiaCastaneda commented on code in PR #16064: URL: https://github.com/apache/datafusion/pull/16064#discussion_r2095705227 ## datafusion/core/src/physical_planner.rs: ## @@ -2069,29 +2071,37 @@ fn maybe_fix_physical_column_name( expr: Result>, input_physical_schema: &SchemaRef, ) -> Result> { -if let Ok(e) = &expr { -if let Some(column) = e.as_any().downcast_ref::() { -let physical_field = input_physical_schema.field(column.index()); -let expr_col_name = column.name(); -let physical_name = physical_field.name(); - -if physical_name != expr_col_name { -// handle edge cases where the physical_name contains ':'. -let colon_count = physical_name.matches(':').count(); -let mut splits = expr_col_name.match_indices(':'); -let split_pos = splits.nth(colon_count); - -if let Some((idx, _)) = split_pos { -let base_name = &expr_col_name[..idx]; -if base_name == physical_name { -let updated_column = Column::new(physical_name, column.index()); -return Ok(Arc::new(updated_column)); +expr.and_then(|e| { +e.transform_down(|node| { Review Comment: Sometimes `Columns` are inside other type of expression (they are not "top level") , for example: ``` BinaryExpr { left: IsNotNull( Column( Column { relation: Some( Bare { table: "left", }, ), name: "people_column", }, ), ), op: Or, right: IsNotNull( Column( Column { relation: Some( Bare { table: "left", }, ), name: "people_column:1", }, ), ), }, ``` if so [the current fix](https://github.com/apache/datafusion/blob/3e30f77f08aa9184029da80c7f7e2ec00999fa44/datafusion/core/src/physical_planner.rs#L2068) won't apply, this change handles those cases -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Handle union schema name coercion [datafusion]
LiaCastaneda commented on code in PR #16064: URL: https://github.com/apache/datafusion/pull/16064#discussion_r2095138636 ## datafusion/physical-plan/src/union.rs: ## @@ -532,9 +535,10 @@ fn union_schema(inputs: &[Arc]) -> SchemaRef { field.with_metadata(metadata) }) .find_or_first(Field::is_nullable) -// We can unwrap this because if inputs was empty, this would've already panic'ed when we -// indexed into inputs[0]. Review Comment: yep -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org