Re: [PR] Migrate core test to insta, part1 [datafusion]

2025-06-17 Thread via GitHub


alamb commented on PR #16324:
URL: https://github.com/apache/datafusion/pull/16324#issuecomment-2980678961

   🚀 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Migrate core test to insta, part1 [datafusion]

2025-06-17 Thread via GitHub


blaginin merged PR #16324:
URL: https://github.com/apache/datafusion/pull/16324


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Migrate core test to insta, part1 [datafusion]

2025-06-15 Thread via GitHub


Chen-Yuan-Lai commented on code in PR #16324:
URL: https://github.com/apache/datafusion/pull/16324#discussion_r2148615171


##
datafusion/core/tests/sql/explain_analyze.rs:
##
@@ -797,14 +796,16 @@ async fn explain_physical_plan_only() {
 let sql = "EXPLAIN select count(*) from (values ('a', 1, 100), ('a', 2, 
150)) as t (c1,c2,c3)";
 let actual = execute(&ctx, sql).await;
 let actual = normalize_vec_for_explain(actual);

Review Comment:
   After searching  `ExplainNormalizer` in the codebase, I found that only four 
test cases used it:
   
   1.  `test_physical_plan_display_indent` (initialize `ExplainNormalizer`)
   2.  `test_physical_plan_display_indent_multi_children` (initialize 
`ExplainNormalizer`)
   3.  `explain_logical_plan_only` (call `normalize_vec_for_explain`)
   4.  `explain_physical_plan_only` (call `normalize_vec_for_explain`)
   
   1.2. created the  physical plan with fixed core numbers (9), and 3. 4. 
doesn't have `partitioning=RoundRobinBatch()` string in the snapshot, so I 
think this change may not be necessary?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Migrate core test to insta, part1 [datafusion]

2025-06-14 Thread via GitHub


Chen-Yuan-Lai commented on code in PR #16324:
URL: https://github.com/apache/datafusion/pull/16324#discussion_r2146791719


##
datafusion/core/tests/sql/explain_analyze.rs:
##
@@ -52,43 +54,45 @@ async fn explain_analyze_baseline_metrics() {
 let formatted = arrow::util::pretty::pretty_format_batches(&results)
 .unwrap()
 .to_string();
+
 println!("Query Output:\n\n{formatted}");
 
-assert_metrics!(
-&formatted,
-"AggregateExec: mode=Partial, gby=[]",
-"metrics=[output_rows=3, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"AggregateExec: mode=FinalPartitioned, gby=[c1@0 as c1]",
-"metrics=[output_rows=5, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"FilterExec: c13@1 != C2GT5KVyOPZpgKVl110TyZO0NcJ434",
-"metrics=[output_rows=99, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"ProjectionExec: expr=[]",
-"metrics=[output_rows=5, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"CoalesceBatchesExec: target_batch_size=4096",
-"metrics=[output_rows=5, elapsed_compute"
-);
-assert_metrics!(
-&formatted,
-"UnionExec",
-"metrics=[output_rows=3, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"WindowAggExec",
-"metrics=[output_rows=1, elapsed_compute="
-);
+let re = Regex::new(r"\|[^|]*\|([^|]*)\|").unwrap();
+let actual = formatted
+.lines()
+.map(|line| re.replace_all(line, "$1").trim_end().to_string())
+.filter(|line| !line.is_empty() && !line.starts_with('+'))
+.collect::>()
+.join("\n");

Review Comment:
   Sure, I have revert the change in this PR



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Migrate core test to insta, part1 [datafusion]

2025-06-13 Thread via GitHub


alamb commented on code in PR #16324:
URL: https://github.com/apache/datafusion/pull/16324#discussion_r2145201108


##
datafusion/core/tests/sql/explain_analyze.rs:
##
@@ -52,43 +54,45 @@ async fn explain_analyze_baseline_metrics() {
 let formatted = arrow::util::pretty::pretty_format_batches(&results)
 .unwrap()
 .to_string();
+
 println!("Query Output:\n\n{formatted}");
 
-assert_metrics!(
-&formatted,
-"AggregateExec: mode=Partial, gby=[]",
-"metrics=[output_rows=3, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"AggregateExec: mode=FinalPartitioned, gby=[c1@0 as c1]",
-"metrics=[output_rows=5, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"FilterExec: c13@1 != C2GT5KVyOPZpgKVl110TyZO0NcJ434",
-"metrics=[output_rows=99, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"ProjectionExec: expr=[]",
-"metrics=[output_rows=5, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"CoalesceBatchesExec: target_batch_size=4096",
-"metrics=[output_rows=5, elapsed_compute"
-);
-assert_metrics!(
-&formatted,
-"UnionExec",
-"metrics=[output_rows=3, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"WindowAggExec",
-"metrics=[output_rows=1, elapsed_compute="
-);
+let re = Regex::new(r"\|[^|]*\|([^|]*)\|").unwrap();
+let actual = formatted
+.lines()
+.map(|line| re.replace_all(line, "$1").trim_end().to_string())
+.filter(|line| !line.is_empty() && !line.starts_with('+'))
+.collect::>()
+.join("\n");

Review Comment:
   maybe we can just keep using `assert_metrics` for this style test (or 
perhaps revert the changes from this PR and work on migrating the metrics tests 
to a nicer style in a separate PR so we don't delay merging the other parts of 
this PR)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Migrate core test to insta, part1 [datafusion]

2025-06-12 Thread via GitHub


Chen-Yuan-Lai commented on code in PR #16324:
URL: https://github.com/apache/datafusion/pull/16324#discussion_r2143240105


##
datafusion/core/tests/sql/explain_analyze.rs:
##
@@ -52,43 +54,45 @@ async fn explain_analyze_baseline_metrics() {
 let formatted = arrow::util::pretty::pretty_format_batches(&results)
 .unwrap()
 .to_string();
+
 println!("Query Output:\n\n{formatted}");
 
-assert_metrics!(
-&formatted,
-"AggregateExec: mode=Partial, gby=[]",
-"metrics=[output_rows=3, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"AggregateExec: mode=FinalPartitioned, gby=[c1@0 as c1]",
-"metrics=[output_rows=5, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"FilterExec: c13@1 != C2GT5KVyOPZpgKVl110TyZO0NcJ434",
-"metrics=[output_rows=99, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"ProjectionExec: expr=[]",
-"metrics=[output_rows=5, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"CoalesceBatchesExec: target_batch_size=4096",
-"metrics=[output_rows=5, elapsed_compute"
-);
-assert_metrics!(
-&formatted,
-"UnionExec",
-"metrics=[output_rows=3, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"WindowAggExec",
-"metrics=[output_rows=1, elapsed_compute="
-);
+let re = Regex::new(r"\|[^|]*\|([^|]*)\|").unwrap();
+let actual = formatted
+.lines()
+.map(|line| re.replace_all(line, "$1").trim_end().to_string())
+.filter(|line| !line.is_empty() && !line.starts_with('+'))
+.collect::>()
+.join("\n");

Review Comment:
   I have tried to assert the raw table, but I found that it caused 
inconsistent trailing whitespace in each snapshot testing. That is why I 
trimmed the table to only keep the plan context.
   Do you have any idea for keeping the consistent table format in each 
execution? thx



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Migrate core test to insta, part1 [datafusion]

2025-06-12 Thread via GitHub


Chen-Yuan-Lai commented on code in PR #16324:
URL: https://github.com/apache/datafusion/pull/16324#discussion_r2143152053


##
datafusion/core/tests/sql/explain_analyze.rs:
##
@@ -52,43 +54,45 @@ async fn explain_analyze_baseline_metrics() {
 let formatted = arrow::util::pretty::pretty_format_batches(&results)
 .unwrap()
 .to_string();
+
 println!("Query Output:\n\n{formatted}");
 
-assert_metrics!(
-&formatted,
-"AggregateExec: mode=Partial, gby=[]",
-"metrics=[output_rows=3, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"AggregateExec: mode=FinalPartitioned, gby=[c1@0 as c1]",
-"metrics=[output_rows=5, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"FilterExec: c13@1 != C2GT5KVyOPZpgKVl110TyZO0NcJ434",
-"metrics=[output_rows=99, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"ProjectionExec: expr=[]",
-"metrics=[output_rows=5, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"CoalesceBatchesExec: target_batch_size=4096",
-"metrics=[output_rows=5, elapsed_compute"
-);
-assert_metrics!(
-&formatted,
-"UnionExec",
-"metrics=[output_rows=3, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"WindowAggExec",
-"metrics=[output_rows=1, elapsed_compute="
-);
+let re = Regex::new(r"\|[^|]*\|([^|]*)\|").unwrap();
+let actual = formatted
+.lines()
+.map(|line| re.replace_all(line, "$1").trim_end().to_string())
+.filter(|line| !line.is_empty() && !line.starts_with('+'))
+.collect::>()
+.join("\n");
+insta::with_settings!({filters => vec![
+(r"\d+\.?\d*[µmn]?s", "[TIME]"),

Review Comment:
   > you can probably use lookahead 
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Lookahead_assertion
   
   Unfortunately, Rust's regex crate does not support lookahead or lookbehind 
assertions 😢 ( [ref](https://github.com/rust-lang/regex/issues/127) )
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Migrate core test to insta, part1 [datafusion]

2025-06-12 Thread via GitHub


Chen-Yuan-Lai commented on code in PR #16324:
URL: https://github.com/apache/datafusion/pull/16324#discussion_r2143152053


##
datafusion/core/tests/sql/explain_analyze.rs:
##
@@ -52,43 +54,45 @@ async fn explain_analyze_baseline_metrics() {
 let formatted = arrow::util::pretty::pretty_format_batches(&results)
 .unwrap()
 .to_string();
+
 println!("Query Output:\n\n{formatted}");
 
-assert_metrics!(
-&formatted,
-"AggregateExec: mode=Partial, gby=[]",
-"metrics=[output_rows=3, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"AggregateExec: mode=FinalPartitioned, gby=[c1@0 as c1]",
-"metrics=[output_rows=5, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"FilterExec: c13@1 != C2GT5KVyOPZpgKVl110TyZO0NcJ434",
-"metrics=[output_rows=99, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"ProjectionExec: expr=[]",
-"metrics=[output_rows=5, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"CoalesceBatchesExec: target_batch_size=4096",
-"metrics=[output_rows=5, elapsed_compute"
-);
-assert_metrics!(
-&formatted,
-"UnionExec",
-"metrics=[output_rows=3, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"WindowAggExec",
-"metrics=[output_rows=1, elapsed_compute="
-);
+let re = Regex::new(r"\|[^|]*\|([^|]*)\|").unwrap();
+let actual = formatted
+.lines()
+.map(|line| re.replace_all(line, "$1").trim_end().to_string())
+.filter(|line| !line.is_empty() && !line.starts_with('+'))
+.collect::>()
+.join("\n");
+insta::with_settings!({filters => vec![
+(r"\d+\.?\d*[µmn]?s", "[TIME]"),

Review Comment:
   > you can probably use lookahead 
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Lookahead_assertion
   
   Unfortunately, Rust's regex crate does not support lookahead or lookbehind 
assertions 😢 ([ref](https://github.com/rust-lang/regex/issues/127))
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Migrate core test to insta, part1 [datafusion]

2025-06-12 Thread via GitHub


Chen-Yuan-Lai commented on code in PR #16324:
URL: https://github.com/apache/datafusion/pull/16324#discussion_r2143152053


##
datafusion/core/tests/sql/explain_analyze.rs:
##
@@ -52,43 +54,45 @@ async fn explain_analyze_baseline_metrics() {
 let formatted = arrow::util::pretty::pretty_format_batches(&results)
 .unwrap()
 .to_string();
+
 println!("Query Output:\n\n{formatted}");
 
-assert_metrics!(
-&formatted,
-"AggregateExec: mode=Partial, gby=[]",
-"metrics=[output_rows=3, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"AggregateExec: mode=FinalPartitioned, gby=[c1@0 as c1]",
-"metrics=[output_rows=5, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"FilterExec: c13@1 != C2GT5KVyOPZpgKVl110TyZO0NcJ434",
-"metrics=[output_rows=99, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"ProjectionExec: expr=[]",
-"metrics=[output_rows=5, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"CoalesceBatchesExec: target_batch_size=4096",
-"metrics=[output_rows=5, elapsed_compute"
-);
-assert_metrics!(
-&formatted,
-"UnionExec",
-"metrics=[output_rows=3, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"WindowAggExec",
-"metrics=[output_rows=1, elapsed_compute="
-);
+let re = Regex::new(r"\|[^|]*\|([^|]*)\|").unwrap();
+let actual = formatted
+.lines()
+.map(|line| re.replace_all(line, "$1").trim_end().to_string())
+.filter(|line| !line.is_empty() && !line.starts_with('+'))
+.collect::>()
+.join("\n");
+insta::with_settings!({filters => vec![
+(r"\d+\.?\d*[µmn]?s", "[TIME]"),

Review Comment:
   > you can probably use lookahead 
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Lookahead_assertion
   
   Unfortunately, Rust's regex crate explicitly does not support lookahead or 
lookbehind assertions 😢 ([ref](https://github.com/rust-lang/regex/issues/127))
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Migrate core test to insta, part1 [datafusion]

2025-06-11 Thread via GitHub


blaginin commented on code in PR #16324:
URL: https://github.com/apache/datafusion/pull/16324#discussion_r2141000796


##
datafusion/core/tests/sql/explain_analyze.rs:
##
@@ -52,43 +54,45 @@ async fn explain_analyze_baseline_metrics() {
 let formatted = arrow::util::pretty::pretty_format_batches(&results)
 .unwrap()
 .to_string();
+
 println!("Query Output:\n\n{formatted}");
 
-assert_metrics!(
-&formatted,
-"AggregateExec: mode=Partial, gby=[]",
-"metrics=[output_rows=3, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"AggregateExec: mode=FinalPartitioned, gby=[c1@0 as c1]",
-"metrics=[output_rows=5, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"FilterExec: c13@1 != C2GT5KVyOPZpgKVl110TyZO0NcJ434",
-"metrics=[output_rows=99, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"ProjectionExec: expr=[]",
-"metrics=[output_rows=5, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"CoalesceBatchesExec: target_batch_size=4096",
-"metrics=[output_rows=5, elapsed_compute"
-);
-assert_metrics!(
-&formatted,
-"UnionExec",
-"metrics=[output_rows=3, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"WindowAggExec",
-"metrics=[output_rows=1, elapsed_compute="
-);
+let re = Regex::new(r"\|[^|]*\|([^|]*)\|").unwrap();
+let actual = formatted
+.lines()
+.map(|line| re.replace_all(line, "$1").trim_end().to_string())
+.filter(|line| !line.is_empty() && !line.starts_with('+'))
+.collect::>()
+.join("\n");
+insta::with_settings!({filters => vec![
+(r"\d+\.?\d*[µmn]?s", "[TIME]"),

Review Comment:
   Happy with keeping as is, but on this
   
   > might match irrelevant strings
   
   you can probably use lookahead 
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Lookahead_assertion



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Migrate core test to insta, part1 [datafusion]

2025-06-11 Thread via GitHub


Chen-Yuan-Lai commented on code in PR #16324:
URL: https://github.com/apache/datafusion/pull/16324#discussion_r2140683717


##
datafusion/core/tests/sql/explain_analyze.rs:
##
@@ -52,43 +54,45 @@ async fn explain_analyze_baseline_metrics() {
 let formatted = arrow::util::pretty::pretty_format_batches(&results)
 .unwrap()
 .to_string();
+
 println!("Query Output:\n\n{formatted}");
 
-assert_metrics!(
-&formatted,
-"AggregateExec: mode=Partial, gby=[]",
-"metrics=[output_rows=3, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"AggregateExec: mode=FinalPartitioned, gby=[c1@0 as c1]",
-"metrics=[output_rows=5, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"FilterExec: c13@1 != C2GT5KVyOPZpgKVl110TyZO0NcJ434",
-"metrics=[output_rows=99, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"ProjectionExec: expr=[]",
-"metrics=[output_rows=5, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"CoalesceBatchesExec: target_batch_size=4096",
-"metrics=[output_rows=5, elapsed_compute"
-);
-assert_metrics!(
-&formatted,
-"UnionExec",
-"metrics=[output_rows=3, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"WindowAggExec",
-"metrics=[output_rows=1, elapsed_compute="
-);
+let re = Regex::new(r"\|[^|]*\|([^|]*)\|").unwrap();
+let actual = formatted
+.lines()
+.map(|line| re.replace_all(line, "$1").trim_end().to_string())
+.filter(|line| !line.is_empty() && !line.starts_with('+'))
+.collect::>()
+.join("\n");
+insta::with_settings!({filters => vec![
+(r"\d+\.?\d*[µmn]?s", "[TIME]"),

Review Comment:
   Agree, current pattern can't match the longer time (ex. min, h, d). However, 
a broader pattern might match irrelevant strings. For example, match on `[ ]`:  
metrics=[output_rows=1, elapsed_compute=110.947µs] -> metrics=[TIME] 
   
   I'm currently considering to add more time units in the regex pattern:
   ```regex
   r"\d+\.?\d*(?:µs|ms|ns|s|min|h)\b"
   ```
   
   Or, do you have any suggestions for a more precise way to match time-related 
strings? many thanks.
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Migrate core test to insta, part1 [datafusion]

2025-06-11 Thread via GitHub


alamb commented on code in PR #16324:
URL: https://github.com/apache/datafusion/pull/16324#discussion_r2140697002


##
datafusion/core/tests/sql/explain_analyze.rs:
##
@@ -52,43 +54,45 @@ async fn explain_analyze_baseline_metrics() {
 let formatted = arrow::util::pretty::pretty_format_batches(&results)
 .unwrap()
 .to_string();
+
 println!("Query Output:\n\n{formatted}");
 
-assert_metrics!(
-&formatted,
-"AggregateExec: mode=Partial, gby=[]",
-"metrics=[output_rows=3, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"AggregateExec: mode=FinalPartitioned, gby=[c1@0 as c1]",
-"metrics=[output_rows=5, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"FilterExec: c13@1 != C2GT5KVyOPZpgKVl110TyZO0NcJ434",
-"metrics=[output_rows=99, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"ProjectionExec: expr=[]",
-"metrics=[output_rows=5, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"CoalesceBatchesExec: target_batch_size=4096",
-"metrics=[output_rows=5, elapsed_compute"
-);
-assert_metrics!(
-&formatted,
-"UnionExec",
-"metrics=[output_rows=3, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"WindowAggExec",
-"metrics=[output_rows=1, elapsed_compute="
-);
+let re = Regex::new(r"\|[^|]*\|([^|]*)\|").unwrap();
+let actual = formatted
+.lines()
+.map(|line| re.replace_all(line, "$1").trim_end().to_string())
+.filter(|line| !line.is_empty() && !line.starts_with('+'))
+.collect::>()
+.join("\n");
+insta::with_settings!({filters => vec![
+(r"\d+\.?\d*[µmn]?s", "[TIME]"),

Review Comment:
   Maybe just for this teset case we could avoid using `insta` and use the 
previous approach of substring matches
   
   While that approach is also non ideal, it has seemed to work this far



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Migrate core test to insta, part1 [datafusion]

2025-06-10 Thread via GitHub


Chen-Yuan-Lai commented on code in PR #16324:
URL: https://github.com/apache/datafusion/pull/16324#discussion_r2138296977


##
datafusion/core/tests/optimizer/mod.rs:
##
@@ -107,16 +126,15 @@ fn concat_ws_literals() -> Result<()> {
 let sql = "SELECT concat_ws('-', true, col_int32, false, null, 'hello', 
col_utf8, 12, '', 3.4) \
 AS col
 FROM test";
-let expected =
-"Projection: concat_ws(Utf8(\"-\"), Utf8(\"true\"), 
CAST(test.col_int32 AS Utf8), Utf8(\"false-hello\"), test.col_utf8, 
Utf8(\"12--3.4\")) AS col\
-\n  TableScan: test projection=[col_int32, col_utf8]";
-quick_test(sql, expected);
-Ok(())
-}
-
-fn quick_test(sql: &str, expected_plan: &str) {
 let plan = test_sql(sql).unwrap();
-assert_eq!(expected_plan, format!("{plan}"));
+assert_snapshot!(
+format!("{plan}"),

Review Comment:
   Thanks for the comment, the `format!` is exactly unnecessary in these cases



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Migrate core test to insta, part1 [datafusion]

2025-06-10 Thread via GitHub


Chen-Yuan-Lai commented on code in PR #16324:
URL: https://github.com/apache/datafusion/pull/16324#discussion_r2138296977


##
datafusion/core/tests/optimizer/mod.rs:
##
@@ -107,16 +126,15 @@ fn concat_ws_literals() -> Result<()> {
 let sql = "SELECT concat_ws('-', true, col_int32, false, null, 'hello', 
col_utf8, 12, '', 3.4) \
 AS col
 FROM test";
-let expected =
-"Projection: concat_ws(Utf8(\"-\"), Utf8(\"true\"), 
CAST(test.col_int32 AS Utf8), Utf8(\"false-hello\"), test.col_utf8, 
Utf8(\"12--3.4\")) AS col\
-\n  TableScan: test projection=[col_int32, col_utf8]";
-quick_test(sql, expected);
-Ok(())
-}
-
-fn quick_test(sql: &str, expected_plan: &str) {
 let plan = test_sql(sql).unwrap();
-assert_eq!(expected_plan, format!("{plan}"));
+assert_snapshot!(
+format!("{plan}"),

Review Comment:
   Thanks for the comment, the `format!` is exactly unnecessary



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Migrate core test to insta, part1 [datafusion]

2025-06-09 Thread via GitHub


Chen-Yuan-Lai commented on PR #16324:
URL: https://github.com/apache/datafusion/pull/16324#issuecomment-2957481921

   > Just FYI, if you expect more work on this issue, you may want to change 
"closes" to "part of" - because otherwise github will actually close the issue 
once this PR is merged
   
   Oh! Sorry, I have corrected the PR body
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Migrate core test to insta, part1 [datafusion]

2025-06-09 Thread via GitHub


blaginin commented on code in PR #16324:
URL: https://github.com/apache/datafusion/pull/16324#discussion_r2136447543


##
datafusion/core/tests/optimizer/mod.rs:
##
@@ -107,16 +126,15 @@ fn concat_ws_literals() -> Result<()> {
 let sql = "SELECT concat_ws('-', true, col_int32, false, null, 'hello', 
col_utf8, 12, '', 3.4) \
 AS col
 FROM test";
-let expected =
-"Projection: concat_ws(Utf8(\"-\"), Utf8(\"true\"), 
CAST(test.col_int32 AS Utf8), Utf8(\"false-hello\"), test.col_utf8, 
Utf8(\"12--3.4\")) AS col\
-\n  TableScan: test projection=[col_int32, col_utf8]";
-quick_test(sql, expected);
-Ok(())
-}
-
-fn quick_test(sql: &str, expected_plan: &str) {
 let plan = test_sql(sql).unwrap();
-assert_eq!(expected_plan, format!("{plan}"));
+assert_snapshot!(
+format!("{plan}"),

Review Comment:
   assert_snapshot triggers format internally so you can just do 
   ```suggestion
   plan,
   ```



##
datafusion/core/tests/sql/explain_analyze.rs:
##
@@ -52,43 +54,45 @@ async fn explain_analyze_baseline_metrics() {
 let formatted = arrow::util::pretty::pretty_format_batches(&results)
 .unwrap()
 .to_string();
+
 println!("Query Output:\n\n{formatted}");
 
-assert_metrics!(
-&formatted,
-"AggregateExec: mode=Partial, gby=[]",
-"metrics=[output_rows=3, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"AggregateExec: mode=FinalPartitioned, gby=[c1@0 as c1]",
-"metrics=[output_rows=5, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"FilterExec: c13@1 != C2GT5KVyOPZpgKVl110TyZO0NcJ434",
-"metrics=[output_rows=99, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"ProjectionExec: expr=[]",
-"metrics=[output_rows=5, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"CoalesceBatchesExec: target_batch_size=4096",
-"metrics=[output_rows=5, elapsed_compute"
-);
-assert_metrics!(
-&formatted,
-"UnionExec",
-"metrics=[output_rows=3, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"WindowAggExec",
-"metrics=[output_rows=1, elapsed_compute="
-);
+let re = Regex::new(r"\|[^|]*\|([^|]*)\|").unwrap();
+let actual = formatted
+.lines()
+.map(|line| re.replace_all(line, "$1").trim_end().to_string())
+.filter(|line| !line.is_empty() && !line.starts_with('+'))
+.collect::>()
+.join("\n");

Review Comment:
   what do you think about just asserting the table itself without 
modifications? I fear that this code will:
   a - make test a bit more complicated
   b - needs to be manually maintained when we change the formatting 



##
datafusion/core/tests/sql/explain_analyze.rs:
##
@@ -52,43 +54,45 @@ async fn explain_analyze_baseline_metrics() {
 let formatted = arrow::util::pretty::pretty_format_batches(&results)
 .unwrap()
 .to_string();
+
 println!("Query Output:\n\n{formatted}");
 
-assert_metrics!(
-&formatted,
-"AggregateExec: mode=Partial, gby=[]",
-"metrics=[output_rows=3, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"AggregateExec: mode=FinalPartitioned, gby=[c1@0 as c1]",
-"metrics=[output_rows=5, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"FilterExec: c13@1 != C2GT5KVyOPZpgKVl110TyZO0NcJ434",
-"metrics=[output_rows=99, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"ProjectionExec: expr=[]",
-"metrics=[output_rows=5, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"CoalesceBatchesExec: target_batch_size=4096",
-"metrics=[output_rows=5, elapsed_compute"
-);
-assert_metrics!(
-&formatted,
-"UnionExec",
-"metrics=[output_rows=3, elapsed_compute="
-);
-assert_metrics!(
-&formatted,
-"WindowAggExec",
-"metrics=[output_rows=1, elapsed_compute="
-);
+let re = Regex::new(r"\|[^|]*\|([^|]*)\|").unwrap();
+let actual = formatted
+.lines()
+.map(|line| re.replace_all(line, "$1").trim_end().to_string())
+.filter(|line| !line.is_empty() && !line.starts_with('+'))
+.collect::>()
+.join("\n");
+insta::with_settings!({filters => vec![
+(r"\d+\.?\d*[µmn]?s", "[TIME]"),

Review Comment:
   `[µmn]` - I think this can be somewhat flaky it the test takes more time - 
maybe match on `[]`?



##
datafusion/core/tests/optimizer/mod.rs:
##
@@ -107,16 +126,15 @@ fn concat_ws_literals() -> Result<()> {
 let sql = "SELECT concat_ws('-', true, col_int32, false, null, 'hello', 
col_utf8, 12, '', 3.4) \
 AS col
 FROM test";
-let expected =
-"Project

Re: [PR] Migrate core test to insta, part1 [datafusion]

2025-06-09 Thread via GitHub


blaginin commented on PR #16324:
URL: https://github.com/apache/datafusion/pull/16324#issuecomment-2956983950

   Also could you please resolve the conflicts 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Migrate core test to insta, part1 [datafusion]

2025-06-09 Thread via GitHub


blaginin commented on PR #16324:
URL: https://github.com/apache/datafusion/pull/16324#issuecomment-2956908455

   Hey @Chen-Yuan-Lai!
   
   In the PR title you have 
   
   > part1
   
   but in the PR body you write
   
   > Closes https://github.com/apache/datafusion/issues/15791 
   
   Just FYI, if you expect more work on this issue, you may want to change 
"closes" to "part of" - because otherwise github will actually close the issue 
once this PR is merged 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Migrate core test to insta, part1 [datafusion]

2025-06-09 Thread via GitHub


alamb commented on PR #16324:
URL: https://github.com/apache/datafusion/pull/16324#issuecomment-2955861986

   Thank you @Chen-Yuan-Lai 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]