Re: [PR] feat: varchar default mapping to utf8view [datafusion]
alamb commented on code in PR #16142:
URL: https://github.com/apache/datafusion/pull/16142#discussion_r2109637172
##
parquet-testing:
##
Review Comment:
There appears to be an update of parquet-testing with unrelated content, but
I think it is ok to update it too
##
datafusion/core/tests/user_defined/user_defined_plan.rs:
##
@@ -796,22 +798,30 @@ fn accumulate_batch(
k: &usize,
) -> BTreeMap {
let num_rows = input_batch.num_rows();
+
// Assuming the input columns are
-// column[0]: customer_id / UTF8
+// column[0]: customer_id / UTF8 or UTF8View
Review Comment:
I feel like the example would be simpler if it just used `Utf8View`. Could
we do that and avoid adding the logic to handle both types?
##
datafusion-examples/examples/dataframe.rs:
##
@@ -182,7 +182,11 @@ async fn write_out(ctx: &SessionContext) ->
std::result::Result<(), DataFusionEr
let mut df = ctx.sql("values ('a'), ('b'), ('c')").await.unwrap();
// Ensure the column names and types match the target table
-df = df.with_column_renamed("column1", "tablecol1").unwrap();
+df = df
Review Comment:
I think it would be better if we could change the example to maybe just use
a `StringViewArray` in the first place (rather than have to cast it here)
##
datafusion/sqllogictest/test_files/avro.slt:
##
@@ -15,6 +15,10 @@
# specific language governing permissions and limitations
# under the License.
+# Currently, the avro not support Utf8View type, so we disable the
map_varchar_to_utf8view
Review Comment:
π
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] feat: varchar default mapping to utf8view [datafusion]
zhuqi-lucas commented on PR #16142: URL: https://github.com/apache/datafusion/pull/16142#issuecomment-2912064010 Thank you @alamb , it seems no regression from the benchmark result! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] feat: varchar default mapping to utf8view [datafusion]
alamb commented on PR #16142: URL: https://github.com/apache/datafusion/pull/16142#issuecomment-2912049125 π€: Benchmark completed Details ``` Comparing HEAD and varchar_default_ut8view Benchmark clickbench_extended.json ββββ³β³ββ³ββββ β Queryβ HEAD β varchar_default_ut8view βChange β β‘ββββββββββ© β QQuery 0 β 1898.26ms β 1810.38ms β no change β β QQuery 1 β 658.97ms β670.82ms β no change β β QQuery 2 β 1406.85ms β 1396.48ms β no change β β QQuery 3 β 709.94ms β713.90ms β no change β β QQuery 4 β 1429.02ms β 1398.58ms β no change β β QQuery 5 β 15171.65ms β 15018.76ms β no change β β QQuery 6 β 2057.97ms β 1992.46ms β no change β β QQuery 7 β 2336.53ms β 2061.27ms β +1.13x faster β β QQuery 8 β 839.18ms β837.10ms β no change β ββββ΄β΄ββ΄ββββ ββ³β β Benchmark Summary ββ β‘ββ© β Total Time (HEAD) β 26508.38ms β β Total Time (varchar_default_ut8view) β 25899.76ms β β Average Time (HEAD)β 2945.38ms β β Average Time (varchar_default_ut8view) β 2877.75ms β β Queries Faster β 1 β β Queries Slower β 0 β β Queries with No Change β 8 β ββ΄β Benchmark clickbench_partitioned.json ββββ³β³ββ³ββββ β Queryβ HEAD β varchar_default_ut8view βChange β β‘ββββββββββ© β QQuery 0 β15.20ms β 15.35ms β no change β β QQuery 1 β33.35ms β 33.29ms β no change β β QQuery 2 β83.06ms β 79.39ms β no change β β QQuery 3 β96.03ms β 95.15ms β no change β β QQuery 4 β 603.39ms β590.48ms β no change β β QQuery 5 β 851.82ms β855.46ms β no change β β QQuery 6 β23.71ms β 22.55ms β no change β β QQuery 7 β36.83ms β 37.07ms β no change β β QQuery 8 β 933.66ms β916.79ms β no change β β QQuery 9 β 1192.79ms β 1193.28ms β no change β β QQuery 10β 260.86ms β262.38ms β no change β β QQuery 11β 299.90ms β293.48ms β no change β β QQuery 12β 906.00ms β898.14ms β no change β β QQuery 13β 1344.05ms β 1220.93ms β +1.10x faster β β QQuery 14β 842.98ms β823.52ms β no change β β QQuery 15β 842.44ms β835.95ms β no change β β QQuery 16β 1725.19ms β 1718.98ms β no change β β QQuery 17β 1598.00ms β 1614.71ms β no change β β QQuery 18β 3092.64ms β 3093.01ms β no change β β QQuery 19β81.09ms β 81.79ms β no change β β QQuery 20β .88ms β 1115.69ms β no change β β QQuery 21β 1298.42ms β 1296.34ms β no change β β QQuery 22β 2165.31ms β 2174.91ms β no change β β QQuery 23β 8034.33ms β 7921.79ms β no change β β QQuery 24β 458.46ms β457.26ms β no change β β QQuery 25β 391.76ms β372.64ms β no change β β QQuery 26β 528.56ms β519.61ms β no change β β QQuery 27β 1595.08ms β 1586.78ms β no change β β QQuery 28β 12566.00ms β 12405.74ms β no change β β QQuery 29β 535.24ms β532.68ms β no change β β QQuery 30β 796.06ms β807.39ms β no change β β QQuery 31β 863.55ms β857.96ms β no change β β QQuery 32β 2633.13ms β 2621.41ms β no change β β QQuery 33β 3300.20ms β 3347.80ms β no change β β QQuery 34β 3429.42ms β 3335.38ms β no change β β QQuery 35β 1338.57ms β 1340.32ms β no change β β QQuery 36β 123.48ms β118.34ms β
Re: [PR] feat: varchar default mapping to utf8view [datafusion]
alamb commented on PR #16142: URL: https://github.com/apache/datafusion/pull/16142#issuecomment-2911938227 > Resolved all problems now including the slt testing. > > I think the PR is ready for benchmark progress to see if switching to Utf8View by default slows some queries down. cc @alamb Thanks a lot. THanks @zhuqi-lucas -- I have started it up -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] feat: varchar default mapping to utf8view [datafusion]
alamb commented on PR #16142: URL: https://github.com/apache/datafusion/pull/16142#issuecomment-2911937640 π€ `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr 2 16:34:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Comparing varchar_default_ut8view (5d9c537c446073c433bb4e3299bf9aa775fa0b60) to f3aed4a9bf988f5c0fdc3be845d57c0fe253cd45 [diff](https://github.com/apache/datafusion/compare/f3aed4a9bf988f5c0fdc3be845d57c0fe253cd45..5d9c537c446073c433bb4e3299bf9aa775fa0b60) Benchmarks: tpch_mem clickbench_partitioned clickbench_extended Results will be posted here when complete -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] feat: varchar default mapping to utf8view [datafusion]
zhuqi-lucas commented on PR #16142: URL: https://github.com/apache/datafusion/pull/16142#issuecomment-2909555117 Resolved all problems now including the slt testing. I think the PR is ready for benchmark progress to see if switching to Utf8View by default slows some queries down. cc @alamb Thanks a lot. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
