Re: [PR] [Enhancement] Support some compress functions [doris]
HappenLee merged PR #47307: URL: https://github.com/apache/doris/pull/47307 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
Re: [PR] [Enhancement] Support some compress functions [doris]
github-actions[bot] commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2639560357 PR approved by at least one committer and no changes requested.
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2636439210

TeamCity be ut coverage result:
Function Coverage: 42.09% (11016/26172)
Line Coverage: 32.36% (92942/287251)
Region Coverage: 31.52% (47667/151238)
Branch Coverage: 27.53% (24109/87582)
Coverage Report: http://coverage.selectdb-in.cc/coverage/c342b3f574b8d17b32c536ba5f2dac60186868be_c342b3f574b8d17b32c536ba5f2dac60186868be/report/index.html
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2636402557

ClickBench: Total hot run time: 30.91 s
```
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit c342b3f574b8d17b32c536ba5f2dac60186868be, data reload: false

query1 0.03 0.03 0.03
query2 0.07 0.03 0.03
query3 0.24 0.06 0.07
query4 1.63 0.10 0.10
query5 0.42 0.42 0.41
query6 1.15 0.66 0.66
query7 0.02 0.02 0.01
query8 0.04 0.03 0.02
query9 0.58 0.50 0.49
query10 0.55 0.57 0.55
query11 0.15 0.11 0.11
query12 0.14 0.11 0.11
query13 0.61 0.60 0.60
query14 2.85 2.76 2.76
query15 0.90 0.82 0.82
query16 0.38 0.37 0.40
query17 1.09 1.02 1.02
query18 0.24 0.22 0.21
query19 1.95 1.75 2.00
query20 0.01 0.02 0.02
query21 15.37 0.94 0.59
query22 0.75 0.79 0.72
query23 15.27 1.41 0.55
query24 3.03 1.00 1.66
query25 0.26 0.16 0.15
query26 0.24 0.14 0.13
query27 0.05 0.04 0.05
query28 14.23 1.01 0.43
query29 12.60 4.02 3.30
query30 0.25 0.09 0.08
query31 2.81 0.62 0.39
query32 3.25 0.55 0.46
query33 3.08 3.01 3.07
query34 16.62 5.24 4.58
query35 4.57 4.55 4.56
query36 0.68 0.49 0.48
query37 0.09 0.06 0.06
query38 0.05 0.04 0.03
query39 0.04 0.03 0.03
query40 0.18 0.14 0.12
query41 0.08 0.03 0.03
query42 0.04 0.03 0.02
query43 0.03 0.03 0.03
Total cold run time: 106.62 s
Total hot run time: 30.91 s
```
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2636363153

TPC-H: Total hot run time: 32403 ms
```
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c342b3f574b8d17b32c536ba5f2dac60186868be, data reload: false

-- Round 1 --
q1 17597 5480 5449 5449
q2 2051 299 179 179
q3 10501 1305 717 717
q4 10218 983 522 522
q5 7596 2490 2148 2148
q6 195 169 134 134
q7 905 748 604 604
q8 9230 1387 1217 1217
q9 5192 4877 4942 4877
q10 6877 2331 1915 1915
q11 468 282 254 254
q12 353 365 227 227
q13 17783 3743 3139 3139
q14 227 231 220 220
q15 523 477 496 477
q16 627 621 576 576
q17 576 883 334 334
q18 7153 6372 6475 6372
q19 1331 972 534 534
q20 316 333 201 201
q21 2887 2196 1990 1990
q22 374 332 317 317
Total cold run time: 102980 ms
Total hot run time: 32403 ms

-- Round 2, with runtime_filter_mode=off --
q1 5517 5540 5521 5521
q2 238 331 237 237
q3 2306 2632 2330 2330
q4 1392 1877 1365 1365
q5 4363 4773 4701 4701
q6 170 157 128 128
q7 2093 2026 1852 1852
q8 2663 2839 2676 2676
q9 7293 7319 7344 7319
q10 3063 3303 2776 2776
q11 599 546 507 507
q12 663 794 660 660
q13 3521 3885 3240 3240
q14 274 298 269 269
q15 506 472 463 463
q16 640 689 653 653
q17 1229 1730 1245 1245
q18 7613 7416 7445 7416
q19 820 1185 1070 1070
q20 1952 2036 1848 1848
q21 5663 5306 5097 5097
q22 599 591 567 567
Total cold run time: 53177 ms
Total hot run time: 51940 ms
```
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2636390530

TPC-DS: Total hot run time: 184657 ms
```
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c342b3f574b8d17b32c536ba5f2dac60186868be, data reload: false

query1 976 373 372 372
query2 7398 2080 2034 2034
query3 6790 214 209 209
query4 33096 23638 23248 23248
query5 4398 629 468 468
query6 297 193 200 193
query7 4599 497 306 306
query8 281 233 220 220
query9 9332 2654 2660 2654
query10 473 323 247 247
query11 17755 15147 14955 14955
query12 164 113 108 108
query13 1666 528 412 412
query14 9675 6969 6390 6390
query15 229 198 197 197
query16 7892 661 413 413
query17 1606 725 574 574
query18 2028 423 316 316
query19 226 191 166 166
query20 120 119 113 113
query21 211 129 106 106
query22 4107 4191 4179 4179
query23 34204 32999 33101 32999
query24 6653 2234 2233 2233
query25 487 446 382 382
query26 1221 274 152 152
query27 2005 471 330 330
query28 5286 2452 2449 2449
query29 714 550 417 417
query30 225 187 162 162
query31 978 843 791 791
query32 82 68 61 61
query33 508 354 284 284
query34 742 873 524 524
query35 792 817 749 749
query36 972 1074 941 941
query37 121 100 75 75
query38 4110 4212 4072 4072
query39 1452 1385 1377 1377
query40 203 112 102 102
query41 53 60 55 55
query42 120 98 103 98
query43 499 526 492 492
query44 1322 799 797 797
query45 176 169 172 169
query46 847 1049 640 640
query47 1820 1855 1773 1773
query48 384 404 328 328
query49 781 484 437 437
query50 616 671 387 387
query51 4187 4214 4131 4131
query52 108 102 100 100
query53 226 253 188 188
query54 484 497 406 406
query55 83 80 78 78
query56 265 261 246 246
query57 1169 1167 1068 1068
query58 255 230 232 230
query59 2929 3071 2832 2832
query60 265 273 252 252
query61 118 111 113 111
query62 811 700 656 656
query63 228 187 192 187
query64 4292 1021 651 651
query65 3216 3161 3175 3161
query66 1079 420 299 299
query67 16028 15716 15575 15575
query68 3208 842 544 544
query69 459 296 266 266
query70 1209 1155 1169 1155
query71 380 297 261 261
query72 5758 3816 4012 3816
query73 746 760 372 372
query74 9958 9050 8811 8811
query75 3163 3167 2654 2654
query76 3070 1174 787 787
query77 455 367 286 286
query78 10015 10014 9283 9283
query79 2674 812 681 681
query80 1679 533 454 454
query81 565 272 240 240
query82 362 145 116 116
query83 270 176 157 157
query84 243 102 74 74
query85 781 344 299 299
query86 471 316 308 308
query87 4520 4492 4347 4347
query88 4328 2192 2149 2149
query89 391 334 302 302
query90 1833 188 194 188
query91 143 141 109 109
query92 66 56 55 55
query93 2761 877 537 537
query94 749 424 306 306
query95 343 266 259 259
query96 499 633 283 283
query97 2766 2900 2771 2771
query98 239 210 206 206
query99 1285 1373 1254 1254
Total cold run time: 281824 ms
Total hot run time: 184657 ms
```
Re: [PR] [Enhancement] Support some compress functions [doris]
lzyy2024 commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2636278328 run buildall
Re: [PR] [Enhancement] Support some compress functions [doris]
lzyy2024 commented on code in PR #47307: URL: https://github.com/apache/doris/pull/47307#discussion_r1942316823

## be/src/vec/functions/function_compress.cpp: ##
@@ -0,0 +1,240 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+#include
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "common/status.h"
+#include "util/block_compression.h"
+#include "util/faststring.h"
+#include "vec/aggregate_functions/aggregate_function.h"
+#include "vec/columns/column.h"
+#include "vec/columns/column_nullable.h"
+#include "vec/columns/column_string.h"
+#include "vec/columns/column_vector.h"
+#include "vec/columns/columns_number.h"
+#include "vec/common/assert_cast.h"
+#include "vec/core/block.h"
+#include "vec/core/column_numbers.h"
+#include "vec/core/column_with_type_and_name.h"
+#include "vec/core/types.h"
+#include "vec/data_types/data_type.h"
+#include "vec/data_types/data_type_nullable.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/data_types/data_type_string.h"
+#include "vec/functions/function.h"
+#include "vec/functions/simple_function_factory.h"
+
+namespace doris {
+class FunctionContext;
+} // namespace doris
+
+namespace doris::vectorized {
+
+class FunctionCompress : public IFunction {
+    static constexpr std::array HEX_ITOC = {'0', '1', '2', '3', '4', '5', '6', '7',
+                                            '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};
+
+public:
+    static constexpr auto name = "compress";
+    static FunctionPtr create() { return std::make_shared(); }
+
+    String get_name() const override { return name; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        return std::make_shared();
+    }
+
+    Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+                        uint32_t result, size_t input_rows_count) const override {
+        // Get the compression algorithm object
+        BlockCompressionCodec* compression_codec;
+        RETURN_IF_ERROR(get_block_compression_codec(segment_v2::CompressionTypePB::ZLIB,
+                                                    &compression_codec));
+
+        const auto& arg_column =
+                assert_cast(*block.get_by_position(arguments[0]).column);
+        auto result_column = ColumnString::create();
+
+        auto& arg_data = arg_column.get_chars();
+        auto& arg_offset = arg_column.get_offsets();
+        const char* arg_begin = reinterpret_cast(arg_data.data());
+
+        auto& col_data = result_column->get_chars();
+        auto& col_offset = result_column->get_offsets();
+        col_offset.resize(input_rows_count);
+
+        faststring compressed_str;
+        Slice data;
+
+        // When the original string is large, the result is roughly this value
+        size_t total = arg_offset[input_rows_count - 1];
+        col_data.reserve(total / 1000);
+
+        for (size_t row = 0; row < input_rows_count; row++) {
+            size_t length = arg_offset[row] - arg_offset[row - 1];
+            data = Slice(arg_begin + arg_offset[row - 1], length);
+
+            size_t idx = col_data.size();
+            if (!length) { // data is ''
+                col_offset[row] = col_offset[row - 1];
+                continue;
+            }
+
+            // Z_MEM_ERROR and Z_BUF_ERROR are already handled in compress, making sure st is always Z_OK
+            auto st = compression_codec->compress(data, &compressed_str);
+            col_data.resize(col_data.size() + 10 + compressed_str.size());
+
+            // first ten digits represent the length of the uncompressed string

Review Comment:
MySQL does it this way. Is there a better way?
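For readers following this exchange: the MySQL manual describes the output of `COMPRESS()` as a 4-byte length of the uncompressed string, low byte first, followed by the compressed bytes, with empty strings stored as empty strings. The sketch below illustrates only that framing. The helper names `frame_with_length` and `read_length` are hypothetical and are not part of the Doris or MySQL code, and the `compressed` argument is a stand-in for real zlib output.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: prepend the uncompressed length as 4 bytes, low byte
// first (the layout the MySQL manual documents for COMPRESS()).
// `compressed` stands in for the zlib output; nothing is compressed here.
std::string frame_with_length(std::uint32_t uncompressed_len, std::string_view compressed) {
    if (uncompressed_len == 0) {
        return {};  // empty input is stored as an empty string, with no prefix
    }
    std::string out;
    out.reserve(4 + compressed.size());
    for (int i = 0; i < 4; ++i) {
        out.push_back(static_cast<char>((uncompressed_len >> (8 * i)) & 0xFF));
    }
    out.append(compressed);
    return out;
}

// Hypothetical inverse: recover the uncompressed length so that a
// decompressor could size its output buffer before inflating.
std::optional<std::uint32_t> read_length(std::string_view framed) {
    if (framed.empty()) {
        return std::uint32_t{0};  // empty value means empty original string
    }
    if (framed.size() < 4) {
        return std::nullopt;  // too short to hold the 4-byte prefix
    }
    std::uint32_t len = 0;
    for (int i = 0; i < 4; ++i) {
        len |= static_cast<std::uint32_t>(static_cast<unsigned char>(framed[i])) << (8 * i);
    }
    return len;
}
```

The empty-string shortcut corresponds to the `if (!length)` branch in the quoted hunk, which leaves the row's offset unchanged instead of writing a prefix.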
Re: [PR] [Enhancement] Support some compress functions [doris]
HappenLee commented on code in PR #47307: URL: https://github.com/apache/doris/pull/47307#discussion_r1942310992

## be/src/vec/functions/function_compress.cpp: ##
@@ -0,0 +1,240 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+#include
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "common/status.h"
+#include "util/block_compression.h"
+#include "util/faststring.h"
+#include "vec/aggregate_functions/aggregate_function.h"
+#include "vec/columns/column.h"
+#include "vec/columns/column_nullable.h"
+#include "vec/columns/column_string.h"
+#include "vec/columns/column_vector.h"
+#include "vec/columns/columns_number.h"
+#include "vec/common/assert_cast.h"
+#include "vec/core/block.h"
+#include "vec/core/column_numbers.h"
+#include "vec/core/column_with_type_and_name.h"
+#include "vec/core/types.h"
+#include "vec/data_types/data_type.h"
+#include "vec/data_types/data_type_nullable.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/data_types/data_type_string.h"
+#include "vec/functions/function.h"
+#include "vec/functions/simple_function_factory.h"
+
+namespace doris {
+class FunctionContext;
+} // namespace doris
+
+namespace doris::vectorized {
+
+class FunctionCompress : public IFunction {
+    static constexpr std::array HEX_ITOC = {'0', '1', '2', '3', '4', '5', '6', '7',
+                                            '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};
+
+public:
+    static constexpr auto name = "compress";
+    static FunctionPtr create() { return std::make_shared(); }
+
+    String get_name() const override { return name; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        return std::make_shared();
+    }
+
+    Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+                        uint32_t result, size_t input_rows_count) const override {
+        // Get the compression algorithm object
+        BlockCompressionCodec* compression_codec;
+        RETURN_IF_ERROR(get_block_compression_codec(segment_v2::CompressionTypePB::ZLIB,
+                                                    &compression_codec));
+
+        const auto& arg_column =
+                assert_cast(*block.get_by_position(arguments[0]).column);
+        auto result_column = ColumnString::create();
+
+        auto& arg_data = arg_column.get_chars();
+        auto& arg_offset = arg_column.get_offsets();
+        const char* arg_begin = reinterpret_cast(arg_data.data());
+
+        auto& col_data = result_column->get_chars();
+        auto& col_offset = result_column->get_offsets();
+        col_offset.resize(input_rows_count);
+
+        faststring compressed_str;
+        Slice data;
+
+        // When the original string is large, the result is roughly this value
+        size_t total = arg_offset[input_rows_count - 1];
+        col_data.reserve(total / 1000);
+
+        for (size_t row = 0; row < input_rows_count; row++) {
+            size_t length = arg_offset[row] - arg_offset[row - 1];
+            data = Slice(arg_begin + arg_offset[row - 1], length);
+
+            size_t idx = col_data.size();
+            if (!length) { // data is ''
+                col_offset[row] = col_offset[row - 1];
+                continue;
+            }
+
+            // Z_MEM_ERROR and Z_BUF_ERROR are already handled in compress, making sure st is always Z_OK
+            auto st = compression_codec->compress(data, &compressed_str);
+            col_data.resize(col_data.size() + 10 + compressed_str.size());
+
+            // first ten digits represent the length of the uncompressed string

Review Comment:
Why not store the length directly as a uint32_t here? Is the HEX encoding needed?
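To make the trade-off in this question concrete, here is a minimal sketch of the two length-prefix encodings being compared. It is not the code from the PR: the quoted hunk is cut off before the encoding loop, so the "0x" plus eight uppercase hex digits layout (ten characters, most significant digit first) is only an assumed reading of the "first ten digits" comment and the `HEX_ITOC` table. The function names are hypothetical.

```cpp
#include <array>
#include <cstdint>
#include <string>

// Same digit table as HEX_ITOC in the quoted hunk, with explicit template arguments.
constexpr std::array<char, 16> kHexDigits = {'0', '1', '2', '3', '4', '5', '6', '7',
                                             '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};

// Option 1: store the length as a raw 4-byte integer, low byte first.
std::string length_prefix_binary(std::uint32_t len) {
    std::string out(4, '\0');
    for (int i = 0; i < 4; ++i) {
        out[i] = static_cast<char>((len >> (8 * i)) & 0xFF);
    }
    return out;
}

// Option 2 (assumed layout): "0x" followed by eight hex digits,
// most significant digit first, for ten printable characters in total.
std::string length_prefix_hex(std::uint32_t len) {
    std::string out = "0x";
    for (int shift = 28; shift >= 0; shift -= 4) {
        out.push_back(kHexDigits[(len >> shift) & 0xF]);
    }
    return out;
}
```

Under these assumptions the binary form costs 4 bytes per row and may contain non-printable bytes, while the hex form costs 10 bytes per row and stays printable; for a length of 11, `length_prefix_hex` yields `0x0000000B`.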
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2634686328

TPC-DS: Total hot run time: 192865 ms
```
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 12facae2117299c27cfdc2dd328db4dabed78428, data reload: false

query1 1304 924 906 906
query2 6213 2074 2008 2008
query3 10991 4477 4610 4477
query4 61732 28570 23087 23087
query5 5578 605 447 447
query6 427 201 191 191
query7 5564 500 294 294
query8 330 246 231 231
query9 8103 2683 2663 2663
query10 458 318 256 256
query11 17532 15169 15419 15169
query12 165 111 113 111
query13 1438 574 430 430
query14 10425 7614 7718 7614
query15 219 224 191 191
query16 7172 613 480 480
query17 1150 759 600 600
query18 1776 426 353 353
query19 209 178 165 165
query20 117 114 115 114
query21 210 124 112 112
query22 4608 4827 4637 4637
query23 34482 33471 33553 33471
query24 5486 2272 2317 2272
query25 464 457 396 396
query26 668 282 158 158
query27 2141 500 324 324
query28 4523 2527 2511 2511
query29 540 571 454 454
query30 219 207 163 163
query31 969 914 867 867
query32 77 60 57 57
query33 441 367 311 311
query34 773 865 502 502
query35 825 850 762 762
query36 992 1074 974 974
query37 117 98 77 77
query38 4356 4327 4209 4209
query39 1484 1418 1436 1418
query40 208 111 99 99
query41 52 48 52 48
query42 126 103 104 103
query43 513 543 500 500
query44 1302 861 811 811
query45 193 170 169 169
query46 874 1055 651 651
query47 1944 1936 1859 1859
query48 410 440 333 333
query49 732 486 390 390
query50 646 666 385 385
query51 4288 4274 4336 4274
query52 108 104 93 93
query53 235 254 190 190
query54 494 505 420 420
query55 81 74 79 74
query56 269 286 263 263
query57 1207 1216 1137 1137
query58 233 234 230 230
query59 3328 3307 3034 3034
query60 264 252 243 243
query61 121 114 122 114
query62 719 728 655 655
query63 219 183 182 182
query64 1316 1021 662 662
query65 3240 3124 3165 3124
query66 655 386 305 305
query67 16395 15696 15652 15652
query68 4373 815 534 534
query69 526 300 252 252
query70 1218 1136 1124 1124
query71 425 291 249 249
query72 6092 3917 3834 3834
query73 694 790 359 359
query74 10128 9016 8887 8887
query75 3176 3173 2685 2685
query76 3286 1159 742 742
query77 494 338 320 320
query78 10193 10072 9350 9350
query79 2592 800 605 605
query80 670 528 455 455
query81 497 272 243 243
query82 222 155 116 116
query83 177 174 154 154
query84 294 96 68 68
query85 749 342 304 304
query86 388 316 277 277
query87 4410 4495 4464 4464
query88 4006 2143 2137 2137
query89 385 320 282 282
query90 1670 186 189 186
query91 130 134 108 108
query92 65 56 50 50
query93 2180 845 533 533
query94 717 414 308 308
query95 329 266 252 252
query96 485 608 279 279
query97 2777 2862 2760 2760
query98 216 195 196 195
query99 1289 1365 1291 1291
Total cold run time: 309303 ms
Total hot run time: 192865 ms
```
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2634704565

TeamCity be ut coverage result:
Function Coverage: 42.06% (10994/26139)
Line Coverage: 32.34% (92827/287074)
Region Coverage: 31.48% (47583/151142)
Branch Coverage: 27.51% (24085/87536)
Coverage Report: http://coverage.selectdb-in.cc/coverage/12facae2117299c27cfdc2dd328db4dabed78428_12facae2117299c27cfdc2dd328db4dabed78428/report/index.html
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2634697893

ClickBench: Total hot run time: 30.9 s
```
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 12facae2117299c27cfdc2dd328db4dabed78428, data reload: false

query1 0.03 0.05 0.04
query2 0.07 0.04 0.04
query3 0.24 0.06 0.07
query4 1.63 0.11 0.10
query5 0.44 0.41 0.41
query6 1.13 0.66 0.66
query7 0.02 0.01 0.01
query8 0.04 0.03 0.03
query9 0.58 0.50 0.51
query10 0.55 0.55 0.55
query11 0.13 0.10 0.11
query12 0.14 0.10 0.10
query13 0.60 0.60 0.61
query14 2.72 2.76 2.75
query15 0.88 0.83 0.82
query16 0.39 0.37 0.39
query17 0.96 0.96 0.98
query18 0.23 0.21 0.20
query19 1.86 1.76 2.01
query20 0.02 0.02 0.01
query21 15.36 0.95 0.58
query22 0.75 0.84 0.61
query23 15.30 1.38 0.54
query24 2.61 2.07 1.38
query25 0.23 0.05 0.19
query26 0.26 0.15 0.14
query27 0.08 0.06 0.04
query28 14.45 0.99 0.42
query29 12.83 4.01 3.29
query30 0.24 0.08 0.06
query31 2.84 0.60 0.38
query32 3.24 0.54 0.45
query33 2.96 3.05 3.06
query34 16.52 5.12 4.53
query35 4.51 4.55 4.52
query36 0.67 0.49 0.48
query37 0.10 0.06 0.06
query38 0.04 0.03 0.03
query39 0.04 0.02 0.03
query40 0.16 0.13 0.13
query41 0.09 0.03 0.02
query42 0.04 0.03 0.02
query43 0.04 0.03 0.03
Total cold run time: 106.02 s
Total hot run time: 30.9 s
```
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2634657028

TPC-H: Total hot run time: 31960 ms
```
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 12facae2117299c27cfdc2dd328db4dabed78428, data reload: false

-- Round 1 --
q1 17562 5435 5364 5364
q2 2047 304 163 163
q3 10574 1206 725 725
q4 10206 955 535 535
q5 7518 2363 2146 2146
q6 187 164 135 135
q7 897 762 602 602
q8 9262 1322 1148 1148
q9 5172 4858 4930 4858
q10 6840 2327 1899 1899
q11 482 272 260 260
q12 350 357 228 228
q13 17947 3718 3087 3087
q14 228 231 220 220
q15 528 482 466 466
q16 642 608 578 578
q17 547 858 315 315
q18 6857 6271 6388 6271
q19 1673 944 522 522
q20 305 321 199 199
q21 2944 2117 1935 1935
q22 364 335 304 304
Total cold run time: 103132 ms
Total hot run time: 31960 ms

-- Round 2, with runtime_filter_mode=off --
q1 5521 5393 5447 5393
q2 239 320 232 232
q3 2218 2631 2335 2335
q4 1411 1842 1400 1400
q5 4312 4719 4642 4642
q6 163 154 126 126
q7 1961 1953 1833 1833
q8 2605 2750 2723 2723
q9 7256 7201 7224 7201
q10 3023 3279 2785 2785
q11 557 518 495 495
q12 643 698 623 623
q13 3564 3977 3287 3287
q14 285 285 283 283
q15 504 485 478 478
q16 630 700 642 642
q17 1208 1735 1275 1275
q18 7549 7588 7308 7308
q19 789 1043 1127 1043
q20 2008 2048 1905 1905
q21 5808 5010 4987 4987
q22 602 602 559 559
Total cold run time: 52856 ms
Total hot run time: 51555 ms
```
Re: [PR] [Enhancement] Support some compress functions [doris]
lzyy2024 commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2634566958 run buildall
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2634357589

TeamCity be ut coverage result:
Function Coverage: 42.06% (10995/26139)
Line Coverage: 32.34% (92829/287074)
Region Coverage: 31.49% (47598/151142)
Branch Coverage: 27.52% (24088/87536)
Coverage Report: http://coverage.selectdb-in.cc/coverage/52d222d5db3f76a84e5406c1f11294bb31156192_52d222d5db3f76a84e5406c1f11294bb31156192/report/index.html
Re: [PR] [Enhancement] Support some compress functions [doris]
lzyy2024 commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2634185450 run buildall
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2634108455 TeamCity be ut coverage result: Function Coverage: 42.06% (10993/26139) Line Coverage: 32.33% (92807/287076) Region Coverage: 31.48% (47578/151142) Branch Coverage: 27.51% (24080/87536) Coverage Report: http://coverage.selectdb-in.cc/coverage/42df82b0be1d0ed027e05062f36c438e3bf32308_42df82b0be1d0ed027e05062f36c438e3bf32308/report/index.html
Re: [PR] [Enhancement] Support some compress functions [doris]
lzyy2024 commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2633892159 run buildall
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2632928217 TeamCity be ut coverage result: Function Coverage: 42.06% (10995/26139) Line Coverage: 32.33% (92807/287076) Region Coverage: 31.49% (47594/151142) Branch Coverage: 27.52% (24088/87536) Coverage Report: http://coverage.selectdb-in.cc/coverage/42df82b0be1d0ed027e05062f36c438e3bf32308_42df82b0be1d0ed027e05062f36c438e3bf32308/report/index.html
Re: [PR] [Enhancement] Support some compress functions [doris]
lzyy2024 commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2632830880 run buildall
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2630194193 TeamCity be ut coverage result: Function Coverage: 42.08% (11000/26139) Line Coverage: 32.37% (92931/287083) Region Coverage: 31.52% (47645/151150) Branch Coverage: 27.55% (24120/87544) Coverage Report: http://coverage.selectdb-in.cc/coverage/376422f094b5ed32dcc058cd1f75940d1dd30081_376422f094b5ed32dcc058cd1f75940d1dd30081/report/index.html
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2630166689

ClickBench: Total hot run time: 31.38 s
```
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 376422f094b5ed32dcc058cd1f75940d1dd30081, data reload: false

query1 0.04 0.04 0.03
query2 0.07 0.03 0.03
query3 0.24 0.07 0.07
query4 1.63 0.10 0.10
query5 0.42 0.41 0.40
query6 1.16 0.65 0.65
query7 0.02 0.02 0.02
query8 0.04 0.03 0.03
query9 0.58 0.49 0.50
query10 0.56 0.56 0.56
query11 0.15 0.10 0.10
query12 0.14 0.11 0.10
query13 0.61 0.59 0.59
query14 2.88 2.74 2.77
query15 0.88 0.85 0.83
query16 0.38 0.38 0.38
query17 1.01 1.04 1.05
query18 0.23 0.20 0.20
query19 1.83 1.75 2.03
query20 0.02 0.01 0.02
query21 15.38 0.92 0.57
query22 0.74 0.82 0.71
query23 15.15 1.42 0.62
query24 2.93 1.82 1.73
query25 0.13 0.10 0.09
query26 0.29 0.16 0.15
query27 0.07 0.06 0.04
query28 14.47 0.99 0.43
query29 12.58 3.92 3.24
query30 0.24 0.08 0.06
query31 2.84 0.58 0.39
query32 3.24 0.55 0.46
query33 2.99 2.97 3.01
query34 16.50 5.16 4.50
query35 4.55 4.53 4.57
query36 0.68 0.48 0.47
query37 0.09 0.06 0.05
query38 0.05 0.04 0.03
query39 0.04 0.02 0.03
query40 0.16 0.13 0.13
query41 0.08 0.02 0.02
query42 0.04 0.03 0.02
query43 0.03 0.03 0.03
Total cold run time: 106.16 s
Total hot run time: 31.38 s
```
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2630158640 TPC-DS: Total hot run time: 191844 ms ``` machine: 'aliyun_ecs.c7a.8xlarge_32C64G' scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools TPC-DS sf100 test result on commit 376422f094b5ed32dcc058cd1f75940d1dd30081, data reload: false query1 1304934 923 923 query2 6204207920672067 query3 10972 438043674367 query4 61069 29284 23235 23235 query5 5532589 437 437 query6 414 214 193 193 query7 5469515 297 297 query8 332 247 224 224 query9 7705266426562656 query10 459 301 253 253 query11 17717 15342 15375 15342 query12 161 107 105 105 query13 1387550 396 396 query14 11778 692868776877 query15 210 188 186 186 query16 6789640 462 462 query17 1114742 583 583 query18 1728421 300 300 query19 201 182 164 164 query20 126 115 115 115 query21 212 126 108 108 query22 4646478344414441 query23 34279 33373 33404 33373 query24 5515227723302277 query25 462 481 403 403 query26 640 274 153 153 query27 1556471 327 327 query28 3952252324842484 query29 575 578 447 447 query30 218 193 156 156 query31 899 879 824 824 query32 70 61 59 59 query33 441 399 309 309 query34 727 864 517 517 query35 865 847 755 755 query36 10171039980 980 query37 124 106 85 85 query38 4313433642174217 query39 1551146214371437 query40 205 117 108 108 query41 56 54 53 53 query42 121 109 113 109 query43 535 536 503 503 query44 1319833 857 833 query45 189 175 171 171 query46 895 1051678 678 query47 1898190818701870 query48 387 410 334 334 query49 728 481 405 405 query50 650 673 400 400 query51 4225425842904258 query52 138 101 90 90 query53 239 257 196 196 query54 507 491 424 424 query55 83 80 77 77 query56 278 269 256 256 query57 1187121711661166 query58 237 233 239 233 query59 3138319829962996 query60 277 257 268 257 query61 116 119 115 115 query62 746 705 656 656 query63 217 189 182 182 query64 12481014678 678 query65 3352314331663143 query66 746 388 295 295 query67 16208 15807 15507 15507 query68 
5020822 525 525 query69 486 293 264 264 query70 1224111811431118 query71 412 277 252 252 query72 6422391238733873 query73 782 744 361 361 query74 9833930286688668 query75 3320312526952695 query76 38031176770 770 query77 480 359 361 359 query78 10156 10155 93349334 query79 2892795 603 603 query80 1700525 445 445 query81 547 275 237 237 query82 355 149 132 132 query83 267 166 146 146 query84 298 89 71 71 query85 765 346 347 346 query86 423 333 304 304 query87 4401475044154415 query88 3650216321352135 query89 394 321 287 287 query90 1649192 188 188 query91 135 139 114 114 query92 67 57 56 56 query93 2134853 530 530 query94 756 405 300 300 query95 317 262 248 248 query96 486 616 287 287 query97 2854286327972797 query98 227 196 200 196 query99 1290137712611261 Total cold run time: 310203 ms Total hot run time: 191844 ms ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2630139443

TPC-H: Total hot run time: 32399 ms
```
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 376422f094b5ed32dcc058cd1f75940d1dd30081, data reload: false

-- Round 1 --
q1 17649 5525 5426 5426
q2 2065 317 191 191
q3 10463 1219 744 744
q4 10221 969 561 561
q5 7602 2429 2164 2164
q6 192 170 144 144
q7 922 779 608 608
q8 9240 1374 1175 1175
q9 5305 4922 4915 4915
q10 6845 2355 1894 1894
q11 483 271 259 259
q12 342 359 226 226
q13 1 3654 3063 3063
q14 229 236 214 214
q15 517 471 472 471
q16 631 627 586 586
q17 568 877 328 328
q18 7133 6523 6421 6421
q19 1949 960 546 546
q20 307 319 194 194
q21 2793 2151 1956 1956
q22 368 333 313 313
Total cold run time: 103601 ms
Total hot run time: 32399 ms

- Round 2, with runtime_filter_mode=off -
q1 5548 5460 5501 5460
q2 243 338 230 230
q3 2279 2658 2302 2302
q4 1437 1826 1400 1400
q5 4312 4725 4637 4637
q6 166 161 131 131
q7 2014 1959 1866 1866
q8 2627 2835 2707 2707
q9 7364 7270 7250 7250
q10 3050 3255 2786 2786
q11 596 499 496 496
q12 637 722 559 559
q13 3476 3944 3257 3257
q14 297 294 298 294
q15 521 468 464 464
q16 668 695 646 646
q17 1259 1755 1263 1263
q18 7607 7633 7340 7340
q19 800 1160 1089 1089
q20 2016 2053 1891 1891
q21 5848 5202 4891 4891
q22 634 614 590 590
Total cold run time: 53399 ms
Total hot run time: 51549 ms
```
Re: [PR] [Enhancement] Support some compress functions [doris]
lzyy2024 commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2629991731 run buildall
Re: [PR] [Enhancement] Support some compress functions [doris]
HappenLee commented on code in PR #47307: URL: https://github.com/apache/doris/pull/47307#discussion_r1938767620 ## be/src/vec/functions/function_compress.cpp: ## @@ -0,0 +1,249 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. +#include + +#include +#include +#include +#include +#include +#include +#include +#include + +#include "common/status.h" +#include "util/block_compression.h" +#include "util/faststring.h" +#include "vec/aggregate_functions/aggregate_function.h" +#include "vec/columns/column.h" +#include "vec/columns/column_nullable.h" +#include "vec/columns/column_string.h" +#include "vec/columns/column_vector.h" +#include "vec/columns/columns_number.h" +#include "vec/common/assert_cast.h" +#include "vec/core/block.h" +#include "vec/core/column_numbers.h" +#include "vec/core/column_with_type_and_name.h" +#include "vec/core/types.h" +#include "vec/data_types/data_type.h" +#include "vec/data_types/data_type_nullable.h" +#include "vec/data_types/data_type_number.h" +#include "vec/data_types/data_type_string.h" +#include "vec/functions/function.h" +#include "vec/functions/simple_function_factory.h" + +namespace doris { +class FunctionContext; +} // namespace doris + +namespace doris::vectorized { + +class 
FunctionCompress : public IFunction { +static constexpr std::array hex_itoc = {'0', '1', '2', '3', '4', '5', '6', '7', Review Comment: Please name constexpr constants in UPPER_CASE.
Re: [PR] [Enhancement] Support some compress functions [doris]
zclllyybb commented on code in PR #47307: URL: https://github.com/apache/doris/pull/47307#discussion_r1938448935 ## be/src/vec/functions/function_compress.cpp: ## @@ -0,0 +1,247 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. +#include + +#include +#include +#include +#include +#include +#include +#include + +#include "common/status.h" +#include "util/block_compression.h" +#include "util/faststring.h" +#include "vec/aggregate_functions/aggregate_function.h" +#include "vec/columns/column.h" +#include "vec/columns/column_nullable.h" +#include "vec/columns/column_string.h" +#include "vec/columns/column_vector.h" +#include "vec/columns/columns_number.h" +#include "vec/common/assert_cast.h" +#include "vec/core/block.h" +#include "vec/core/column_numbers.h" +#include "vec/core/column_with_type_and_name.h" +#include "vec/core/types.h" +#include "vec/data_types/data_type.h" +#include "vec/data_types/data_type_nullable.h" +#include "vec/data_types/data_type_number.h" +#include "vec/data_types/data_type_string.h" +#include "vec/functions/function.h" +#include "vec/functions/simple_function_factory.h" + +namespace doris { +class FunctionContext; +} // namespace doris + +namespace doris::vectorized { + +class FunctionCompress 
: public IFunction { +string hex_itoc = "0123456789ABCDEF"; + +public: +static constexpr auto name = "compress"; +static FunctionPtr create() { return std::make_shared(); } + +String get_name() const override { return name; } + +size_t get_number_of_arguments() const override { return 1; } + +DataTypePtr get_return_type_impl(const DataTypes& arguments) const override { +return std::make_shared(); +} + +Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments, +uint32_t result, size_t input_rows_count) const override { +// Get the compression algorithm object +BlockCompressionCodec* compression_codec; + RETURN_IF_ERROR(get_block_compression_codec(segment_v2::CompressionTypePB::ZLIB, +&compression_codec)); + +const auto& arg_column = +assert_cast(*block.get_by_position(arguments[0]).column); +auto result_column = ColumnString::create(); + +auto& arg_data = arg_column.get_chars(); +auto& arg_offset = arg_column.get_offsets(); +const char* arg_begin = reinterpret_cast(arg_data.data()); + +auto& col_data = result_column->get_chars(); +auto& col_offset = result_column->get_offsets(); +col_offset.resize(input_rows_count); + +faststring compressed_str; +Slice data; + +// When the original string is large, the result is roughly this value +size_t total = arg_offset[input_rows_count - 1]; +col_data.reserve(total / 1000); + +for (size_t row = 0; row < input_rows_count; row++) { +size_t length = arg_offset[row] - arg_offset[row - 1]; +data = Slice(arg_begin + arg_offset[row - 1], length); + +// Z_MEM_ERROR and Z_BUF_ERROR are already handled in compress, making sure st is always Z_OK +auto st = compression_codec->compress(data, &compressed_str); + +size_t idx = col_data.size(); +if (!length) { // data is '' +col_data.resize(col_data.size() + 2); +col_data[idx] = '0', col_data[idx + 1] = 'x'; +col_offset[row] = col_offset[row - 1] + 2; +continue; +} + +// first ten digits represent the length of the uncompressed string 
+col_data.resize(col_data.size() + 10 + 2 * compressed_str.size()); Review Comment: That may not be very reasonable. There is a reason MySQL behaves like this. 1. After compression, the corresponding memory is just a stream of bytes, so any byte value is possible. Simply interpreting it as chars does not stay consistent: consider a memory region holding "a\b". When printed it shows "" because '\b' deletes 'a'. 2. As for the compression ratio, it is guaranteed by the compression algorithm, and the ratio is very large, so even if we print it as chars w
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2629278644 TeamCity be ut coverage result: Function Coverage: 42.08% (10998/26138) Line Coverage: 32.37% (92919/287059) Region Coverage: 31.52% (47635/151146) Branch Coverage: 27.55% (24114/87544) Coverage Report: http://coverage.selectdb-in.cc/coverage/b702390db852ea9772db6d961cd374efc0e1148d_b702390db852ea9772db6d961cd374efc0e1148d/report/index.html
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2629277567

ClickBench: Total hot run time: 30.65 s
```
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit b702390db852ea9772db6d961cd374efc0e1148d, data reload: false

query1 0.03 0.03 0.05
query2 0.07 0.04 0.03
query3 0.24 0.07 0.06
query4 1.61 0.10 0.10
query5 0.42 0.42 0.41
query6 1.15 0.65 0.66
query7 0.02 0.02 0.01
query8 0.04 0.03 0.03
query9 0.59 0.51 0.51
query10 0.56 0.57 0.55
query11 0.14 0.10 0.10
query12 0.14 0.10 0.12
query13 0.60 0.59 0.59
query14 2.86 2.88 2.76
query15 0.90 0.84 0.85
query16 0.40 0.38 0.38
query17 1.01 1.00 1.06
query18 0.24 0.20 0.21
query19 1.88 1.78 1.98
query20 0.01 0.01 0.01
query21 15.36 0.95 0.57
query22 0.74 1.07 0.78
query23 14.93 1.37 0.52
query24 2.60 1.40 0.77
query25 0.28 0.10 0.14
query26 0.21 0.14 0.14
query27 0.08 0.06 0.04
query28 14.02 1.06 0.42
query29 12.64 3.99 3.26
query30 0.25 0.10 0.07
query31 2.84 0.59 0.38
query32 3.23 0.55 0.46
query33 3.08 3.08 3.06
query34 16.61 5.16 4.57
query35 4.58 4.58 4.58
query36 0.67 0.48 0.50
query37 0.10 0.06 0.06
query38 0.05 0.04 0.03
query39 0.03 0.02 0.02
query40 0.17 0.13 0.12
query41 0.08 0.03 0.03
query42 0.03 0.02 0.03
query43 0.03 0.04 0.03
Total cold run time: 105.52 s
Total hot run time: 30.65 s
```
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2629276181 TPC-DS: Total hot run time: 190795 ms ``` machine: 'aliyun_ecs.c7a.8xlarge_32C64G' scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools TPC-DS sf100 test result on commit b702390db852ea9772db6d961cd374efc0e1148d, data reload: false query1 1297975 940 940 query2 6141202420292024 query3 11107 473146554655 query4 32350 23169 22881 22881 query5 3580611 437 437 query6 285 198 183 183 query7 3980486 306 306 query8 299 244 246 244 query9 9497261926032603 query10 459 305 255 255 query11 17585 15238 14891 14891 query12 155 109 101 101 query13 1575523 420 420 query14 8863644174576441 query15 242 189 189 189 query16 8175672 516 516 query17 1670775 595 595 query18 2125406 313 313 query19 211 191 166 166 query20 121 122 114 114 query21 206 124 109 109 query22 4590453845274527 query23 34342 33421 33389 33389 query24 6708224523432245 query25 510 458 401 401 query26 982 279 150 150 query27 2362475 320 320 query28 5385251124162416 query29 730 566 442 442 query30 213 188 159 159 query31 934 867 837 837 query32 92 61 58 58 query33 486 357 326 326 query34 755 883 518 518 query35 811 817 756 756 query36 990 1068955 955 query37 124 106 80 80 query38 4280433142254225 query39 1490143714311431 query40 203 110 102 102 query41 49 54 50 50 query42 124 103 102 102 query43 519 538 507 507 query44 1353812 805 805 query45 198 175 170 170 query46 854 1030639 639 query47 1902194418451845 query48 386 411 336 336 query49 743 486 396 396 query50 634 659 393 393 query51 4299426142674261 query52 101 103 95 95 query53 223 257 183 183 query54 505 486 404 404 query55 82 79 81 79 query56 263 279 253 253 query57 1243120411571157 query58 250 234 238 234 query59 3105319630593059 query60 285 266 250 250 query61 114 119 115 115 query62 789 742 705 705 query63 233 196 199 196 query64 42531028642 642 query65 3299327732573257 query66 973 395 302 302 query67 16048 15696 15286 15286 query68 
4949823 519 519 query69 470 294 267 267 query70 1188115411281128 query71 388 282 278 278 query72 5837396438123812 query73 651 753 356 356 query74 10122 876390058763 query75 3158313326582658 query76 31061179760 760 query77 464 370 281 281 query78 9951999493719371 query79 3119797 600 600 query80 689 526 445 445 query81 501 277 245 245 query82 442 157 119 119 query83 167 174 147 147 query84 236 94 79 79 query85 784 348 307 307 query86 390 336 301 301 query87 4422461544914491 query88 4783217221572157 query89 400 326 293 293 query90 1843247 189 189 query91 132 138 107 107 query92 73 57 51 51 query93 2388869 540 540 query94 656 379 286 286 query95 343 264 259 259 query96 492 604 297 297 query97 2870287027522752 query98 233 202 204 202 query99 1280137712941294 Total cold run time: 285264 ms Total hot run time: 190795 ms ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2629272993

TPC-H: Total hot run time: 32141 ms
```
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b702390db852ea9772db6d961cd374efc0e1148d, data reload: false

-- Round 1 --
q1 17579 5518 5389 5389
q2 2047 322 182 182
q3 10386 1225 746 746
q4 10204 975 542 542
q5 7539 2347 2134 2134
q6 190 168 137 137
q7 891 752 598 598
q8 9236 1355 1165 1165
q9 5147 4841 4905 4841
q10 6875 2388 1923 1923
q11 481 282 254 254
q12 350 368 229 229
q13 17840 3739 3095 3095
q14 224 220 214 214
q15 521 474 478 474
q16 638 616 598 598
q17 555 854 314 314
q18 6846 6285 6405 6285
q19 1729 939 531 531
q20 317 312 190 190
q21 2781 2226 1992 1992
q22 366 332 308 308
Total cold run time: 102742 ms
Total hot run time: 32141 ms

- Round 2, with runtime_filter_mode=off -
q1 5643 5487 5516 5487
q2 233 327 227 227
q3 2283 2639 2308 2308
q4 1466 1833 1381 1381
q5 4278 4727 4627 4627
q6 166 156 126 126
q7 2025 2023 1823 1823
q8 2608 2803 2687 2687
q9 7280 7115 7205 7115
q10 3053 3278 2803 2803
q11 581 532 508 508
q12 654 715 566 566
q13 3458 4004 3326 3326
q14 277 297 270 270
q15 520 478 473 473
q16 681 673 630 630
q17 1216 1757 1258 1258
q18 7663 7491 7348 7348
q19 804 1175 1043 1043
q20 2035 2046 1952 1952
q21 5888 5287 4919 4919
q22 650 664 573 573
Total cold run time: 53462 ms
Total hot run time: 51450 ms
```
Re: [PR] [Enhancement] Support some compress functions [doris]
lzyy2024 commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2629263274 run buildall
Re: [PR] [Enhancement] Support some compress functions [doris]
HappenLee commented on code in PR #47307: URL: https://github.com/apache/doris/pull/47307#discussion_r1938278948 ## be/src/vec/functions/function_compress.cpp: ## @@ -0,0 +1,249 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. +#include + +#include +#include +#include +#include +#include +#include +#include +#include + +#include "common/status.h" +#include "util/block_compression.h" +#include "util/faststring.h" +#include "vec/aggregate_functions/aggregate_function.h" +#include "vec/columns/column.h" +#include "vec/columns/column_nullable.h" +#include "vec/columns/column_string.h" +#include "vec/columns/column_vector.h" +#include "vec/columns/columns_number.h" +#include "vec/common/assert_cast.h" +#include "vec/core/block.h" +#include "vec/core/column_numbers.h" +#include "vec/core/column_with_type_and_name.h" +#include "vec/core/types.h" +#include "vec/data_types/data_type.h" +#include "vec/data_types/data_type_nullable.h" +#include "vec/data_types/data_type_number.h" +#include "vec/data_types/data_type_string.h" +#include "vec/functions/function.h" +#include "vec/functions/simple_function_factory.h" + +namespace doris { +class FunctionContext; +} // namespace doris + +namespace doris::vectorized { + +class 
FunctionCompress : public IFunction { +std::array hex_itoc = {'0', '1', '2', '3', '4', '5', '6', '7', Review Comment: Rename this to HEX_ITOC and declare it static constexpr.
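The declaration this review comment asks for could look roughly like the sketch below. It assumes C++17 (so the static constexpr member is implicitly inline) and that the table holds the sixteen uppercase hex digits, as in the earlier `"0123456789ABCDEF"` string. The wrapper class `HexTable` and the helper `to_hex` are hypothetical names for illustration, not code from the PR.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <string_view>

// Sketch only: a class-level lookup table declared the way the review asks,
// i.e. static, constexpr, and named in UPPER_CASE.
class HexTable {
public:
    static constexpr std::array<char, 16> HEX_ITOC = {'0', '1', '2', '3', '4', '5', '6', '7',
                                                      '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};

    // Hypothetical helper (not from the PR): emits two hex digits per input byte.
    static std::string to_hex(std::string_view bytes) {
        std::string out;
        out.reserve(bytes.size() * 2);
        for (unsigned char c : bytes) {
            out.push_back(HEX_ITOC[c >> 4]);   // high nibble
            out.push_back(HEX_ITOC[c & 0x0F]); // low nibble
        }
        return out;
    }
};
```

Because the table is static and constexpr, it is shared by all instances and built at compile time, instead of being stored once per function object as the plain `std::array` member would be.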
Re: [PR] [Enhancement] Support some compress functions [doris]
HappenLee commented on code in PR #47307: URL: https://github.com/apache/doris/pull/47307#discussion_r1938278732 ## be/src/vec/functions/function_compress.cpp: ## @@ -0,0 +1,247 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. +#include + +#include +#include +#include +#include +#include +#include +#include + +#include "common/status.h" +#include "util/block_compression.h" +#include "util/faststring.h" +#include "vec/aggregate_functions/aggregate_function.h" +#include "vec/columns/column.h" +#include "vec/columns/column_nullable.h" +#include "vec/columns/column_string.h" +#include "vec/columns/column_vector.h" +#include "vec/columns/columns_number.h" +#include "vec/common/assert_cast.h" +#include "vec/core/block.h" +#include "vec/core/column_numbers.h" +#include "vec/core/column_with_type_and_name.h" +#include "vec/core/types.h" +#include "vec/data_types/data_type.h" +#include "vec/data_types/data_type_nullable.h" +#include "vec/data_types/data_type_number.h" +#include "vec/data_types/data_type_string.h" +#include "vec/functions/function.h" +#include "vec/functions/simple_function_factory.h" + +namespace doris { +class FunctionContext; +} // namespace doris + +namespace doris::vectorized { + +class FunctionCompress 
: public IFunction { +string hex_itoc = "0123456789ABCDEF"; + +public: +static constexpr auto name = "compress"; +static FunctionPtr create() { return std::make_shared(); } + +String get_name() const override { return name; } + +size_t get_number_of_arguments() const override { return 1; } + +DataTypePtr get_return_type_impl(const DataTypes& arguments) const override { +return std::make_shared(); +} + +Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments, +uint32_t result, size_t input_rows_count) const override { +// Get the compression algorithm object +BlockCompressionCodec* compression_codec; + RETURN_IF_ERROR(get_block_compression_codec(segment_v2::CompressionTypePB::ZLIB, +&compression_codec)); + +const auto& arg_column = +assert_cast(*block.get_by_position(arguments[0]).column); +auto result_column = ColumnString::create(); + +auto& arg_data = arg_column.get_chars(); +auto& arg_offset = arg_column.get_offsets(); +const char* arg_begin = reinterpret_cast(arg_data.data()); + +auto& col_data = result_column->get_chars(); +auto& col_offset = result_column->get_offsets(); +col_offset.resize(input_rows_count); + +faststring compressed_str; +Slice data; + +// When the original string is large, the result is roughly this value +size_t total = arg_offset[input_rows_count - 1]; +col_data.reserve(total / 1000); + +for (size_t row = 0; row < input_rows_count; row++) { +size_t length = arg_offset[row] - arg_offset[row - 1]; +data = Slice(arg_begin + arg_offset[row - 1], length); + +// Z_MEM_ERROR and Z_BUF_ERROR are already handled in compress, making sure st is always Z_OK +auto st = compression_codec->compress(data, &compressed_str); + +size_t idx = col_data.size(); +if (!length) { // data is '' +col_data.resize(col_data.size() + 2); +col_data[idx] = '0', col_data[idx + 1] = 'x'; +col_offset[row] = col_offset[row - 1] + 2; +continue; +} + +// first ten digits represent the length of the uncompressed string 
+col_data.resize(col_data.size() + 10 + 2 * compressed_str.size()); Review Comment: I don't think you need to convert the compressed bytes into a visible hexadecimal string. 1. The conversion may make the result larger than the data before compression. 2. Nobody cares about the content of the compressed bytes; people only care that compress really compresses the data and that decompress returns the same data as before compression.
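The size concern raised in this comment can be made concrete with a small sketch. It assumes the hex layout sized by the quoted `resize` call (ten prefix characters plus two characters per compressed byte) and, as an alternative, a raw binary layout with a 4-byte little-endian length prefix, which is the layout documented for MySQL's `COMPRESS()`. The helper names `hex_output_size`, `raw_output_size`, and `with_length_prefix` are hypothetical and are not taken from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Size of the hex layout from the quoted resize call:
// 10 prefix characters plus 2 characters per compressed byte.
constexpr std::size_t hex_output_size(std::size_t compressed_size) {
    return 10 + 2 * compressed_size;
}

// Size of an assumed raw layout: 4-byte length prefix plus the bytes as is.
constexpr std::size_t raw_output_size(std::size_t compressed_size) {
    return 4 + compressed_size;
}

// Builds the assumed raw layout: the uncompressed length in little-endian
// order, followed by the compressed bytes without any re-encoding.
std::string with_length_prefix(std::uint32_t uncompressed_len, std::string_view compressed) {
    std::string out;
    out.reserve(raw_output_size(compressed.size()));
    for (int shift = 0; shift < 32; shift += 8) {
        out.push_back(static_cast<char>((uncompressed_len >> shift) & 0xFFu));
    }
    out.append(compressed);
    return out;
}
```

For a 100-byte compressed payload, the hex layout needs 210 bytes while the raw layout needs 104, so hex output can exceed the size of a poorly compressible input.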
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2623424626 TeamCity be ut coverage result: Function Coverage: 42.07% (10997/26138) Line Coverage: 32.36% (92890/287059) Region Coverage: 31.51% (47633/151146) Branch Coverage: 27.54% (24107/87544) Coverage Report: http://coverage.selectdb-in.cc/coverage/23e089b95f2a690fe4e2f913b1ec7550fceabdd3_23e089b95f2a690fe4e2f913b1ec7550fceabdd3/report/index.html
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2623419535
ClickBench: Total hot run time: 30.96 s
```
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 23e089b95f2a690fe4e2f913b1ec7550fceabdd3, data reload: false
query1  0.04  0.04  0.03
query2  0.07  0.03  0.04
query3  0.24  0.06  0.07
query4  1.61  0.11  0.10
query5  0.43  0.41  0.40
query6  1.14  0.65  0.65
query7  0.02  0.02  0.02
query8  0.04  0.03  0.03
query9  0.58  0.50  0.51
query10  0.57  0.55  0.56
query11  0.14  0.10  0.12
query12  0.14  0.11  0.11
query13  0.62  0.61  0.60
query14  2.84  2.86  2.90
query15  0.90  0.82  0.83
query16  0.37  0.39  0.38
query17  1.04  1.06  1.09
query18  0.22  0.22  0.22
query19  1.86  1.89  1.99
query20  0.02  0.01  0.01
query21  15.35  0.88  0.59
query22  0.76  0.86  0.65
query23  15.23  1.47  0.62
query24  3.01  0.99  1.16
query25  0.14  0.14  0.10
query26  0.37  0.17  0.14
query27  0.05  0.06  0.05
query28  13.36  1.02  0.42
query29  12.65  3.97  3.32
query30  0.25  0.10  0.06
query31  2.82  0.60  0.38
query32  3.22  0.56  0.46
query33  3.00  3.02  3.04
query34  16.52  5.12  4.50
query35  4.51  4.45  4.50
query36  0.64  0.52  0.49
query37  0.10  0.06  0.06
query38  0.05  0.04  0.04
query39  0.03  0.02  0.03
query40  0.17  0.12  0.12
query41  0.07  0.03  0.02
query42  0.03  0.02  0.02
query43  0.04  0.02  0.02
Total cold run time: 105.26 s
Total hot run time: 30.96 s
```
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2623414208
TPC-DS: Total hot run time: 190787 ms
```
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 23e089b95f2a690fe4e2f913b1ec7550fceabdd3, data reload: false
query1  1313  946  942  942
query2  6103  2107  2037  2037
query3  10970  4393  4500  4393
query4  60657  29230  23228  23228
query5  5571  599  463  463
query6  440  203  175  175
query7  5543  507  293  293
query8  330  240  229  229
query9  8401  2659  2634  2634
query10  462  303  252  252
query11  17194  14930  15369  14930
query12  157  112  109  109
query13  1413  560  440  440
query14  10384  7112  6468  6468
query15  216  214  199  199
query16  7329  650  481  481
query17  1131  745  618  618
query18  1899  419  322  322
query19  228  196  164  164
query20  121  118  117  117
query21  215  125  106  106
query22  4377  4666  4599  4599
query23  34466  33134  33559  33134
query24  5719  2304  2365  2304
query25  465  462  389  389
query26  644  278  154  154
query27  1661  457  334  334
query28  4026  2490  2433  2433
query29  527  574  429  429
query30  210  194  154  154
query31  942  909  810  810
query32  75  54  57  54
query33  457  364  307  307
query34  747  873  511  511
query35  806  831  736  736
query36  1021  1010  948  948
query37  120  109  73  73
query38  4384  4349  4310  4310
query39  1487  1443  1447  1443
query40  205  114  102  102
query41  52  50  56  50
query42  125  109  102  102
query43  526  543  507  507
query44  1374  850  818  818
query45  185  179  166  166
query46  870  1071  658  658
query47  1908  1875  1846  1846
query48  389  425  326  326
query49  743  483  381  381
query50  649  654  392  392
query51  4334  4259  4251  4251
query52  114  99  94  94
query53  225  252  187  187
query54  502  506  432  432
query55  87  83  79  79
query56  246  271  245  245
query57  1200  1201  1141  1141
query58  240  227  237  227
query59  3147  3203  3131  3131
query60  302  268  259  259
query61  113  125  115  115
query62  748  741  661  661
query63  221  183  184  183
query64  1280  1000  645  645
query65  3228  3163  3188  3163
query66  722  395  291  291
query67  15918  15522  15416  15416
query68  3911  813  572  572
query69  480  309  258  258
query70  1162  1154  1149  1149
query71  411  288  255  255
query72  5928  4035  3821  3821
query73  658  767  370  370
query74  9969  8949  8745  8745
query75  3243  3133  2642  2642
query76  3065  1178  767  767
query77  485  362  275  275
query78  9974  10069  9362  9362
query79  2676  797  592  592
query80  1623  533  441  441
query81  555  275  237  237
query82  352  147  116  116
query83  267  166  147  147
query84  291  94  79  79
query85  768  341  303  303
query86  414  312  307  307
query87  4506  4496  4456  4456
query88  3689  2170  2209  2170
query89  393  324  284  284
query90  1583  185  189  185
query91  131  141  104  104
query92  59  58  55  55
query93  2363  897  543  543
query94  695  398  307  307
query95  328  264  310  264
query96  488  611  280  280
query97  2864  2881  2700  2700
query98  224  212  198  198
query99  1276  1401  1213  1213
Total cold run time: 306695 ms
Total hot run time: 190787 ms
```
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2623402130
TPC-H: Total hot run time: 32303 ms
```
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 23e089b95f2a690fe4e2f913b1ec7550fceabdd3, data reload: false
-- Round 1 --
q1  17578  5496  5373  5373
q2  2047  308  164  164
q3  10425  1238  769  769
q4  10214  991  547  547
q5  7944  2416  2160  2160
q6  201  171  132  132
q7  904  762  596  596
q8  9228  1371  1173  1173
q9  5320  5065  4890  4890
q10  6842  2342  1902  1902
q11  458  277  250  250
q12  341  359  217  217
q13  17750  3697  3118  3118
q14  240  239  219  219
q15  519  479  474  474
q16  641  621  584  584
q17  584  869  333  333
q18  6978  6336  6500  6336
q19  1875  974  556  556
q20  317  322  195  195
q21  2850  2295  2005  2005
q22  366  340  310  310
Total cold run time: 103622 ms
Total hot run time: 32303 ms
-- Round 2, with runtime_filter_mode=off --
q1  5621  5527  5489  5489
q2  251  326  238  238
q3  2247  2709  2351  2351
q4  1376  1855  1373  1373
q5  4350  4792  4871  4792
q6  182  164  128  128
q7  2078  1972  1847  1847
q8  2620  2874  2757  2757
q9  7344  7220  7301  7220
q10  3036  3279  2807  2807
q11  574  505  488  488
q12  632  723  607  607
q13  3795  3914  3382  3382
q14  288  294  274  274
q15  519  491  454  454
q16  657  704  648  648
q17  1257  1750  1255  1255
q18  7770  7681  7387  7387
q19  831  1214  1076  1076
q20  2000  2041  1893  1893
q21  5802  5259  5102  5102
q22  633  597  568  568
Total cold run time: 53863 ms
Total hot run time: 52136 ms
```
Re: [PR] [Enhancement] Support some compress functions [doris]
lzyy2024 commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2623345894 run buildall
Re: [PR] [Enhancement] Support some compress functions [doris]
lzyy2024 commented on code in PR #47307: URL: https://github.com/apache/doris/pull/47307#discussion_r1934155126 ## be/src/vec/functions/function_compress.cpp: ## @@ -0,0 +1,247 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. +#include + +#include +#include +#include +#include +#include +#include +#include + +#include "common/status.h" +#include "util/block_compression.h" +#include "util/faststring.h" +#include "vec/aggregate_functions/aggregate_function.h" +#include "vec/columns/column.h" +#include "vec/columns/column_nullable.h" +#include "vec/columns/column_string.h" +#include "vec/columns/column_vector.h" +#include "vec/columns/columns_number.h" +#include "vec/common/assert_cast.h" +#include "vec/core/block.h" +#include "vec/core/column_numbers.h" +#include "vec/core/column_with_type_and_name.h" +#include "vec/core/types.h" +#include "vec/data_types/data_type.h" +#include "vec/data_types/data_type_nullable.h" +#include "vec/data_types/data_type_number.h" +#include "vec/data_types/data_type_string.h" +#include "vec/functions/function.h" +#include "vec/functions/simple_function_factory.h" + +namespace doris { +class FunctionContext; +} // namespace doris + +namespace doris::vectorized { + +class FunctionCompress 
: public IFunction { +string hex_itoc = "0123456789ABCDEF"; + +public: +static constexpr auto name = "compress"; +static FunctionPtr create() { return std::make_shared(); } + +String get_name() const override { return name; } + +size_t get_number_of_arguments() const override { return 1; } + +DataTypePtr get_return_type_impl(const DataTypes& arguments) const override { +return std::make_shared(); +} + +Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments, +uint32_t result, size_t input_rows_count) const override { +// Get the compression algorithm object +BlockCompressionCodec* compression_codec; + RETURN_IF_ERROR(get_block_compression_codec(segment_v2::CompressionTypePB::ZLIB, +&compression_codec)); + +const auto& arg_column = +assert_cast(*block.get_by_position(arguments[0]).column); +auto result_column = ColumnString::create(); + +auto& arg_data = arg_column.get_chars(); +auto& arg_offset = arg_column.get_offsets(); +const char* arg_begin = reinterpret_cast(arg_data.data()); + +auto& col_data = result_column->get_chars(); +auto& col_offset = result_column->get_offsets(); +col_offset.resize(input_rows_count); + +faststring compressed_str; +Slice data; + +// When the original string is large, the result is roughly this value +size_t total = arg_offset[input_rows_count - 1]; +col_data.reserve(total / 1000); + +for (size_t row = 0; row < input_rows_count; row++) { +size_t length = arg_offset[row] - arg_offset[row - 1]; +data = Slice(arg_begin + arg_offset[row - 1], length); + +// Z_MEM_ERROR and Z_BUF_ERROR are already handled in compress, making sure st is always Z_OK +auto st = compression_codec->compress(data, &compressed_str); + +size_t idx = col_data.size(); +if (!length) { // data is '' +col_data.resize(col_data.size() + 2); +col_data[idx] = '0', col_data[idx + 1] = 'x'; +col_offset[row] = col_offset[row - 1] + 2; +continue; +} + +// first ten digits represent the length of the uncompressed string 
+col_data.resize(col_data.size() + 10 + 2 * compressed_str.size()); Review Comment: Yes, MySQL does the same thing. What I do is convert the compressed bytes into a visible hexadecimal string.
Re: [PR] [Enhancement] Support some compress functions [doris]
HappenLee commented on code in PR #47307: URL: https://github.com/apache/doris/pull/47307#discussion_r1934135609 ## be/src/vec/functions/function_compress.cpp: ## @@ -0,0 +1,247 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. +#include + +#include +#include +#include +#include +#include +#include +#include + +#include "common/status.h" +#include "util/block_compression.h" +#include "util/faststring.h" +#include "vec/aggregate_functions/aggregate_function.h" +#include "vec/columns/column.h" +#include "vec/columns/column_nullable.h" +#include "vec/columns/column_string.h" +#include "vec/columns/column_vector.h" +#include "vec/columns/columns_number.h" +#include "vec/common/assert_cast.h" +#include "vec/core/block.h" +#include "vec/core/column_numbers.h" +#include "vec/core/column_with_type_and_name.h" +#include "vec/core/types.h" +#include "vec/data_types/data_type.h" +#include "vec/data_types/data_type_nullable.h" +#include "vec/data_types/data_type_number.h" +#include "vec/data_types/data_type_string.h" +#include "vec/functions/function.h" +#include "vec/functions/simple_function_factory.h" + +namespace doris { +class FunctionContext; +} // namespace doris + +namespace doris::vectorized { + +class FunctionCompress 
: public IFunction { +string hex_itoc = "0123456789ABCDEF"; + +public: +static constexpr auto name = "compress"; +static FunctionPtr create() { return std::make_shared(); } + +String get_name() const override { return name; } + +size_t get_number_of_arguments() const override { return 1; } + +DataTypePtr get_return_type_impl(const DataTypes& arguments) const override { +return std::make_shared(); +} + +Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments, +uint32_t result, size_t input_rows_count) const override { +// Get the compression algorithm object +BlockCompressionCodec* compression_codec; + RETURN_IF_ERROR(get_block_compression_codec(segment_v2::CompressionTypePB::ZLIB, +&compression_codec)); + +const auto& arg_column = +assert_cast(*block.get_by_position(arguments[0]).column); +auto result_column = ColumnString::create(); + +auto& arg_data = arg_column.get_chars(); +auto& arg_offset = arg_column.get_offsets(); +const char* arg_begin = reinterpret_cast(arg_data.data()); + +auto& col_data = result_column->get_chars(); +auto& col_offset = result_column->get_offsets(); +col_offset.resize(input_rows_count); + +faststring compressed_str; +Slice data; + +// When the original string is large, the result is roughly this value +size_t total = arg_offset[input_rows_count - 1]; +col_data.reserve(total / 1000); + +for (size_t row = 0; row < input_rows_count; row++) { +size_t length = arg_offset[row] - arg_offset[row - 1]; +data = Slice(arg_begin + arg_offset[row - 1], length); + +// Z_MEM_ERROR and Z_BUF_ERROR are already handled in compress, making sure st is always Z_OK +auto st = compression_codec->compress(data, &compressed_str); + +size_t idx = col_data.size(); +if (!length) { // data is '' +col_data.resize(col_data.size() + 2); +col_data[idx] = '0', col_data[idx + 1] = 'x'; +col_offset[row] = col_offset[row - 1] + 2; +continue; +} + +// first ten digits represent the length of the uncompressed string 
+col_data.resize(col_data.size() + 10 + 2 * compressed_str.size()); Review Comment: So what does the function do? It seems the result may be larger than the data before compression. Does MySQL do the same thing?
Re: [PR] [Enhancement] Support some compress functions [doris]
HappenLee commented on code in PR #47307: URL: https://github.com/apache/doris/pull/47307#discussion_r1934130870 ## be/src/vec/functions/function_compress.cpp: ## @@ -0,0 +1,247 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. +#include + +#include +#include +#include +#include +#include +#include +#include + +#include "common/status.h" +#include "util/block_compression.h" +#include "util/faststring.h" +#include "vec/aggregate_functions/aggregate_function.h" +#include "vec/columns/column.h" +#include "vec/columns/column_nullable.h" +#include "vec/columns/column_string.h" +#include "vec/columns/column_vector.h" +#include "vec/columns/columns_number.h" +#include "vec/common/assert_cast.h" +#include "vec/core/block.h" +#include "vec/core/column_numbers.h" +#include "vec/core/column_with_type_and_name.h" +#include "vec/core/types.h" +#include "vec/data_types/data_type.h" +#include "vec/data_types/data_type_nullable.h" +#include "vec/data_types/data_type_number.h" +#include "vec/data_types/data_type_string.h" +#include "vec/functions/function.h" +#include "vec/functions/simple_function_factory.h" + +namespace doris { +class FunctionContext; +} // namespace doris + +namespace doris::vectorized { + +class FunctionCompress 
: public IFunction { +string hex_itoc = "0123456789ABCDEF"; + +public: +static constexpr auto name = "compress"; +static FunctionPtr create() { return std::make_shared(); } + +String get_name() const override { return name; } + +size_t get_number_of_arguments() const override { return 1; } + +DataTypePtr get_return_type_impl(const DataTypes& arguments) const override { +return std::make_shared(); +} + +Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments, +uint32_t result, size_t input_rows_count) const override { +// Get the compression algorithm object +BlockCompressionCodec* compression_codec; + RETURN_IF_ERROR(get_block_compression_codec(segment_v2::CompressionTypePB::ZLIB, +&compression_codec)); + +const auto& arg_column = +assert_cast(*block.get_by_position(arguments[0]).column); +auto result_column = ColumnString::create(); + +auto& arg_data = arg_column.get_chars(); +auto& arg_offset = arg_column.get_offsets(); +const char* arg_begin = reinterpret_cast(arg_data.data()); + +auto& col_data = result_column->get_chars(); +auto& col_offset = result_column->get_offsets(); +col_offset.resize(input_rows_count); + +faststring compressed_str; +Slice data; + +// When the original string is large, the result is roughly this value +size_t total = arg_offset[input_rows_count - 1]; +col_data.reserve(total / 1000); + +for (size_t row = 0; row < input_rows_count; row++) { +size_t length = arg_offset[row] - arg_offset[row - 1]; +data = Slice(arg_begin + arg_offset[row - 1], length); + +// Z_MEM_ERROR and Z_BUF_ERROR are already handled in compress, making sure st is always Z_OK +auto st = compression_codec->compress(data, &compressed_str); + +size_t idx = col_data.size(); +if (!length) { // data is '' Review Comment: better do the check before call `compress` ## be/src/vec/functions/function_compress.cpp: ## @@ -0,0 +1,247 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. 
See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. S
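The suggestion to check for an empty row before calling `compress` can be sketched as follows. This is a minimal illustration, not the Doris implementation: `Compressor` and `compress_rows` are hypothetical names, and the codec is replaced by a caller-supplied function so the control flow stays visible. The `"0x"` marker for empty input is the one written by the quoted diff.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>
#include <vector>

// Stand-in for a real codec call; hypothetical, not the Doris API.
using Compressor = std::function<std::string(std::string_view)>;

// Compresses each row, skipping the codec entirely for empty rows.
std::vector<std::string> compress_rows(const std::vector<std::string>& rows,
                                       const Compressor& compress) {
    std::vector<std::string> out;
    out.reserve(rows.size());
    for (const std::string& row : rows) {
        if (row.empty()) {
            // Empty input: emit the fixed marker without calling the codec.
            out.emplace_back("0x");
            continue;
        }
        out.push_back(compress(row));
    }
    return out;
}
```

With the check placed first, the codec is invoked only for rows that actually carry data, which is the saving the comment asks for.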
Re: [PR] [Enhancement] Support some compress functions [doris]
HappenLee commented on code in PR #47307: URL: https://github.com/apache/doris/pull/47307#discussion_r1934123275 ## be/src/vec/functions/function_compress.cpp: ## @@ -0,0 +1,247 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. +#include + +#include +#include +#include +#include +#include +#include +#include + +#include "common/status.h" +#include "util/block_compression.h" +#include "util/faststring.h" +#include "vec/aggregate_functions/aggregate_function.h" +#include "vec/columns/column.h" +#include "vec/columns/column_nullable.h" +#include "vec/columns/column_string.h" +#include "vec/columns/column_vector.h" +#include "vec/columns/columns_number.h" +#include "vec/common/assert_cast.h" +#include "vec/core/block.h" +#include "vec/core/column_numbers.h" +#include "vec/core/column_with_type_and_name.h" +#include "vec/core/types.h" +#include "vec/data_types/data_type.h" +#include "vec/data_types/data_type_nullable.h" +#include "vec/data_types/data_type_number.h" +#include "vec/data_types/data_type_string.h" +#include "vec/functions/function.h" +#include "vec/functions/simple_function_factory.h" + +namespace doris { +class FunctionContext; +} // namespace doris + +namespace doris::vectorized { + +class FunctionCompress 
: public IFunction { +string hex_itoc = "0123456789ABCDEF"; Review Comment: Name it HEX_ITOC since it is constant data. It needs to be constexpr, and a std::array would be better.
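One way to follow this suggestion is sketched below, assuming C++17 for `inline constexpr` variables. The name `HEX_ITOC` comes from the review comment; `append_hex_byte` is a hypothetical helper added only to show the table in use.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>

// Compile-time lookup table; the name HEX_ITOC follows the review suggestion.
inline constexpr std::array<char, 16> HEX_ITOC = {'0', '1', '2', '3', '4', '5', '6', '7',
                                                  '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};

// Appends the two uppercase hex digits of one byte (hypothetical helper).
inline void append_hex_byte(std::string& out, std::uint8_t byte) {
    out.push_back(HEX_ITOC[byte >> 4]);
    out.push_back(HEX_ITOC[byte & 0x0F]);
}
```

Unlike the non-static `string hex_itoc` member in the quoted diff, a constexpr table is not constructed once per function object.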
Re: [PR] [Enhancement] Support some compress functions [doris]
zclllyybb commented on code in PR #47307: URL: https://github.com/apache/doris/pull/47307#discussion_r1930829499
## regression-test/suites/query_p0/sql_functions/string_functions/test_compress_uncompress.groovy: ##
@@ -0,0 +1,83 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+suite("test_compress_uncompress") {
+// Drop the existing table
+sql "DROP TABLE IF EXISTS test_compression"
+
+// Create the test table
+sql """
+CREATE TABLE test_compression (
+k0 INT, -- Primary key
+text_col STRING, -- String column for input data
+binary_col STRING -- Binary column for compressed data
+)
+DISTRIBUTED BY HASH(k0)
+PROPERTIES (
+"replication_num" = "1"
+);
+"""
+
+// Insert test data with various cases (removing special characters)
+sql """
+INSERT INTO test_compression VALUES
+(1, 'Hello, world!', NULL), -- Plain string
Review Comment: The `binary_col` should contain some valid values to uncompress.
Re: [PR] [Enhancement] Support some compress functions [doris]
github-actions[bot] commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2616316739

PR approved by anyone and no changes requested.
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2615833260

TeamCity be ut coverage result:
Function Coverage: 42.08% (10980/26093)
Line Coverage: 32.35% (92834/286927)
Region Coverage: 31.50% (47593/151083)
Branch Coverage: 27.54% (24108/87524)
Coverage Report: http://coverage.selectdb-in.cc/coverage/72fafec758b44f421c4d53d99040500d328b9f5a_72fafec758b44f421c4d53d99040500d328b9f5a/report/index.html
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2615822511

ClickBench: Total hot run time: 29.92 s
```
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 72fafec758b44f421c4d53d99040500d328b9f5a, data reload: false

query1   0.03   0.03  0.06
query2   0.08   0.03  0.03
query3   0.24   0.07  0.06
query4   1.62   0.11  0.10
query5   0.40   0.42  0.42
query6   1.17   0.66  0.66
query7   0.02   0.02  0.01
query8   0.04   0.03  0.04
query9   0.58   0.52  0.51
query10  0.56   0.56  0.55
query11  0.15   0.10  0.10
query12  0.14   0.11  0.11
query13  0.61   0.59  0.60
query14  2.84   2.81  2.72
query15  0.90   0.82  0.82
query16  0.37   0.39  0.37
query17  1.06   1.03  1.05
query18  0.23   0.21  0.21
query19  1.92   1.80  2.02
query20  0.01   0.02  0.01
query21  15.36  0.90  0.57
query22  0.76   0.72  0.60
query23  15.43  1.41  0.49
query24  3.26   1.00  0.55
query25  0.12   0.30  0.19
query26  0.33   0.14  0.14
query27  0.05   0.06  0.07
query28  13.69  1.07  0.42
query29  12.64  3.91  3.22
query30  0.25   0.09  0.06
query31  2.82   0.59  0.37
query32  3.22   0.53  0.45
query33  3.01   2.94  3.02
query34  16.50  5.13  4.43
query35  4.58   4.51  4.49
query36  0.65   0.48  0.48
query37  0.09   0.06  0.06
query38  0.04   0.04  0.03
query39  0.03   0.02  0.03
query40  0.17   0.15  0.12
query41  0.08   0.03  0.02
query42  0.04   0.02  0.02
query43  0.03   0.03  0.03
Total cold run time: 106.12 s
Total hot run time: 29.92 s
```
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2615810154 TPC-DS: Total hot run time: 190970 ms ``` machine: 'aliyun_ecs.c7a.8xlarge_32C64G' scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools TPC-DS sf100 test result on commit 72fafec758b44f421c4d53d99040500d328b9f5a, data reload: false query1 1313947 924 924 query2 6408197519941975 query3 10961 444843184318 query4 60965 28798 23186 23186 query5 5553610 461 461 query6 417 175 189 175 query7 5502500 291 291 query8 313 229 222 222 query9 8521267926792679 query10 440 300 254 254 query11 17834 15025 15430 15025 query12 169 108 109 108 query13 1447551 434 434 query14 10443 695264396439 query15 214 187 188 187 query16 7230627 477 477 query17 1137752 565 565 query18 1849425 309 309 query19 198 181 155 155 query20 113 113 108 108 query21 215 121 106 106 query22 4591461642774277 query23 34677 32988 33312 32988 query24 6146241023192319 query25 471 484 414 414 query26 648 249 172 172 query27 1765467 328 328 query28 4275251624702470 query29 531 542 441 441 query30 211 195 163 163 query31 931 890 820 820 query32 67 57 56 56 query33 441 342 308 308 query34 737 862 515 515 query35 787 877 784 784 query36 997 1051986 986 query37 119 98 83 83 query38 4313429443254294 query39 1484146814381438 query40 201 115 100 100 query41 55 53 50 50 query42 133 110 104 104 query43 521 533 490 490 query44 1297847 843 843 query45 180 176 167 167 query46 873 1074657 657 query47 1915188918471847 query48 394 387 323 323 query49 709 505 424 424 query50 642 678 408 408 query51 4259434742644264 query52 107 106 98 98 query53 237 255 191 191 query54 479 488 430 430 query55 83 79 81 79 query56 251 265 242 242 query57 1123118510871087 query58 256 244 236 236 query59 3095333930593059 query60 280 273 257 257 query61 120 113 131 113 query62 735 711 670 670 query63 220 189 186 186 query64 12541033644 644 query65 3296315731613157 query66 719 404 313 313 query67 15951 15728 15644 15644 query68 
2710796 589 589 query69 432 359 263 263 query70 1230115111261126 query71 341 281 249 249 query72 5026386138273827 query73 619 747 457 457 query74 9566915091129112 query75 3156313426732673 query76 19111175787 787 query77 343 362 270 270 query78 10106 10219 93379337 query79 1242895 597 597 query80 1113521 434 434 query81 527 270 252 252 query82 1178153 125 125 query83 239 164 157 157 query84 283 97 72 72 query85 776 344 300 300 query86 340 329 262 262 query87 4450455043804380 query88 3551218921632163 query89 394 335 283 283 query90 1738187 190 187 query91 133 134 106 106 query92 60 57 52 52 query93 1101846 535 535 query94 617 377 303 303 query95 329 263 254 254 query96 485 626 284 284 query97 2834291427412741 query98 214 192 192 192 query99 1295136212641264 Total cold run time: 302157 ms Total hot run time: 190970 ms ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2615781090

TPC-H: Total hot run time: 32426 ms
```
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 72fafec758b44f421c4d53d99040500d328b9f5a, data reload: false

-- Round 1 --
q1   17768  5516  5351  5351
q2   2036   301   169   169
q3   10431  1225  721   721
q4   10478  985   525   525
q5   8112   2394  2158  2158
q6   192    167   135   135
q7   909    779   609   609
q8   9229   1328  1168  1168
q9   5330   4883  4908  4883
q10  6822   2342  1904  1904
q11  478    269   250   250
q12  345    353   223   223
q13  18022  3722  3105  3105
q14  230    246   215   215
q15  529    491   478   478
q16  637    610   593   593
q17  550    858   325   325
q18  6894   6521  6835  6521
q19  4322   1342  537   537
q20  309    319   194   194
q21  2778   2263  2042  2042
q22  372    351   320   320
Total cold run time: 106773 ms
Total hot run time: 32426 ms

- Round 2, with runtime_filter_mode=off -
q1   5731  5499  5556  5499
q2   246   337   251   251
q3   2460  2846  2395  2395
q4   1404  1816  1467  1467
q5   4431  4885  4997  4885
q6   170   166   130   130
q7   2152  2084  1870  1870
q8   2680  2876  2691  2691
q9   7203  7193  7246  7193
q10  2976  3150  2721  2721
q11  584   513   518   513
q12  699   767   614   614
q13  3498  3876  3259  3259
q14  284   300   277   277
q15  519   471   482   471
q16  648   689   646   646
q17  1236  1702  1265  1265
q18  7753  7349  7281  7281
q19  795   1148  1077  1077
q20  2031  2043  1918  1918
q21  5591  5064  5021  5021
q22  600   618   565   565
Total cold run time: 53691 ms
Total hot run time: 52009 ms
```
Re: [PR] [Enhancement] Support some compress functions [doris]
lzyy2024 commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2615692710

run buildall
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2615454377

ClickBench: Total hot run time: 30.47 s
```
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 99d8ddaef7e3fd486ef1ccd258f1225713513411, data reload: false

query1   0.04   0.04  0.03
query2   0.07   0.04  0.03
query3   0.24   0.07  0.07
query4   1.61   0.11  0.10
query5   0.43   0.42  0.41
query6   1.16   0.66  0.67
query7   0.02   0.02  0.02
query8   0.04   0.03  0.03
query9   0.57   0.50  0.50
query10  0.55   0.56  0.54
query11  0.13   0.11  0.09
query12  0.14   0.10  0.11
query13  0.62   0.61  0.61
query14  2.68   2.72  2.74
query15  0.90   0.84  0.84
query16  0.36   0.38  0.39
query17  1.05   1.06  1.06
query18  0.24   0.21  0.21
query19  1.90   1.87  2.02
query20  0.01   0.01  0.01
query21  15.36  0.93  0.57
query22  0.75   0.83  0.63
query23  15.25  1.44  0.60
query24  3.04   0.64  2.03
query25  0.23   0.13  0.14
query26  0.24   0.15  0.15
query27  0.05   0.04  0.05
query28  14.04  1.05  0.44
query29  12.60  3.91  3.27
query30  0.27   0.09  0.07
query31  2.84   0.61  0.39
query32  3.22   0.55  0.46
query33  2.96   2.96  3.12
query34  16.72  5.16  4.50
query35  4.61   4.62  4.54
query36  0.67   0.48  0.48
query37  0.10   0.06  0.06
query38  0.05   0.04  0.04
query39  0.04   0.02  0.02
query40  0.16   0.14  0.13
query41  0.07   0.02  0.02
query42  0.03   0.02  0.02
query43  0.03   0.03  0.04
Total cold run time: 106.09 s
Total hot run time: 30.47 s
```
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2615442577 TPC-DS: Total hot run time: 192137 ms ``` machine: 'aliyun_ecs.c7a.8xlarge_32C64G' scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools TPC-DS sf100 test result on commit 99d8ddaef7e3fd486ef1ccd258f1225713513411, data reload: false query1 1330946 914 914 query2 6165202520412025 query3 11109 453746144537 query4 61620 27953 23613 23613 query5 5596596 451 451 query6 428 196 183 183 query7 5506510 306 306 query8 335 251 246 246 query9 8056270326832683 query10 439 298 260 260 query11 17350 15120 15753 15120 query12 160 108 104 104 query13 1442588 453 453 query14 10910 726871887188 query15 203 202 194 194 query16 6594625 491 491 query17 1079732 550 550 query18 1149387 293 293 query19 196 183 161 161 query20 122 117 110 110 query21 214 128 106 106 query22 4634487543844384 query23 34182 33362 33871 33362 query24 5613228723832287 query25 470 456 381 381 query26 642 226 157 157 query27 1792487 335 335 query28 4293252524622462 query29 549 523 444 444 query30 213 188 158 158 query31 922 908 830 830 query32 73 59 53 53 query33 419 351 310 310 query34 747 854 532 532 query35 804 858 759 759 query36 10201089973 973 query37 122 98 79 79 query38 4330428042764276 query39 1497143814721438 query40 203 124 103 103 query41 50 56 47 47 query42 117 114 101 101 query43 524 524 496 496 query44 1279808 840 808 query45 184 168 167 167 query46 911 1063661 661 query47 1960190518641864 query48 413 417 339 339 query49 704 495 398 398 query50 652 666 393 393 query51 4357429943104299 query52 114 104 99 99 query53 233 261 200 200 query54 497 515 405 405 query55 81 80 86 80 query56 253 273 242 242 query57 1145121911471147 query58 242 227 233 227 query59 3045327729822982 query60 275 256 249 249 query61 118 110 110 110 query62 714 727 653 653 query63 218 180 190 180 query64 12991037639 639 query65 3237311631513116 query66 721 393 289 289 query67 16002 15718 15433 15433 query68 
2886870 571 571 query69 455 312 265 265 query70 1225115011391139 query71 384 280 257 257 query72 6248394238313831 query73 649 740 356 356 query74 9996880689258806 query75 3152314826572657 query76 31601182776 776 query77 478 353 269 269 query78 10191 10028 93529352 query79 2912827 589 589 query80 1681519 432 432 query81 533 279 234 234 query82 372 148 117 117 query83 255 166 173 166 query84 291 95 74 74 query85 796 344 316 316 query86 406 322 262 262 query87 4425448744024402 query88 3757219721592159 query89 388 320 290 290 query90 1708192 193 192 query91 177 136 105 105 query92 62 60 50 50 query93 2463871 537 537 query94 747 403 299 299 query95 333 260 255 255 query96 482 630 278 278 query97 2805288627302730 query98 218 197 192 192 query99 1255137912441244 Total cold run time: 306763 ms Total hot run time: 192137 ms ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2615415100

TPC-H: Total hot run time: 32338 ms
```
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 99d8ddaef7e3fd486ef1ccd258f1225713513411, data reload: false

-- Round 1 --
q1   17593  5517  5382  5382
q2   2047   316   166   166
q3   10413  1269  765   765
q4   10201  961   532   532
q5   7529   2381  2164  2164
q6   194    170   136   136
q7   898    763   622   622
q8   9239   1379  1241  1241
q9   5255   4913  4992  4913
q10  6867   2342  1850  1850
q11  464    274   257   257
q12  353    367   226   226
q13  17771  3700  3122  3122
q14  233    236   209   209
q15  530    487   484   484
q16  624    618   577   577
q17  571    875   332   332
q18  6986   6364  6301  6301
q19  1983   955   550   550
q20  314    330   193   193
q21  3029   2224  1993  1993
q22  366    351   323   323
Total cold run time: 103460 ms
Total hot run time: 32338 ms

- Round 2, with runtime_filter_mode=off -
q1   5573  5512  5475  5475
q2   232   332   238   238
q3   2201  2640  2333  2333
q4   1438  1795  1319  1319
q5   4299  4697  4688  4688
q6   166   159   125   125
q7   2041  1945  1804  1804
q8   2606  2852  2693  2693
q9   7270  7133  7185  7133
q10  3027  3289  2813  2813
q11  581   519   486   486
q12  673   736   602   602
q13  3555  3922  3374  3374
q14  296   292   281   281
q15  518   491   462   462
q16  674   687   641   641
q17  1255  1760  1242  1242
q18  7669  7377  7439  7377
q19  822   1168  1087  1087
q20  2003  2024  1865  1865
q21  5813  5308  4856  4856
q22  602   596   600   596
Total cold run time: 53314 ms
Total hot run time: 51490 ms
```
Re: [PR] [Enhancement] Support some compress functions [doris]
lzyy2024 commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2615331930

run buildall
Re: [PR] [Enhancement] Support some compress functions [doris]
zclllyybb commented on code in PR #47307:
URL: https://github.com/apache/doris/pull/47307#discussion_r1930088671

## be/src/vec/functions/function_compress.cpp: ##
@@ -0,0 +1,251 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+#include
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "common/status.h"
+#include "util/block_compression.h"
+#include "util/faststring.h"
+#include "vec/aggregate_functions/aggregate_function.h"
+#include "vec/columns/column.h"
+#include "vec/columns/column_nullable.h"
+#include "vec/columns/column_string.h"
+#include "vec/columns/column_vector.h"
+#include "vec/columns/columns_number.h"
+#include "vec/common/assert_cast.h"
+#include "vec/core/block.h"
+#include "vec/core/column_numbers.h"
+#include "vec/core/column_with_type_and_name.h"
+#include "vec/core/types.h"
+#include "vec/data_types/data_type.h"
+#include "vec/data_types/data_type_nullable.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/data_types/data_type_string.h"
+#include "vec/functions/function.h"
+#include "vec/functions/simple_function_factory.h"
+
+namespace doris {
+class FunctionContext;
+} // namespace doris
+
+namespace doris::vectorized {
+
+class FunctionCompress : public IFunction {
+    string hex_itoc = "0123456789ABCDEF";
+
+public:
+    static constexpr auto name = "compress";
+    static FunctionPtr create() { return std::make_shared(); }
+
+    String get_name() const override { return name; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        return std::make_shared();
+    }
+
+    Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+                        uint32_t result, size_t input_rows_count) const override {
+        // Get the compression algorithm object
+        BlockCompressionCodec* compression_codec;
+        RETURN_IF_ERROR(get_block_compression_codec(segment_v2::CompressionTypePB::ZLIB,
+                                                    &compression_codec));
+
+        const auto& arg_column =
+                assert_cast(*block.get_by_position(arguments[0]).column);
+        auto result_column = ColumnString::create();
+
+        auto& arg_data = arg_column.get_chars();
+        auto& arg_offset = arg_column.get_offsets();
+        const char* arg_begin = reinterpret_cast(arg_data.data());
+
+        auto& col_data = result_column->get_chars();
+        auto& col_offset = result_column->get_offsets();
+        col_offset.resize(input_rows_count);
+
+        faststring compressed_str;
+        Slice data;
+
+        // When the original string is large, the result is roughly this value
+        size_t total = arg_offset[input_rows_count - 1];
+        col_data.reserve(total / 1000);
+
+        for (size_t row = 0; row < input_rows_count; row++) {
+            size_t length = arg_offset[row] - arg_offset[row - 1];
+            data = Slice(arg_begin + arg_offset[row - 1], length);
+
+            // Z_MEM_ERROR and Z_BUF_ERROR are already handled in compress, making sure st is always Z_OK
+            auto st = compression_codec->compress(data, &compressed_str);
+
+            size_t idx = col_data.size();
+            if (!length) { // data is ''
+                col_data.resize(col_data.size() + 2);
+                col_data[idx] = '0', col_data[idx + 1] = 'x';
+                col_offset[row] = col_offset[row - 1] + 2;
+                continue;
+            }
+
+            // first ten digits represent the length of the uncompressed string
+            col_data.resize(col_data.size() + 10);
+            col_data[idx] = '0', col_data[idx + 1] = 'x';
+            for (size_t i = 0; i < 4; i++) {
+                unsigned char byte = (length >> (i * 8)) & 0xFF;
+                col_data[idx + 2 + i * 2] = hex_itoc[byte >> 4]; // higher four
+                col_data[idx + 3 + i * 2] = hex_itoc[byte & 0x0F];
+            }
+            idx += 10;
+
+            col_data.resize(col_data.size() + 2 * compressed_str.size());
+
+            unsigned char* src = compressed_str.data();
+
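The length prefix written by the quoted `execute_impl` can be sketched in isolation: `0x`, followed by the four low bytes of the uncompressed length as eight hex digits, least-significant byte first; an empty input produces just `0x`. This is a standalone illustration, not the PR's code: `length_prefix` and `HEX_DIGITS` are hypothetical names, the length is assumed to fit in 32 bits, and the compressed payload that follows the prefix is left out.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical lookup table mirroring hex_itoc in the quoted diff.
constexpr std::array<char, 16> HEX_DIGITS = {'0', '1', '2', '3', '4', '5', '6', '7',
                                             '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};

// Hypothetical helper: builds the prefix that precedes the hex-encoded
// compressed bytes. Assumes the uncompressed length fits in 32 bits.
inline std::string length_prefix(uint32_t length) {
    std::string out = "0x";
    if (length == 0) {
        return out; // empty input: the quoted code emits only "0x"
    }
    for (size_t i = 0; i < 4; i++) {
        // Byte i of the length, starting from the least-significant byte.
        unsigned char byte = (length >> (i * 8)) & 0xFF;
        out.push_back(HEX_DIGITS[byte >> 4]);   // high four bits
        out.push_back(HEX_DIGITS[byte & 0x0F]); // low four bits
    }
    return out;
}
```

For the 13-byte input `'Hello, world!'`, `length_prefix(13)` gives `0x0D000000`.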
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2614448303

TeamCity be ut coverage result:
Function Coverage: 42.08% (10979/26093)
Line Coverage: 32.35% (92831/286929)
Region Coverage: 31.50% (47590/151083)
Branch Coverage: 27.54% (24105/87524)
Coverage Report: http://coverage.selectdb-in.cc/coverage/9263dea5e49d60ca40fe41a6ec858405ae8202f9_9263dea5e49d60ca40fe41a6ec858405ae8202f9/report/index.html
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2614440849

ClickBench: Total hot run time: 31.04 s
```
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 9263dea5e49d60ca40fe41a6ec858405ae8202f9, data reload: false

query1   0.06   0.03  0.03
query2   0.07   0.03  0.03
query3   0.24   0.07  0.07
query4   1.61   0.10  0.10
query5   0.41   0.42  0.39
query6   1.16   0.66  0.67
query7   0.02   0.02  0.01
query8   0.04   0.03  0.03
query9   0.58   0.51  0.52
query10  0.54   0.57  0.55
query11  0.14   0.11  0.10
query12  0.14   0.11  0.11
query13  0.61   0.61  0.60
query14  2.84   2.75  2.81
query15  0.89   0.83  0.82
query16  0.36   0.37  0.37
query17  1.05   1.01  1.00
query18  0.22   0.21  0.22
query19  1.99   2.05  1.88
query20  0.01   0.01  0.01
query21  15.36  0.94  0.59
query22  0.74   0.90  0.62
query23  15.20  1.49  0.59
query24  3.29   1.33  1.69
query25  0.15   0.17  0.10
query26  0.29   0.15  0.14
query27  0.06   0.05  0.04
query28  14.06  1.06  0.43
query29  12.53  4.07  3.24
query30  0.25   0.08  0.06
query31  2.83   0.62  0.39
query32  3.23   0.55  0.46
query33  2.95   3.03  3.03
query34  16.53  5.26  4.52
query35  4.46   4.48  4.50
query36  0.65   0.51  0.47
query37  0.10   0.06  0.06
query38  0.05   0.04  0.04
query39  0.03   0.02  0.02
query40  0.16   0.13  0.13
query41  0.08   0.02  0.03
query42  0.03   0.03  0.02
query43  0.04   0.03  0.03
Total cold run time: 106.05 s
Total hot run time: 31.04 s
```
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2614438692 TPC-DS: Total hot run time: 184635 ms ``` machine: 'aliyun_ecs.c7a.8xlarge_32C64G' scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools TPC-DS sf100 test result on commit 9263dea5e49d60ca40fe41a6ec858405ae8202f9, data reload: false query1 972 376 371 371 query2 6509204119881988 query3 6789219 218 218 query4 36458 23342 22916 22916 query5 4386602 450 450 query6 290 198 188 188 query7 4604484 306 306 query8 298 246 228 228 query9 9471269926922692 query10 465 306 254 254 query11 17964 15196 14913 14913 query12 160 109 102 102 query13 1642523 385 385 query14 9587689572366895 query15 234 191 183 183 query16 7785609 471 471 query17 1575702 540 540 query18 2009393 306 306 query19 233 191 159 159 query20 119 119 118 118 query21 207 123 104 104 query22 4130442740924092 query23 34663 32920 32906 32906 query24 6636229122672267 query25 481 480 422 422 query26 1040279 158 158 query27 1985472 345 345 query28 5057249224522452 query29 635 595 448 448 query30 235 190 156 156 query31 979 855 811 811 query32 70 62 62 62 query33 530 374 313 313 query34 750 845 497 497 query35 848 872 749 749 query36 964 1009955 955 query37 126 97 89 89 query38 4123412540754075 query39 1425138913791379 query40 203 121 104 104 query41 52 55 50 50 query42 119 102 107 102 query43 504 518 476 476 query44 1296797 803 797 query45 181 168 166 166 query46 862 1027651 651 query47 1818186617781778 query48 393 410 314 314 query49 778 494 390 390 query50 666 654 409 409 query51 4171415341294129 query52 105 104 90 90 query53 223 252 185 185 query54 486 494 405 405 query55 82 80 85 80 query56 255 270 235 235 query57 1157116310911091 query58 254 235 246 235 query59 3153321728802880 query60 276 280 260 260 query61 117 120 121 120 query62 832 726 644 644 query63 225 191 210 191 query64 35891020677 677 query65 3227315531613155 query66 934 416 315 315 query67 15861 15984 15412 15412 query68 
4284842 528 528 query69 464 291 259 259 query70 1208113511491135 query71 372 280 258 258 query72 5864384139773841 query73 654 750 371 371 query74 10124 892789958927 query75 3185314926212621 query76 32171160775 775 query77 470 370 274 274 query78 10087 10007 94979497 query79 3152814 585 585 query80 1499530 438 438 query81 568 281 240 240 query82 715 152 129 129 query83 183 170 155 155 query84 238 100 73 73 query85 800 388 300 300 query86 427 300 297 297 query87 4543448042754275 query88 5047218321632163 query89 402 331 297 297 query90 1786191 186 186 query91 138 135 111 111 query92 66 58 54 54 query93 2742895 537 537 query94 740 409 297 297 query95 330 261 260 260 query96 488 612 279 279 query97 2758284927152715 query98 238 205 196 196 query99 1327137812581258 Total cold run time: 286369 ms Total hot run time: 184635 ms ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2614434365

TPC-H: Total hot run time: 32060 ms
```
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 9263dea5e49d60ca40fe41a6ec858405ae8202f9, data reload: false

-- Round 1 --
q1   17599  5513  5327  5327
q2   2048   315   176   176
q3   10559  1307  702   702
q4   10239  974   529   529
q5   8071   2343  2119  2119
q6   197    165   132   132
q7   893    758   619   619
q8   9225   1340  1185  1185
q9   5100   4824  4851  4824
q10  6800   2333  1885  1885
q11  458    275   267   267
q12  347    356   218   218
q13  17763  3681  3140  3140
q14  221    225   206   206
q15  506    462   465   462
q16  635    609   595   595
q17  549    850   325   325
q18  7160   6336  6356  6336
q19  2894   975   532   532
q20  304    322   190   190
q21  2803   2216  1979  1979
q22  372    334   312   312
Total cold run time: 104743 ms
Total hot run time: 32060 ms

- Round 2, with runtime_filter_mode=off -
q1   5588  5452  5489  5452
q2   239   330   235   235
q3   2275  2687  2332  2332
q4   1444  1828  1348  1348
q5   4342  4696  4660  4660
q6   180   164   129   129
q7   2107  1957  1883  1883
q8   2656  2824  2672  2672
q9   7277  7174  7171  7171
q10  2963  3246  2769  2769
q11  583   532   494   494
q12  690   779   682   682
q13  3493  3895  3326  3326
q14  283   293   274   274
q15  518   473   459   459
q16  640   669   620   620
q17  1205  1713  1266  1266
q18  7576  7316  7326  7316
q19  782   1131  1033  1033
q20  2051  2035  1871  1871
q21  5615  5290  5118  5118
q22  628   610   600   600
Total cold run time: 53135 ms
Total hot run time: 51710 ms
```
Re: [PR] [Enhancement] Support some compress functions [doris]
lzyy2024 commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2614420034

run buildall
Re: [PR] [Enhancement] Support some compress functions [doris]
lzyy2024 commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2614411403

run buildall
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2614377338

ClickBench: Total hot run time: 30.3 s
```
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 13ebe672a083689491c631868d403d84b840cd3f, data reload: false

query1   0.03   0.03  0.04
query2   0.06   0.04  0.03
query3   0.24   0.06  0.07
query4   1.61   0.11  0.10
query5   0.43   0.44  0.41
query6   1.16   0.65  0.65
query7   0.02   0.01  0.01
query8   0.04   0.03  0.04
query9   0.59   0.49  0.51
query10  0.55   0.58  0.56
query11  0.14   0.11  0.11
query12  0.13   0.10  0.11
query13  0.63   0.60  0.60
query14  2.73   2.89  2.85
query15  0.89   0.84  0.82
query16  0.39   0.39  0.38
query17  1.01   1.03  1.00
query18  0.22   0.20  0.20
query19  1.86   1.77  2.09
query20  0.02   0.01  0.01
query21  15.37  0.94  0.56
query22  0.75   0.86  0.77
query23  15.15  1.48  0.60
query24  2.94   1.71  0.36
query25  0.28   0.09  0.13
query26  0.34   0.14  0.13
query27  0.05   0.07  0.06
query28  13.57  1.03  0.44
query29  12.58  3.96  3.27
query30  0.25   0.09  0.07
query31  2.82   0.60  0.37
query32  3.25   0.55  0.46
query33  3.01   3.01  3.05
query34  16.61  5.27  4.56
query35  4.51   4.54  4.52
query36  0.65   0.49  0.48
query37  0.10   0.06  0.06
query38  0.04   0.04  0.04
query39  0.03   0.02  0.03
query40  0.17   0.14  0.14
query41  0.09   0.03  0.03
query42  0.04   0.02  0.03
query43  0.04   0.04  0.03
Total cold run time: 105.39 s
Total hot run time: 30.3 s
```
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2614363270 TPC-DS: Total hot run time: 185720 ms ``` machine: 'aliyun_ecs.c7a.8xlarge_32C64G' scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools TPC-DS sf100 test result on commit 13ebe672a083689491c631868d403d84b840cd3f, data reload: false query1 979 388 367 367 query2 6520206920022002 query3 6799218 219 218 query4 33222 23366 23107 23107 query5 4314612 458 458 query6 285 210 187 187 query7 4590486 314 314 query8 302 245 229 229 query9 9612270327042703 query10 466 306 249 249 query11 17939 15268 15179 15179 query12 157 103 102 102 query13 1670539 391 391 query14 10394 701269846984 query15 230 190 186 186 query16 7215619 477 477 query17 1595701 574 574 query18 1740393 312 312 query19 237 190 171 171 query20 122 117 111 111 query21 213 123 103 103 query22 4125442443094309 query23 34421 33045 33169 33045 query24 6612229523922295 query25 506 475 398 398 query26 1221277 158 158 query27 1968461 347 347 query28 5186246824512451 query29 607 571 453 453 query30 232 186 168 168 query31 964 891 814 814 query32 73 64 62 62 query33 524 419 307 307 query34 741 838 519 519 query35 794 804 762 762 query36 10221063941 941 query37 121 105 80 80 query38 4089416340044004 query39 1492138014541380 query40 205 111 103 103 query41 55 52 63 52 query42 120 103 109 103 query43 517 506 484 484 query44 1373814 816 814 query45 177 170 163 163 query46 859 1031647 647 query47 1779183517911791 query48 388 401 327 327 query49 758 478 397 397 query50 633 670 392 392 query51 4188421241414141 query52 101 106 93 93 query53 236 251 196 196 query54 488 496 404 404 query55 83 77 79 77 query56 263 267 242 242 query57 1151116810731073 query58 241 227 246 227 query59 3010299527552755 query60 277 265 251 251 query61 117 109 113 109 query62 792 720 664 664 query63 217 192 194 192 query64 40761017637 637 query65 3245320531433143 query66 906 414 311 311 query67 15870 15809 15640 15640 query68 
5346836 541 541 query69 443 289 253 253 query70 1195116410831083 query71 387 282 260 260 query72 5798382237763776 query73 655 760 363 363 query74 9923894592498945 query75 3187312926562656 query76 31991183785 785 query77 481 367 283 283 query78 10003 10019 93459345 query79 3024829 604 604 query80 682 529 446 446 query81 498 277 282 277 query82 423 155 124 124 query83 165 174 153 153 query84 239 89 76 76 query85 787 337 301 301 query86 390 323 305 305 query87 4520442344394423 query88 5058217721572157 query89 386 327 294 294 query90 1809192 206 192 query91 131 133 105 105 query92 62 59 56 56 query93 2315876 541 541 query94 664 421 308 308 query95 334 278 262 262 query96 491 619 290 290 query97 2761287527252725 query98 229 205 195 195 query99 1287139412511251 Total cold run time: 282296 ms Total hot run time: 185720 ms ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2614350487

TPC-H: Total hot run time: 32328 ms
```
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 13ebe672a083689491c631868d403d84b840cd3f, data reload: false

-- Round 1 --
q1 17587 5520 5400 5400
q2 2046 311 168 168
q3 10541 1284 722 722
q4 10240 962 540 540
q5 8273 2482 2182 2182
q6 195 165 135 135
q7 904 774 641 641
q8 9245 1366 1178 1178
q9 5286 4870 4929 4870
q10 6871 2353 1879 1879
q11 456 280 259 259
q12 352 358 216 216
q13 17765 3713 3109 3109
q14 232 240 206 206
q15 536 483 459 459
q16 634 616 600 600
q17 567 876 320 320
q18 7111 6386 6397 6386
q19 1677 953 548 548
q20 312 323 190 190
q21 2862 2253 2005 2005
q22 364 331 315 315
Total cold run time: 104056 ms
Total hot run time: 32328 ms

-- Round 2, with runtime_filter_mode=off --
q1 5710 5507 5510 5507
q2 237 324 254 254
q3 2251 2600 2267 2267
q4 1411 1801 1361 1361
q5 4357 4730 4650 4650
q6 173 163 129 129
q7 2075 1965 1892 1892
q8 2583 2829 2689 2689
q9 7428 7143 7226 7143
q10 3027 3345 2815 2815
q11 592 521 496 496
q12 672 778 609 609
q13 3498 3913 3403 3403
q14 290 305 283 283
q15 524 479 464 464
q16 631 694 636 636
q17 1240 1724 1262 1262
q18 7669 7556 7362 7362
q19 761 1067 1128 1067
q20 1975 2072 1898 1898
q21 5701 5295 5131 5131
q22 592 572 577 572
Total cold run time: 53397 ms
Total hot run time: 51890 ms
```
Re: [PR] [Enhancement] Support some compress functions [doris]
lzyy2024 commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2614340331 run buildall
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2614252615

ClickBench: Total hot run time: 30.67 s
```
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 75e8e78802a7295e88f4b0d103064acfcd5e6e4d, data reload: false

query1 0.04 0.04 0.03
query2 0.07 0.03 0.03
query3 0.25 0.06 0.07
query4 1.61 0.11 0.10
query5 0.42 0.42 0.40
query6 1.17 0.65 0.66
query7 0.02 0.02 0.02
query8 0.04 0.04 0.03
query9 0.58 0.49 0.50
query10 0.56 0.56 0.54
query11 0.15 0.10 0.10
query12 0.14 0.11 0.11
query13 0.60 0.60 0.61
query14 2.85 2.74 2.72
query15 0.90 0.83 0.82
query16 0.39 0.38 0.36
query17 1.05 1.01 1.00
query18 0.24 0.20 0.20
query19 1.86 1.88 2.01
query20 0.01 0.01 0.01
query21 15.36 0.99 0.58
query22 0.77 0.82 0.75
query23 15.21 1.49 0.53
query24 3.25 0.92 0.84
query25 0.17 0.26 0.12
query26 0.18 0.15 0.14
query27 0.05 0.04 0.04
query28 13.60 1.09 0.44
query29 12.60 3.98 3.33
query30 0.26 0.08 0.06
query31 2.84 0.62 0.39
query32 3.23 0.54 0.46
query33 2.97 3.06 2.99
query34 16.60 5.17 4.52
query35 4.61 4.63 4.54
query36 0.65 0.48 0.50
query37 0.09 0.06 0.05
query38 0.05 0.04 0.04
query39 0.03 0.02 0.03
query40 0.16 0.13 0.14
query41 0.08 0.03 0.03
query42 0.04 0.03 0.02
query43 0.04 0.03 0.02
Total cold run time: 105.79 s
Total hot run time: 30.67 s
```
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2614251279 TPC-DS: Total hot run time: 191631 ms ``` machine: 'aliyun_ecs.c7a.8xlarge_32C64G' scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools TPC-DS sf100 test result on commit 75e8e78802a7295e88f4b0d103064acfcd5e6e4d, data reload: false query1 1306964 932 932 query2 6184203820332033 query3 11103 470243994399 query4 61069 29129 23025 23025 query5 5535611 458 458 query6 432 204 183 183 query7 5529511 307 307 query8 331 247 233 233 query9 8032270827012701 query10 469 305 259 259 query11 17709 15224 15513 15224 query12 168 122 114 114 query13 1465546 409 409 query14 11082 704069946994 query15 210 206 197 197 query16 7241636 484 484 query17 1201730 591 591 query18 1910422 335 335 query19 205 194 165 165 query20 123 114 118 114 query21 225 131 106 106 query22 4449447045434470 query23 34433 33834 33260 33260 query24 5996236722972297 query25 460 467 404 404 query26 649 279 157 157 query27 1809459 333 333 query28 4055248924562456 query29 525 545 431 431 query30 214 192 158 158 query31 929 915 837 837 query32 64 60 57 57 query33 438 366 306 306 query34 742 872 503 503 query35 816 867 758 758 query36 10331051950 950 query37 115 107 78 78 query38 4310436242654265 query39 1508144814421442 query40 217 113 103 103 query41 51 51 50 50 query42 124 109 102 102 query43 507 516 494 494 query44 1338846 857 846 query45 183 173 171 171 query46 873 1054654 654 query47 1891197918741874 query48 396 407 342 342 query49 718 493 409 409 query50 649 707 400 400 query51 4265431341724172 query52 111 105 99 99 query53 228 254 200 200 query54 485 513 426 426 query55 81 82 82 82 query56 260 266 244 244 query57 1237121011541154 query58 233 231 236 231 query59 3223336430533053 query60 279 271 266 266 query61 139 112 117 112 query62 736 720 663 663 query63 225 184 185 184 query64 12861034656 656 query65 3273312431423124 query66 689 435 332 332 query67 16065 15658 15451 15451 query68 
5022809 539 539 query69 475 295 264 264 query70 1178116111261126 query71 416 286 253 253 query72 6050389937973797 query73 803 764 353 353 query74 9860879286988698 query75 3220315627032703 query76 37961195748 748 query77 536 353 275 275 query78 10087 10047 93459345 query79 2453805 603 603 query80 1199524 485 485 query81 540 279 227 227 query82 355 160 126 126 query83 242 165 159 159 query84 291 92 70 70 query85 746 342 301 301 query86 377 321 301 301 query87 4546447644824476 query88 3486217421362136 query89 393 332 292 292 query90 1576182 188 182 query91 135 133 110 110 query92 61 56 55 55 query93 2163848 534 534 query94 737 402 294 294 query95 321 260 265 260 query96 489 618 279 279 query97 2815289428112811 query98 224 192 193 192 query99 1302140213181318 Total cold run time: 309730 ms Total hot run time: 191631 ms ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2614247633

TPC-H: Total hot run time: 32971 ms
```
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 75e8e78802a7295e88f4b0d103064acfcd5e6e4d, data reload: false

-- Round 1 --
q1 17843 6172 5422 5422
q2 2040 300 178 178
q3 10412 1224 728 728
q4 10882 965 536 536
q5 8400 2410 2141 2141
q6 192 176 134 134
q7 906 820 595 595
q8 9228 1339 1150 1150
q9 5785 5158 5032 5032
q10 6988 2361 1956 1956
q11 483 290 268 268
q12 344 370 227 227
q13 18216 3998 3387 3387
q14 272 251 243 243
q15 528 482 477 477
q16 649 626 589 589
q17 569 872 337 337
q18 8233 6545 6461 6461
q19 2878 984 543 543
q20 303 310 192 192
q21 2714 2218 2045 2045
q22 362 332 330 330
Total cold run time: 108227 ms
Total hot run time: 32971 ms

-- Round 2, with runtime_filter_mode=off --
q1 5710 5517 5469 5469
q2 231 318 233 233
q3 2257 2622 2340 2340
q4 1412 1807 1395 1395
q5 4328 4781 4883 4781
q6 165 162 129 129
q7 2091 1922 1827 1827
q8 2687 2805 2659 2659
q9 7273 7260 7262 7260
q10 3020 3212 2759 2759
q11 586 520 498 498
q12 675 790 601 601
q13 3504 3971 3293 3293
q14 284 306 281 281
q15 519 485 467 467
q16 661 677 630 630
q17 1209 1777 1246 1246
q18 7790 7450 7409 7409
q19 765 1154 1076 1076
q20 2000 2050 1919 1919
q21 5653 5124 5012 5012
q22 631 607 571 571
Total cold run time: 53451 ms
Total hot run time: 51855 ms
```
Re: [PR] [Enhancement] Support some compress functions [doris]
lzyy2024 commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2614234362 run buildall
Re: [PR] [Enhancement] Support some compress functions [doris]
lzyy2024 commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2614230328 run buildall
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2614206532

ClickBench: Total hot run time: 30.68 s
```
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 89c394274803c1107b4fbc5b37d8608ba3af107d, data reload: false

query1 0.03 0.03 0.04
query2 0.08 0.04 0.03
query3 0.23 0.07 0.07
query4 1.61 0.11 0.10
query5 0.42 0.42 0.38
query6 1.15 0.66 0.66
query7 0.02 0.02 0.02
query8 0.04 0.04 0.03
query9 0.58 0.49 0.53
query10 0.55 0.56 0.56
query11 0.14 0.10 0.10
query12 0.13 0.11 0.11
query13 0.60 0.60 0.60
query14 2.85 2.81 2.88
query15 0.89 0.82 0.81
query16 0.39 0.38 0.39
query17 1.05 1.06 1.05
query18 0.22 0.21 0.20
query19 1.90 1.83 2.02
query20 0.02 0.01 0.01
query21 15.36 1.02 0.60
query22 0.75 0.75 0.65
query23 15.37 1.36 0.60
query24 2.91 1.92 0.87
query25 0.16 0.19 0.14
query26 0.23 0.14 0.13
query27 0.06 0.05 0.06
query28 14.17 1.02 0.43
query29 12.60 3.98 3.27
query30 0.26 0.09 0.06
query31 2.82 0.61 0.37
query32 3.24 0.55 0.46
query33 2.99 3.02 3.09
query34 16.54 5.17 4.51
query35 4.52 4.44 4.45
query36 0.65 0.49 0.52
query37 0.10 0.06 0.06
query38 0.05 0.04 0.03
query39 0.03 0.03 0.02
query40 0.17 0.13 0.13
query41 0.08 0.03 0.03
query42 0.04 0.02 0.02
query43 0.04 0.03 0.04
Total cold run time: 106.04 s
Total hot run time: 30.68 s
```
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2614205536 TPC-DS: Total hot run time: 184954 ms ``` machine: 'aliyun_ecs.c7a.8xlarge_32C64G' scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools TPC-DS sf100 test result on commit 89c394274803c1107b4fbc5b37d8608ba3af107d, data reload: false query1 962 376 368 368 query2 6516209720712071 query3 6802211 218 211 query4 33731 23198 22991 22991 query5 4407575 438 438 query6 270 185 173 173 query7 4596496 311 311 query8 282 244 214 214 query9 9565268827042688 query10 472 319 261 261 query11 18192 15054 15022 15022 query12 156 112 103 103 query13 1649517 409 409 query14 9175728668826882 query15 251 188 182 182 query16 8042641 482 482 query17 1621744 564 564 query18 2108401 306 306 query19 229 188 155 155 query20 115 109 110 109 query21 212 123 100 100 query22 4110443342674267 query23 33827 33022 32880 32880 query24 6450229122882288 query25 529 488 377 377 query26 1198265 156 156 query27 1997463 333 333 query28 5369245824482448 query29 719 545 418 418 query30 234 181 152 152 query31 934 849 774 774 query32 90 59 67 59 query33 496 354 330 330 query34 734 845 492 492 query35 800 817 741 741 query36 978 1063968 968 query37 120 104 75 75 query38 4136424640124012 query39 1461138113981381 query40 221 112 100 100 query41 53 49 55 49 query42 118 97 101 97 query43 511 505 477 477 query44 1332841 806 806 query45 175 173 163 163 query46 853 1032637 637 query47 1802181017291729 query48 380 404 305 305 query49 783 495 390 390 query50 621 641 390 390 query51 4237422840894089 query52 111 103 90 90 query53 223 255 184 184 query54 472 480 413 413 query55 81 82 79 79 query56 258 257 245 245 query57 1158114110661066 query58 243 237 245 237 query59 3158299929792979 query60 275 266 260 260 query61 119 119 115 115 query62 777 729 637 637 query63 240 199 182 182 query64 4436995 654 654 query65 3214319631593159 query66 1064407 313 313 query67 15922 15579 15428 15428 query68 4284817 
546 546 query69 466 290 261 261 query70 1212110311181103 query71 374 281 250 250 query72 5796386237993799 query73 648 747 360 360 query74 10488 894189148914 query75 3156315526822682 query76 31391143755 755 query77 492 339 271 271 query78 992310078 94049404 query79 2446797 608 608 query80 788 528 465 465 query81 538 316 244 244 query82 348 151 125 125 query83 170 173 154 154 query84 237 88 77 77 query85 754 360 304 304 query86 440 321 306 306 query87 4424448144984481 query88 4173216522102165 query89 398 321 302 302 query90 1919190 188 188 query91 137 139 108 108 query92 70 60 55 55 query93 2644879 535 535 query94 746 408 294 294 query95 338 262 262 262 query96 484 603 277 277 query97 2775288627502750 query98 237 198 191 191 query99 1286137212541254 Total cold run time: 281702 ms Total hot run time: 184954 ms ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to
Re: [PR] [Enhancement] Support some compress functions [doris]
doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2614202865

TPC-H: Total hot run time: 32100 ms
```
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 89c394274803c1107b4fbc5b37d8608ba3af107d, data reload: false

-- Round 1 --
q1 17575 5496 5376 5376
q2 2052 334 182 182
q3 10468 1303 735 735
q4 10229 969 517 517
q5 7663 2384 2167 2167
q6 191 165 136 136
q7 925 764 608 608
q8 9235 1394 1149 1149
q9 5219 4920 4833 4833
q10 6811 2314 1890 1890
q11 476 280 258 258
q12 341 358 214 214
q13 17760 3661 3052 3052
q14 228 244 206 206
q15 512 471 459 459
q16 646 619 598 598
q17 558 860 317 317
q18 7189 6475 6417 6417
q19 1807 966 537 537
q20 304 319 185 185
q21 2804 2173 1957 1957
q22 356 330 307 307
Total cold run time: 103349 ms
Total hot run time: 32100 ms

-- Round 2, with runtime_filter_mode=off --
q1 5511 5460 5437 5437
q2 248 327 233 233
q3 2242 2638 2307 2307
q4 1439 1838 1365 1365
q5 4323 4725 4681 4681
q6 167 156 124 124
q7 2080 1986 1810 1810
q8 2656 2811 2662 2662
q9 7293 7156 7173 7156
q10 2932 3258 2769 2769
q11 572 516 494 494
q12 717 749 595 595
q13 3494 3934 3293 3293
q14 267 289 267 267
q15 505 474 464 464
q16 658 693 641 641
q17 1207 1731 1256 1256
q18 7613 7379 7400 7379
q19 768 1157 1031 1031
q20 2009 2029 1866 1866
q21 5644 5218 4986 4986
q22 597 652 555 555
Total cold run time: 52942 ms
Total hot run time: 51371 ms
```
Re: [PR] [Enhancement] Support some compress functions [doris]
lzyy2024 commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2614196175 run buildall
Re: [PR] [Enhancement] Support some compress functions [doris]
lzyy2024 commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2614007660 run buildall
Re: [PR] [Enhancement] Support some compress functions [doris]
zclllyybb commented on code in PR #47307: URL: https://github.com/apache/doris/pull/47307#discussion_r1929542886 ## be/src/vec/functions/function_compress.cpp: ## @@ -0,0 +1,242 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. +#include + +#include +#include +#include +#include +#include +#include +#include + +#include "common/status.h" +#include "util/block_compression.h" +#include "util/faststring.h" +#include "vec/aggregate_functions/aggregate_function.h" +#include "vec/columns/column.h" +#include "vec/columns/column_nullable.h" +#include "vec/columns/column_string.h" +#include "vec/columns/column_vector.h" +#include "vec/columns/columns_number.h" +#include "vec/common/assert_cast.h" +#include "vec/core/block.h" +#include "vec/core/column_numbers.h" +#include "vec/core/column_with_type_and_name.h" +#include "vec/core/types.h" +#include "vec/data_types/data_type.h" +#include "vec/data_types/data_type_nullable.h" +#include "vec/data_types/data_type_number.h" +#include "vec/data_types/data_type_string.h" +#include "vec/functions/function.h" +#include "vec/functions/simple_function_factory.h" + +namespace doris { +class FunctionContext; +} // namespace doris + +namespace doris::vectorized { + +class FunctionCompress 
: public IFunction { +string hex_itoc = "0123456789ABCDEF"; + +public: +static constexpr auto name = "compress"; +static FunctionPtr create() { return std::make_shared(); } + +String get_name() const override { return name; } + +size_t get_number_of_arguments() const override { return 1; } + +DataTypePtr get_return_type_impl(const DataTypes& arguments) const override { +return std::make_shared(); +} + +Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments, +uint32_t result, size_t input_rows_count) const override { +// Get the compression algorithm object +BlockCompressionCodec* compression_codec; + RETURN_IF_ERROR(get_block_compression_codec(segment_v2::CompressionTypePB::ZLIB, +&compression_codec)); + +const auto& arg_column = +assert_cast(*block.get_by_position(arguments[0]).column); +auto result_column = ColumnString::create(); + +auto& arg_data = arg_column.get_chars(); +auto& arg_offset = arg_column.get_offsets(); +const char* arg_begin = reinterpret_cast(arg_data.data()); + +auto& col_data = result_column->get_chars(); +auto& col_offset = result_column->get_offsets(); +col_offset.resize(input_rows_count); + +faststring compressed_str; +Slice data; +for (size_t row = 0; row < input_rows_count; row++) { +size_t length = arg_offset[row] - arg_offset[row - 1]; +data = Slice(arg_begin, length); + +// Z_MEM_ERROR and Z_BUF_ERROR are already handled in compress, making sure st is always Z_OK +auto st = compression_codec->compress(data, &compressed_str); + +size_t idx = col_data.size(); +if (!length) { // data is '' +col_data.resize(col_data.size() + 2); +col_data[idx] = '0', col_data[idx + 1] = 'x'; +col_offset[row] = col_offset[row - 1] + 2; +continue; +} + +// first ten digits represent the length of the uncompressed string +col_data.resize(col_data.size() + 10); +col_data[idx] = '0', col_data[idx + 1] = 'x'; +for (size_t i = 0; i < 4; i++) { +unsigned char byte = (length >> (i * 8)) & 0xFF; +col_data[idx + 2 + i * 2] = 
hex_itoc[byte >> 4]; // higher four +col_data[idx + 3 + i * 2] = hex_itoc[byte & 0x0F]; +} +idx += 10; + +col_data.resize(col_data.size() + 2 * compressed_str.size()); + +unsigned char* src = compressed_str.data(); +for (size_t i = 0; i < compressed_str.size(); i++) { +col_data[idx] = hex_itoc[(*src >> 4) & 0x0F]; +col_data[idx + 1] = hex_itoc[(*src & 0x0F)]; +idx += 2
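The length prefix built by the quoted loop can be looked at in isolation. The following is a minimal sketch, not code from the PR: `length_header` is a hypothetical helper name, and `std::string` stands in for the column's character buffer. It emits `0x` followed by the four low-order bytes of the uncompressed length, least significant byte first, as uppercase hex.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not in the PR) mirroring the quoted loop:
// "0x" + 4 bytes of `length`, least significant byte first, as uppercase hex.
inline std::string length_header(std::size_t length) {
    static constexpr char hex_itoc[] = "0123456789ABCDEF";
    std::string out = "0x";
    for (std::size_t i = 0; i < 4; ++i) {
        const auto byte = static_cast<unsigned char>((length >> (i * 8)) & 0xFF);
        out.push_back(hex_itoc[byte >> 4]);   // high nibble
        out.push_back(hex_itoc[byte & 0x0F]); // low nibble
    }
    return out;
}
```

Under this layout a 5-byte input gets the prefix `0x05000000`. Only the low 32 bits of the length are recorded, so an input of 4 GiB or more would not round-trip through this header.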
Re: [PR] [Enhancement] Support some compress functions [doris]
zclllyybb commented on code in PR #47307:
URL: https://github.com/apache/doris/pull/47307#discussion_r1929542675

## be/src/util/block_compression.cpp: ##
@@ -854,8 +854,13 @@ class ZlibBlockCompression : public BlockCompressionCodec {
 Slice s(*output);
 auto zres = ::compress((Bytef*)s.data, &s.size, (Bytef*)input.data, input.size);
-if (zres != Z_OK) {
-return Status::InvalidArgument("Fail to do ZLib compress, error={}", zError(zres));
+if (zres == Z_MEM_ERROR) {

Review Comment:
It may be better to split these changes into a separate PR.
Re: [PR] [Enhancement] Support some compress functions [doris]
zclllyybb commented on code in PR #47307: URL: https://github.com/apache/doris/pull/47307#discussion_r1929542732 ## be/src/vec/functions/function_compress.cpp: ## @@ -0,0 +1,242 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. +#include + +#include +#include +#include +#include +#include +#include +#include + +#include "common/status.h" +#include "util/block_compression.h" +#include "util/faststring.h" +#include "vec/aggregate_functions/aggregate_function.h" +#include "vec/columns/column.h" +#include "vec/columns/column_nullable.h" +#include "vec/columns/column_string.h" +#include "vec/columns/column_vector.h" +#include "vec/columns/columns_number.h" +#include "vec/common/assert_cast.h" +#include "vec/core/block.h" +#include "vec/core/column_numbers.h" +#include "vec/core/column_with_type_and_name.h" +#include "vec/core/types.h" +#include "vec/data_types/data_type.h" +#include "vec/data_types/data_type_nullable.h" +#include "vec/data_types/data_type_number.h" +#include "vec/data_types/data_type_string.h" +#include "vec/functions/function.h" +#include "vec/functions/simple_function_factory.h" + +namespace doris { +class FunctionContext; +} // namespace doris + +namespace doris::vectorized { + +class FunctionCompress 
: public IFunction { +string hex_itoc = "0123456789ABCDEF"; + +public: +static constexpr auto name = "compress"; +static FunctionPtr create() { return std::make_shared(); } + +String get_name() const override { return name; } + +size_t get_number_of_arguments() const override { return 1; } + +DataTypePtr get_return_type_impl(const DataTypes& arguments) const override { +return std::make_shared(); +} + +Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments, +uint32_t result, size_t input_rows_count) const override { +// Get the compression algorithm object +BlockCompressionCodec* compression_codec; + RETURN_IF_ERROR(get_block_compression_codec(segment_v2::CompressionTypePB::ZLIB, +&compression_codec)); + +const auto& arg_column = +assert_cast(*block.get_by_position(arguments[0]).column); +auto result_column = ColumnString::create(); + +auto& arg_data = arg_column.get_chars(); +auto& arg_offset = arg_column.get_offsets(); +const char* arg_begin = reinterpret_cast(arg_data.data()); + +auto& col_data = result_column->get_chars(); +auto& col_offset = result_column->get_offsets(); +col_offset.resize(input_rows_count); + +faststring compressed_str; +Slice data; +for (size_t row = 0; row < input_rows_count; row++) { +size_t length = arg_offset[row] - arg_offset[row - 1]; +data = Slice(arg_begin, length); + +// Z_MEM_ERROR and Z_BUF_ERROR are already handled in compress, making sure st is always Z_OK +auto st = compression_codec->compress(data, &compressed_str); + +size_t idx = col_data.size(); +if (!length) { // data is '' +col_data.resize(col_data.size() + 2); +col_data[idx] = '0', col_data[idx + 1] = 'x'; +col_offset[row] = col_offset[row - 1] + 2; +continue; +} + +// first ten digits represent the length of the uncompressed string +col_data.resize(col_data.size() + 10); +col_data[idx] = '0', col_data[idx + 1] = 'x'; +for (size_t i = 0; i < 4; i++) { +unsigned char byte = (length >> (i * 8)) & 0xFF; +col_data[idx + 2 + i * 2] = 
hex_itoc[byte >> 4]; // higher four
+col_data[idx + 3 + i * 2] = hex_itoc[byte & 0x0F];
+}
+idx += 10;
+
+col_data.resize(col_data.size() + 2 * compressed_str.size());

Review Comment:
As with the comment on `uncompress`, it may be better to add a reserve operation before the for loop.
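One reading of the reserve suggestion, as a minimal sketch rather than the change actually made in the PR: `append_hex` is a hypothetical helper, and `std::string` stands in for the result column's character buffer. The final size is reserved once, then two hex digits are appended per input byte.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not in the PR): appends the uppercase hex encoding
// of `len` bytes starting at `src` to `out`.
inline void append_hex(std::string& out, const unsigned char* src, std::size_t len) {
    static constexpr char hex_itoc[] = "0123456789ABCDEF";
    // A single reservation up front avoids repeated reallocation in the loop.
    out.reserve(out.size() + 2 * len);
    for (std::size_t i = 0; i < len; ++i) {
        out.push_back(hex_itoc[src[i] >> 4]);   // high nibble
        out.push_back(hex_itoc[src[i] & 0x0F]); // low nibble
    }
}
```

If the reviewer meant the outer per-row loop instead, the same idea would apply one level up: estimate the total output size for all rows and reserve it before iterating.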
Re: [PR] [Enhancement] Support some compress functions [doris]
zclllyybb commented on code in PR #47307: URL: https://github.com/apache/doris/pull/47307#discussion_r1929542483 ## be/src/vec/functions/function_compress.cpp: ## @@ -0,0 +1,242 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. +#include + +#include +#include +#include +#include +#include +#include +#include + +#include "common/status.h" +#include "util/block_compression.h" +#include "util/faststring.h" +#include "vec/aggregate_functions/aggregate_function.h" +#include "vec/columns/column.h" +#include "vec/columns/column_nullable.h" +#include "vec/columns/column_string.h" +#include "vec/columns/column_vector.h" +#include "vec/columns/columns_number.h" +#include "vec/common/assert_cast.h" +#include "vec/core/block.h" +#include "vec/core/column_numbers.h" +#include "vec/core/column_with_type_and_name.h" +#include "vec/core/types.h" +#include "vec/data_types/data_type.h" +#include "vec/data_types/data_type_nullable.h" +#include "vec/data_types/data_type_number.h" +#include "vec/data_types/data_type_string.h" +#include "vec/functions/function.h" +#include "vec/functions/simple_function_factory.h" + +namespace doris { +class FunctionContext; +} // namespace doris + +namespace doris::vectorized { + +class FunctionCompress 
: public IFunction { +string hex_itoc = "0123456789ABCDEF"; + +public: +static constexpr auto name = "compress"; +static FunctionPtr create() { return std::make_shared(); } + +String get_name() const override { return name; } + +size_t get_number_of_arguments() const override { return 1; } + +DataTypePtr get_return_type_impl(const DataTypes& arguments) const override { +return std::make_shared(); +} + +Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments, +uint32_t result, size_t input_rows_count) const override { +// Get the compression algorithm object +BlockCompressionCodec* compression_codec; + RETURN_IF_ERROR(get_block_compression_codec(segment_v2::CompressionTypePB::ZLIB, +&compression_codec)); + +const auto& arg_column = +assert_cast(*block.get_by_position(arguments[0]).column); +auto result_column = ColumnString::create(); + +auto& arg_data = arg_column.get_chars(); +auto& arg_offset = arg_column.get_offsets(); +const char* arg_begin = reinterpret_cast(arg_data.data()); + +auto& col_data = result_column->get_chars(); +auto& col_offset = result_column->get_offsets(); +col_offset.resize(input_rows_count); + +faststring compressed_str; +Slice data; +for (size_t row = 0; row < input_rows_count; row++) { +size_t length = arg_offset[row] - arg_offset[row - 1]; +data = Slice(arg_begin, length); + +// Z_MEM_ERROR and Z_BUF_ERROR are already handled in compress, making sure st is always Z_OK +auto st = compression_codec->compress(data, &compressed_str); + +size_t idx = col_data.size(); +if (!length) { // data is '' +col_data.resize(col_data.size() + 2); +col_data[idx] = '0', col_data[idx + 1] = 'x'; +col_offset[row] = col_offset[row - 1] + 2; +continue; +} + +// first ten digits represent the length of the uncompressed string +col_data.resize(col_data.size() + 10); +col_data[idx] = '0', col_data[idx + 1] = 'x'; +for (size_t i = 0; i < 4; i++) { +unsigned char byte = (length >> (i * 8)) & 0xFF; +col_data[idx + 2 + i * 2] = 
hex_itoc[byte >> 4]; // higher four +col_data[idx + 3 + i * 2] = hex_itoc[byte & 0x0F]; +} +idx += 10; + +col_data.resize(col_data.size() + 2 * compressed_str.size()); + +unsigned char* src = compressed_str.data(); +for (size_t i = 0; i < compressed_str.size(); i++) { +col_data[idx] = hex_itoc[(*src >> 4) & 0x0F]; +col_data[idx + 1] = hex_itoc[(*src & 0x0F)]; +idx += 2
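The encoding loop in the hunk above can be sketched in isolation. This is a minimal sketch, assuming the output layout the code comments describe: a literal "0x", then the uncompressed length as four bytes in little-endian order (eight hex digits), then the compressed bytes as hex pairs. The names `encode_with_length_prefix` and `kHex` are hypothetical and do not come from the PR; the sketch uses `std::string` instead of the Doris column buffers.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper, not part of the PR: hex-encodes `payload` behind a
// "0x" marker and a 4-byte little-endian length field (8 hex digits).
std::string encode_with_length_prefix(std::uint32_t uncompressed_len,
                                      const std::string& payload) {
    static constexpr char kHex[] = "0123456789ABCDEF";
    std::string out = "0x";
    if (uncompressed_len == 0) {
        return out; // empty input is rendered as the bare "0x" marker
    }
    out.reserve(2 + 8 + 2 * payload.size());
    // Length field: least significant byte first, high nibble before low.
    for (int i = 0; i < 4; ++i) {
        unsigned char byte = (uncompressed_len >> (i * 8)) & 0xFF;
        out.push_back(kHex[byte >> 4]);
        out.push_back(kHex[byte & 0x0F]);
    }
    // Payload: two hex digits per byte.
    for (unsigned char c : payload) {
        out.push_back(kHex[c >> 4]);
        out.push_back(kHex[c & 0x0F]);
    }
    return out;
}
```

Building the result by appending avoids the manual `idx` bookkeeping of the hunk, at the cost of not writing directly into the column buffer.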
Re: [PR] [Enhancement] Support some compress functions [doris]
lzyy2024 commented on code in PR #47307:
URL: https://github.com/apache/doris/pull/47307#discussion_r1928699139

## be/src/vec/functions/function_compress.cpp: ##
@@ -0,0 +1,269 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+#include
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "common/status.h"
+#include "util/block_compression.h"
+#include "util/faststring.h"
+#include "vec/aggregate_functions/aggregate_function.h"
+#include "vec/columns/column.h"
+#include "vec/columns/column_nullable.h"
+#include "vec/columns/column_string.h"
+#include "vec/columns/column_vector.h"
+#include "vec/columns/columns_number.h"
+#include "vec/common/assert_cast.h"
+#include "vec/core/block.h"
+#include "vec/core/column_numbers.h"
+#include "vec/core/column_with_type_and_name.h"
+#include "vec/core/types.h"
+#include "vec/data_types/data_type.h"
+#include "vec/data_types/data_type_nullable.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/data_types/data_type_string.h"
+#include "vec/functions/function.h"
+#include "vec/functions/simple_function_factory.h"
+
+namespace doris {
+class FunctionContext;
+} // namespace doris
+
+namespace doris::vectorized {
+
+class FunctionCompress : public IFunction {
+    string hexadecimal = "0123456789ABCDEF";
+
+public:
+    static constexpr auto name = "compress";
+    static FunctionPtr create() { return std::make_shared(); }
+
+    String get_name() const override { return name; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        return make_nullable(std::make_shared());
+    }
+
+    Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+                        uint32_t result, size_t input_rows_count) const override {
+        // Get the compression algorithm object
+        BlockCompressionCodec* compression_codec;
+        RETURN_IF_ERROR(get_block_compression_codec(segment_v2::CompressionTypePB::ZLIB,
+                                                    &compression_codec));
+
+        const auto& arg_column =
+                assert_cast(*block.get_by_position(arguments[0]).column);
+        auto result_column = ColumnString::create();
+
+        auto& col_data = result_column->get_chars();
+        auto& col_offset = result_column->get_offsets();
+        col_offset.resize(input_rows_count);
+
+        auto null_column = ColumnUInt8::create(input_rows_count);
+        auto& null_map = null_column->get_data();
+
+        faststring compressed_str;
+        Slice data;
+        for (size_t row = 0; row < input_rows_count; row++) {
+            null_map[row] = false;
+            const auto& str = arg_column.get_data_at(row);
+            data = Slice(str.data, str.size);
+
+            auto st = compression_codec->compress(data, &compressed_str);

Review Comment:
For example compress(abc) instead of compress('abc')
Re: [PR] [Enhancement] Support some compress functions [doris]
zclllyybb commented on code in PR #47307: URL: https://github.com/apache/doris/pull/47307#discussion_r1929534888 ## be/src/vec/functions/function_compress.cpp: ## @@ -0,0 +1,242 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. +#include + +#include +#include +#include +#include +#include +#include +#include + +#include "common/status.h" +#include "util/block_compression.h" +#include "util/faststring.h" +#include "vec/aggregate_functions/aggregate_function.h" +#include "vec/columns/column.h" +#include "vec/columns/column_nullable.h" +#include "vec/columns/column_string.h" +#include "vec/columns/column_vector.h" +#include "vec/columns/columns_number.h" +#include "vec/common/assert_cast.h" +#include "vec/core/block.h" +#include "vec/core/column_numbers.h" +#include "vec/core/column_with_type_and_name.h" +#include "vec/core/types.h" +#include "vec/data_types/data_type.h" +#include "vec/data_types/data_type_nullable.h" +#include "vec/data_types/data_type_number.h" +#include "vec/data_types/data_type_string.h" +#include "vec/functions/function.h" +#include "vec/functions/simple_function_factory.h" + +namespace doris { +class FunctionContext; +} // namespace doris + +namespace doris::vectorized { + +class FunctionCompress 
: public IFunction { +string hex_itoc = "0123456789ABCDEF"; + +public: +static constexpr auto name = "compress"; +static FunctionPtr create() { return std::make_shared(); } + +String get_name() const override { return name; } + +size_t get_number_of_arguments() const override { return 1; } + +DataTypePtr get_return_type_impl(const DataTypes& arguments) const override { +return std::make_shared(); +} + +Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments, +uint32_t result, size_t input_rows_count) const override { +// Get the compression algorithm object +BlockCompressionCodec* compression_codec; + RETURN_IF_ERROR(get_block_compression_codec(segment_v2::CompressionTypePB::ZLIB, +&compression_codec)); + +const auto& arg_column = +assert_cast(*block.get_by_position(arguments[0]).column); +auto result_column = ColumnString::create(); + +auto& arg_data = arg_column.get_chars(); +auto& arg_offset = arg_column.get_offsets(); +const char* arg_begin = reinterpret_cast(arg_data.data()); + +auto& col_data = result_column->get_chars(); +auto& col_offset = result_column->get_offsets(); +col_offset.resize(input_rows_count); + +faststring compressed_str; +Slice data; +for (size_t row = 0; row < input_rows_count; row++) { +size_t length = arg_offset[row] - arg_offset[row - 1]; +data = Slice(arg_begin, length); + +// Z_MEM_ERROR and Z_BUF_ERROR are already handled in compress, making sure st is always Z_OK +auto st = compression_codec->compress(data, &compressed_str); + +size_t idx = col_data.size(); +if (!length) { // data is '' +col_data.resize(col_data.size() + 2); +col_data[idx] = '0', col_data[idx + 1] = 'x'; +col_offset[row] = col_offset[row - 1] + 2; +continue; +} + +// first ten digits represent the length of the uncompressed string +col_data.resize(col_data.size() + 10); +col_data[idx] = '0', col_data[idx + 1] = 'x'; +for (size_t i = 0; i < 4; i++) { +unsigned char byte = (length >> (i * 8)) & 0xFF; +col_data[idx + 2 + i * 2] = 
hex_itoc[byte >> 4]; // higher four +col_data[idx + 3 + i * 2] = hex_itoc[byte & 0x0F]; +} +idx += 10; + +col_data.resize(col_data.size() + 2 * compressed_str.size()); + +unsigned char* src = compressed_str.data(); +for (size_t i = 0; i < compressed_str.size(); i++) { +col_data[idx] = hex_itoc[(*src >> 4) & 0x0F]; +col_data[idx + 1] = hex_itoc[(*src & 0x0F)]; +idx += 2
Re: [PR] [Enhancement] Support some compress functions [doris]
zclllyybb commented on code in PR #47307: URL: https://github.com/apache/doris/pull/47307#discussion_r1929534784 ## be/src/vec/functions/function_compress.cpp: ## @@ -0,0 +1,242 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. +#include + +#include +#include +#include +#include +#include +#include +#include + +#include "common/status.h" +#include "util/block_compression.h" +#include "util/faststring.h" +#include "vec/aggregate_functions/aggregate_function.h" +#include "vec/columns/column.h" +#include "vec/columns/column_nullable.h" +#include "vec/columns/column_string.h" +#include "vec/columns/column_vector.h" +#include "vec/columns/columns_number.h" +#include "vec/common/assert_cast.h" +#include "vec/core/block.h" +#include "vec/core/column_numbers.h" +#include "vec/core/column_with_type_and_name.h" +#include "vec/core/types.h" +#include "vec/data_types/data_type.h" +#include "vec/data_types/data_type_nullable.h" +#include "vec/data_types/data_type_number.h" +#include "vec/data_types/data_type_string.h" +#include "vec/functions/function.h" +#include "vec/functions/simple_function_factory.h" + +namespace doris { +class FunctionContext; +} // namespace doris + +namespace doris::vectorized { + +class FunctionCompress 
: public IFunction { +string hex_itoc = "0123456789ABCDEF"; + +public: +static constexpr auto name = "compress"; +static FunctionPtr create() { return std::make_shared(); } + +String get_name() const override { return name; } + +size_t get_number_of_arguments() const override { return 1; } + +DataTypePtr get_return_type_impl(const DataTypes& arguments) const override { +return std::make_shared(); +} + +Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments, +uint32_t result, size_t input_rows_count) const override { +// Get the compression algorithm object +BlockCompressionCodec* compression_codec; + RETURN_IF_ERROR(get_block_compression_codec(segment_v2::CompressionTypePB::ZLIB, +&compression_codec)); + +const auto& arg_column = +assert_cast(*block.get_by_position(arguments[0]).column); +auto result_column = ColumnString::create(); + +auto& arg_data = arg_column.get_chars(); +auto& arg_offset = arg_column.get_offsets(); +const char* arg_begin = reinterpret_cast(arg_data.data()); + +auto& col_data = result_column->get_chars(); +auto& col_offset = result_column->get_offsets(); +col_offset.resize(input_rows_count); + +faststring compressed_str; +Slice data; +for (size_t row = 0; row < input_rows_count; row++) { +size_t length = arg_offset[row] - arg_offset[row - 1]; +data = Slice(arg_begin, length); + +// Z_MEM_ERROR and Z_BUF_ERROR are already handled in compress, making sure st is always Z_OK +auto st = compression_codec->compress(data, &compressed_str); + +size_t idx = col_data.size(); +if (!length) { // data is '' +col_data.resize(col_data.size() + 2); +col_data[idx] = '0', col_data[idx + 1] = 'x'; +col_offset[row] = col_offset[row - 1] + 2; +continue; +} + +// first ten digits represent the length of the uncompressed string +col_data.resize(col_data.size() + 10); +col_data[idx] = '0', col_data[idx + 1] = 'x'; +for (size_t i = 0; i < 4; i++) { +unsigned char byte = (length >> (i * 8)) & 0xFF; +col_data[idx + 2 + i * 2] = 
hex_itoc[byte >> 4]; // higher four +col_data[idx + 3 + i * 2] = hex_itoc[byte & 0x0F]; +} +idx += 10; + +col_data.resize(col_data.size() + 2 * compressed_str.size()); + +unsigned char* src = compressed_str.data(); +for (size_t i = 0; i < compressed_str.size(); i++) { +col_data[idx] = hex_itoc[(*src >> 4) & 0x0F]; +col_data[idx + 1] = hex_itoc[(*src & 0x0F)]; +idx += 2
Re: [PR] [Enhancement] Support some compress functions [doris]
zclllyybb commented on code in PR #47307:
URL: https://github.com/apache/doris/pull/47307#discussion_r1929533247

## fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/scalar/Uncompress.java: ##
@@ -0,0 +1,67 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.nereids.trees.expressions.functions.scalar;
+
+import org.apache.doris.catalog.FunctionSignature;
+import org.apache.doris.nereids.trees.expressions.Expression;
+import org.apache.doris.nereids.trees.expressions.functions.AlwaysNullable;
+import org.apache.doris.nereids.trees.expressions.functions.ExplicitlyCastableSignature;
+import org.apache.doris.nereids.trees.expressions.shape.UnaryExpression;
+import org.apache.doris.nereids.trees.expressions.visitor.ExpressionVisitor;
+import org.apache.doris.nereids.types.StringType;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
+
+import java.util.List;
+
+/**
+ * ScalarFunction 'uncompress'.
+ */
+public class Uncompress extends ScalarFunction
+        implements UnaryExpression, ExplicitlyCastableSignature, AlwaysNullable {
+
+    public static final List SIGNATURES = ImmutableList.of(
+            FunctionSignature.ret(StringType.INSTANCE).args(StringType.INSTANCE));

Review Comment:
ditto.
Re: [PR] [Enhancement] Support some compress functions [doris]
zclllyybb commented on code in PR #47307:
URL: https://github.com/apache/doris/pull/47307#discussion_r1929533192

## fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/scalar/Compress.java: ##
@@ -0,0 +1,67 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.nereids.trees.expressions.functions.scalar;
+
+import org.apache.doris.catalog.FunctionSignature;
+import org.apache.doris.nereids.trees.expressions.Expression;
+import org.apache.doris.nereids.trees.expressions.functions.ExplicitlyCastableSignature;
+import org.apache.doris.nereids.trees.expressions.functions.PropagateNullable;
+import org.apache.doris.nereids.trees.expressions.shape.UnaryExpression;
+import org.apache.doris.nereids.trees.expressions.visitor.ExpressionVisitor;
+import org.apache.doris.nereids.types.StringType;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
+
+import java.util.List;
+
+/**
+ * ScalarFunction 'compress'.
+ */
+public class Compress extends ScalarFunction
+        implements UnaryExpression, ExplicitlyCastableSignature, PropagateNullable {
+
+    public static final List SIGNATURES = ImmutableList.of(
+            FunctionSignature.ret(StringType.INSTANCE).args(StringType.INSTANCE));

Review Comment:
should also accept `VarcharType`
Re: [PR] [Enhancement] Support some compress functions [doris]
zclllyybb commented on code in PR #47307:
URL: https://github.com/apache/doris/pull/47307#discussion_r1929533053

## be/src/util/block_compression.cpp: ##
@@ -854,8 +854,13 @@ class ZlibBlockCompression : public BlockCompressionCodec {
         Slice s(*output);
         auto zres = ::compress((Bytef*)s.data, &s.size, (Bytef*)input.data, input.size);
-        if (zres != Z_OK) {
-            return Status::InvalidArgument("Fail to do ZLib compress, error={}", zError(zres));
+        if (zres == Z_MEM_ERROR) {

Review Comment:
also change other same calls
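The hunk above starts distinguishing zlib return codes instead of treating every non-Z_OK value alike. A rough sketch of that idea follows, assuming only what the zlib manual states for `compress`: it returns Z_OK, Z_MEM_ERROR, or Z_BUF_ERROR. The `CompressStatus` struct, the `classify_compress_result` function, and the `kZ...` constants are hypothetical; the constants mirror the values in zlib.h so that the sketch builds without zlib. How the PR actually reports each case is not visible in this hunk.

```cpp
#include <string>

// Values mirror zlib.h; redefined here only so the sketch builds without zlib.
constexpr int kZOk = 0;        // Z_OK
constexpr int kZMemError = -4; // Z_MEM_ERROR
constexpr int kZBufError = -5; // Z_BUF_ERROR

// Hypothetical stand-in for a status type; not the Doris Status class.
struct CompressStatus {
    bool ok;
    std::string message;
};

// Maps a zlib compress() return code to a distinct, descriptive status.
CompressStatus classify_compress_result(int zres) {
    if (zres == kZOk) {
        return {true, ""};
    }
    if (zres == kZMemError) {
        return {false, "zlib compress: not enough memory"};
    }
    if (zres == kZBufError) {
        return {false, "zlib compress: output buffer too small"};
    }
    return {false, "zlib compress: unexpected error code " + std::to_string(zres)};
}
```

Keeping the mapping in one function is one way to apply the same handling to every call site, which is what the review comment asks for.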
Re: [PR] [Enhancement] Support some compress functions [doris]
zclllyybb commented on code in PR #47307: URL: https://github.com/apache/doris/pull/47307#discussion_r1928875714 ## be/src/vec/functions/function_compress.cpp: ## @@ -0,0 +1,256 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. +#include + +#include +#include +#include +#include +#include +#include +#include + +#include "common/status.h" +#include "util/block_compression.h" +#include "util/faststring.h" +#include "vec/aggregate_functions/aggregate_function.h" +#include "vec/columns/column.h" +#include "vec/columns/column_nullable.h" +#include "vec/columns/column_string.h" +#include "vec/columns/column_vector.h" +#include "vec/columns/columns_number.h" +#include "vec/common/assert_cast.h" +#include "vec/core/block.h" +#include "vec/core/column_numbers.h" +#include "vec/core/column_with_type_and_name.h" +#include "vec/core/types.h" +#include "vec/data_types/data_type.h" +#include "vec/data_types/data_type_nullable.h" +#include "vec/data_types/data_type_number.h" +#include "vec/data_types/data_type_string.h" +#include "vec/functions/function.h" +#include "vec/functions/simple_function_factory.h" + +namespace doris { +class FunctionContext; +} // namespace doris + +namespace doris::vectorized { + +class FunctionCompress 
: public IFunction { +string hex_itoc = "0123456789ABCDEF"; + +public: +static constexpr auto name = "compress"; +static FunctionPtr create() { return std::make_shared(); } + +String get_name() const override { return name; } + +size_t get_number_of_arguments() const override { return 1; } + +DataTypePtr get_return_type_impl(const DataTypes& arguments) const override { +return make_nullable(std::make_shared()); +} + +Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments, +uint32_t result, size_t input_rows_count) const override { +// Get the compression algorithm object +BlockCompressionCodec* compression_codec; + RETURN_IF_ERROR(get_block_compression_codec(segment_v2::CompressionTypePB::ZLIB, +&compression_codec)); + +const auto& arg_column = +assert_cast(*block.get_by_position(arguments[0]).column); +auto result_column = ColumnString::create(); + +auto& arg_data = arg_column.get_chars(); +auto& arg_offset = arg_column.get_offsets(); +const char* arg_begin = reinterpret_cast(arg_data.data()); + +auto& col_data = result_column->get_chars(); +auto& col_offset = result_column->get_offsets(); +col_offset.resize(input_rows_count); + +auto null_column = ColumnUInt8::create(input_rows_count); +auto& null_map = null_column->get_data(); + +faststring compressed_str; +Slice data; +for (size_t row = 0; row < input_rows_count; row++) { +null_map[row] = false; +size_t length = arg_offset[row] - arg_offset[row - 1]; +data = Slice(arg_begin, length); + +auto st = compression_codec->compress(data, &compressed_str); + +if (!st.ok()) { // Failed to compress. The data should be a valid string or value. 
+col_offset[row] = col_offset[row - 1]; +null_map[row] = true; +continue; +} + +size_t idx = col_data.size(); +if (!length) { // data is '' +col_data.resize(col_data.size() + 2); +col_data[idx] = '0', col_data[idx + 1] = 'x'; +col_offset[row] = col_offset[row - 1] + 2; +continue; +} + +// first ten digits represent the length of the uncompressed string +col_data.resize(col_data.size() + 10); +col_data[idx] = '0', col_data[idx + 1] = 'x'; +for (size_t i = 0; i < 4; i++) { +unsigned char byte = (length >> (i * 8)) & 0xFF; +col_data[idx + 2 + i * 2] = hex_itoc[byte >> 4]; // higher four +col_data[idx + 3 + i * 2] = hex_itoc[byte & 0x0F]; +} +idx += 10; + +col_data.resize(col_data.size() + 2 *
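The failure branch in the hunk above marks a row as NULL by repeating the previous offset and setting the null map entry. A minimal sketch of that offsets-plus-null-map layout follows, using standard containers instead of the Doris column classes. `NullableStringColumn` and `build_column` are hypothetical names. The hunk reads `col_offset[row - 1]` even for row 0, which presumably relies on the Doris offsets container tolerating that index; the sketch sidesteps the question by deriving each offset from the current buffer size.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a nullable string column.
struct NullableStringColumn {
    std::string chars;                  // all row payloads, back to back
    std::vector<std::size_t> offsets;   // offsets[i] = end of row i in chars
    std::vector<std::uint8_t> null_map; // 1 marks a NULL row
};

// Builds the column; std::nullopt models a row whose compression failed.
NullableStringColumn build_column(const std::vector<std::optional<std::string>>& rows) {
    NullableStringColumn col;
    col.offsets.reserve(rows.size());
    col.null_map.reserve(rows.size());
    for (const auto& row : rows) {
        if (row.has_value()) {
            col.chars += *row;
            col.null_map.push_back(0);
        } else {
            col.null_map.push_back(1); // NULL row: no bytes appended
        }
        // For a NULL row this repeats the previous end offset.
        col.offsets.push_back(col.chars.size());
    }
    return col;
}
```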
Re: [PR] [Enhancement] Support some compress functions [doris]
zclllyybb commented on code in PR #47307:
URL: https://github.com/apache/doris/pull/47307#discussion_r1928336564

## be/src/vec/functions/function_compress.cpp: ##
@@ -0,0 +1,269 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+#include
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "common/status.h"
+#include "util/block_compression.h"
+#include "util/faststring.h"
+#include "vec/aggregate_functions/aggregate_function.h"
+#include "vec/columns/column.h"
+#include "vec/columns/column_nullable.h"
+#include "vec/columns/column_string.h"
+#include "vec/columns/column_vector.h"
+#include "vec/columns/columns_number.h"
+#include "vec/common/assert_cast.h"
+#include "vec/core/block.h"
+#include "vec/core/column_numbers.h"
+#include "vec/core/column_with_type_and_name.h"
+#include "vec/core/types.h"
+#include "vec/data_types/data_type.h"
+#include "vec/data_types/data_type_nullable.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/data_types/data_type_string.h"
+#include "vec/functions/function.h"
+#include "vec/functions/simple_function_factory.h"
+
+namespace doris {
+class FunctionContext;
+} // namespace doris
+
+namespace doris::vectorized {
+
+class FunctionCompress : public IFunction {
+    string hexadecimal = "0123456789ABCDEF";
+
+public:
+    static constexpr auto name = "compress";
+    static FunctionPtr create() { return std::make_shared(); }
+
+    String get_name() const override { return name; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        return make_nullable(std::make_shared());
+    }
+
+    Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+                        uint32_t result, size_t input_rows_count) const override {
+        // Get the compression algorithm object
+        BlockCompressionCodec* compression_codec;
+        RETURN_IF_ERROR(get_block_compression_codec(segment_v2::CompressionTypePB::ZLIB,
+                                                    &compression_codec));
+
+        const auto& arg_column =
+                assert_cast(*block.get_by_position(arguments[0]).column);
+        auto result_column = ColumnString::create();
+
+        auto& col_data = result_column->get_chars();
+        auto& col_offset = result_column->get_offsets();
+        col_offset.resize(input_rows_count);
+
+        auto null_column = ColumnUInt8::create(input_rows_count);
+        auto& null_map = null_column->get_data();
+
+        faststring compressed_str;
+        Slice data;
+        for (size_t row = 0; row < input_rows_count; row++) {
+            null_map[row] = false;
+            const auto& str = arg_column.get_data_at(row);
+            data = Slice(str.data, str.size);
+
+            auto st = compression_codec->compress(data, &compressed_str);

Review Comment:
when will compress fail?

## be/src/vec/functions/function_compress.cpp: ##
@@ -0,0 +1,269 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+#include
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "common/status.h"
+#include "util/block_compression.h"
+#include "util/faststring.h"
+#include "vec/aggregate_functions/aggregate_function.h"
+#include "vec/columns/column.h"
+#include "vec/columns/column_nullable.h"
+#include "vec/columns
Re: [PR] [Enhancement] Support some compress functions [doris]
lzyy2024 commented on code in PR #47307: URL: https://github.com/apache/doris/pull/47307#discussion_r1924960735 ## be/src/vec/functions/function_compress.cpp: ## @@ -0,0 +1,299 @@ +#include + +#include +#include +#include +#include +#include +#include + +#include "common/status.h" +#include "util/block_compression.h" +#include "util/faststring.h" +#include "vec/aggregate_functions/aggregate_function.h" +#include "vec/columns/column.h" +#include "vec/columns/column_nullable.h" +#include "vec/columns/column_string.h" +#include "vec/columns/column_vector.h" +#include "vec/columns/columns_number.h" +#include "vec/common/assert_cast.h" +#include "vec/core/block.h" +#include "vec/core/column_numbers.h" +#include "vec/core/column_with_type_and_name.h" +#include "vec/core/types.h" +#include "vec/data_types/data_type.h" +#include "vec/data_types/data_type_nullable.h" +#include "vec/data_types/data_type_number.h" +#include "vec/data_types/data_type_string.h" +#include "vec/functions/function.h" +#include "vec/functions/simple_function_factory.h" + +namespace doris { +class FunctionContext; +} // namespace doris + +namespace doris::vectorized { + +class FunctionCompress : public IFunction { +public: +static constexpr auto name = "compress"; +static FunctionPtr create() { return std::make_shared(); } + +String get_name() const override { return name; } + +size_t get_number_of_arguments() const override { return 1; } + +DataTypePtr get_return_type_impl(const DataTypes& arguments) const override { +return make_nullable(std::make_shared()); +} + +Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments, +uint32_t result, size_t input_rows_count) const override { +// LOG(INFO) << "Executing FunctionCompress with " << input_rows_count +// << " rows."; // Log the number of rows being processed + +// Get the compression algorithm object +BlockCompressionCodec* compression_codec; + 
RETURN_IF_ERROR(get_block_compression_codec(segment_v2::CompressionTypePB::ZLIB, +&compression_codec)); + +const auto& arg_column = +assert_cast(*block.get_by_position(arguments[0]).column); +auto result_column = ColumnString::create(); + +auto& col_data = result_column->get_chars(); +auto& col_offset = result_column->get_offsets(); +col_offset.resize(input_rows_count); + +auto null_column = ColumnUInt8::create(input_rows_count); +auto& null_map = null_column->get_data(); + +faststring compressed_str; +Slice data; +for (int row = 0; row < input_rows_count; row++) { +null_map[row] = false; +const auto& str = arg_column.get_data_at(row); +data = Slice(str.data, str.size); + +// Print the original string (before compression) +// LOG(INFO) << "Original string at row " << row << ": " +// << std::string(str.data, str.size); + +auto st = compression_codec->compress(data, &compressed_str); + +if (!st.ok()) { +// LOG(INFO) << "Compression failed at row " << row +// << ", skipping this row."; // Log failure +col_offset[row] = col_offset[row - 1]; +null_map[row] = true; +continue; +} + +size_t idx = col_data.size(); +if (!str.size) { // null -> 0x +col_data.resize(col_data.size() + 2); +col_data[idx] = '0', col_data[idx + 1] = 'x'; +col_offset[row] = col_offset[row - 1] + 2; +continue; +} + +// first ten digits represent the length of the uncompressed string +int value = (int)str.size; +col_data.resize(col_data.size() + 10); +col_data[idx] = '0', col_data[idx + 1] = 'x'; +for (int i = 0; i < 4; i++) { +unsigned char byte = (value >> (i * 8)) & 0xFF; +col_data[idx + 2 + i * 2] = "0123456789ABCDEF"[byte >> 4]; // high 4 bits +col_data[idx + 3 + i * 2] = "0123456789ABCDEF"[byte & 0x0F]; // low 4 bits +} +idx += 10; + +col_data.resize(col_data.size() + 2 * compressed_str.size()); +// memcpy(col_data.data() + col_data.size(), compressed_str.data(), compressed_str.size()); + +unsigned char* src = compressed_str.data(); +{ +auto transform = [](char ch) -> unsigned char { +char x; +if (ch < 10) { 
+x = ch + '0'; +} else { +x = ch - 10 + 'A'; +} +// LOG(INFO) << "transform" << (int)x << "->" << x; +retu
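For readers following the quoted diff: the output layout it builds appears to be "0x", then the uncompressed length as four bytes (least significant byte first) written as eight uppercase hex digits, then the compressed bytes as hex. Below is a minimal standalone sketch of that encoding, not the PR's code. The helper names (`append_hex_byte`, `encode_length_prefix`, `hex_encode`) are hypothetical, it works on `std::string` rather than Doris column buffers, and it uses the lookup-table approach from the length loop for both parts instead of the `transform` lambda.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper names for illustration; none of these come from the PR.
static constexpr char kHexDigits[] = "0123456789ABCDEF";

// Append one byte as two uppercase hex digits, high nibble first.
inline void append_hex_byte(std::string& out, unsigned char byte) {
    out.push_back(kHexDigits[byte >> 4]);
    out.push_back(kHexDigits[byte & 0x0F]);
}

// Build the 10-character prefix: "0x" plus the 4 bytes of `length`,
// least significant byte first, each written as two hex digits.
inline std::string encode_length_prefix(uint32_t length) {
    std::string out = "0x";
    for (int i = 0; i < 4; i++) {
        append_hex_byte(out, static_cast<unsigned char>((length >> (i * 8)) & 0xFF));
    }
    return out;
}

// Hex-encode an arbitrary byte string (for example, a compressed payload).
inline std::string hex_encode(const std::string& bytes) {
    std::string out;
    out.reserve(bytes.size() * 2);
    for (unsigned char byte : bytes) {
        append_hex_byte(out, byte);
    }
    return out;
}
```

Under these assumptions, `encode_length_prefix(5)` yields `0x05000000`, and a row's value would be that prefix followed by `hex_encode(compressed)`.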
Re: [PR] [Enhancement] Support some compress functions [doris]
zclllyybb commented on code in PR #47307: URL: https://github.com/apache/doris/pull/47307#discussion_r1924819842 ## be/src/vec/functions/function_compress.cpp: ## @@ -0,0 +1,299 @@ +#include + +#include +#include +#include +#include +#include +#include + +#include "common/status.h" +#include "util/block_compression.h" +#include "util/faststring.h" +#include "vec/aggregate_functions/aggregate_function.h" +#include "vec/columns/column.h" +#include "vec/columns/column_nullable.h" +#include "vec/columns/column_string.h" +#include "vec/columns/column_vector.h" +#include "vec/columns/columns_number.h" +#include "vec/common/assert_cast.h" +#include "vec/core/block.h" +#include "vec/core/column_numbers.h" +#include "vec/core/column_with_type_and_name.h" +#include "vec/core/types.h" +#include "vec/data_types/data_type.h" +#include "vec/data_types/data_type_nullable.h" +#include "vec/data_types/data_type_number.h" +#include "vec/data_types/data_type_string.h" +#include "vec/functions/function.h" +#include "vec/functions/simple_function_factory.h" + +namespace doris { +class FunctionContext; +} // namespace doris + +namespace doris::vectorized { + +class FunctionCompress : public IFunction { +public: +static constexpr auto name = "compress"; +static FunctionPtr create() { return std::make_shared(); } + +String get_name() const override { return name; } + +size_t get_number_of_arguments() const override { return 1; } + +DataTypePtr get_return_type_impl(const DataTypes& arguments) const override { +return make_nullable(std::make_shared()); +} + +Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments, +uint32_t result, size_t input_rows_count) const override { +// LOG(INFO) << "Executing FunctionCompress with " << input_rows_count +// << " rows."; // Log the number of rows being processed + +// Get the compression algorithm object +BlockCompressionCodec* compression_codec; + 
RETURN_IF_ERROR(get_block_compression_codec(segment_v2::CompressionTypePB::ZLIB, +&compression_codec)); + +const auto& arg_column = +assert_cast(*block.get_by_position(arguments[0]).column); +auto result_column = ColumnString::create(); + +auto& col_data = result_column->get_chars(); +auto& col_offset = result_column->get_offsets(); +col_offset.resize(input_rows_count); + +auto null_column = ColumnUInt8::create(input_rows_count); +auto& null_map = null_column->get_data(); + +faststring compressed_str; +Slice data; +for (int row = 0; row < input_rows_count; row++) { +null_map[row] = false; +const auto& str = arg_column.get_data_at(row); +data = Slice(str.data, str.size); + +// Print the original string (before compression) +// LOG(INFO) << "Original string at row " << row << ": " +// << std::string(str.data, str.size); + +auto st = compression_codec->compress(data, &compressed_str); + +if (!st.ok()) { +// LOG(INFO) << "Compression failed at row " << row +// << ", skipping this row."; // Log failure +col_offset[row] = col_offset[row - 1]; +null_map[row] = true; +continue; +} + +size_t idx = col_data.size(); +if (!str.size) { // null -> 0x +col_data.resize(col_data.size() + 2); +col_data[idx] = '0', col_data[idx + 1] = 'x'; +col_offset[row] = col_offset[row - 1] + 2; +continue; +} + +// first ten digits represent the length of the uncompressed string +int value = (int)str.size; +col_data.resize(col_data.size() + 10); +col_data[idx] = '0', col_data[idx + 1] = 'x'; +for (int i = 0; i < 4; i++) { +unsigned char byte = (value >> (i * 8)) & 0xFF; +col_data[idx + 2 + i * 2] = "0123456789ABCDEF"[byte >> 4]; // 高4位 Review Comment: Don't use Chinese. ## be/src/vec/functions/function_compress.cpp: ## @@ -0,0 +1,299 @@ +#include + +#include +#include +#include +#include +#include +#include + +#include "common/status.h" +#include "util/block_compression.h" +#include "util/faststring.h" +#include "vec/aggregate_functions/aggregate_function.h" +#include "vec/columns/column.h" 
+#include "vec/columns/column_nullable.h" +#include "vec/columns/column_string.h" +#include "vec/columns/column_vector.h" +#include "vec/columns/columns_number.h" +#include "vec/common/assert_cast.h" +#include "vec/core/block.h" +#include "vec/core/column_numbers.h" +#include "vec/core/column_with_type_and_name.h" +#include "vec/core/types.h" +#include
Re: [PR] [Enhancement] Support some compress functions [doris]
hello-stephen commented on PR #47307: URL: https://github.com/apache/doris/pull/47307#issuecomment-2606478336 Thank you for your contribution to Apache Doris. Don't know what should be done next? See [How to process your PR](https://cwiki.apache.org/confluence/display/DORIS/How+to+process+your+PR). Please clearly describe your PR: 1. What problem was fixed (it's best to include specific error reporting information). How it was fixed. 2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be. 3. What features were added. Why was this function added? 4. Which code was refactored and why was this part of the code refactored? 5. Which functions were optimized and what is the difference before and after the optimization?