Re: [PR] [Enhancement] Support some compress functions [doris]

2025-02-06 Thread via GitHub


HappenLee merged PR #47307:
URL: https://github.com/apache/doris/pull/47307


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org



Re: [PR] [Enhancement] Support some compress functions [doris]

2025-02-06 Thread via GitHub


github-actions[bot] commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2639560357

   PR approved by at least one committer and no changes requested.





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-02-05 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2636439210

   TeamCity be ut coverage result:
Function Coverage: 42.09% (11016/26172) 
Line Coverage: 32.36% (92942/287251)
Region Coverage: 31.52% (47667/151238)
Branch Coverage: 27.53% (24109/87582)
Coverage Report: 
http://coverage.selectdb-in.cc/coverage/c342b3f574b8d17b32c536ba5f2dac60186868be_c342b3f574b8d17b32c536ba5f2dac60186868be/report/index.html





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-02-05 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2636402557

   
   
   ClickBench: Total hot run time: 30.91 s
   
   ```
   machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
   scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
   ClickBench test result on commit c342b3f574b8d17b32c536ba5f2dac60186868be, data reload: false
   
   query1   0.03    0.03    0.03
   query2   0.07    0.03    0.03
   query3   0.24    0.06    0.07
   query4   1.63    0.10    0.10
   query5   0.42    0.42    0.41
   query6   1.15    0.66    0.66
   query7   0.02    0.02    0.01
   query8   0.04    0.03    0.02
   query9   0.58    0.50    0.49
   query10  0.55    0.57    0.55
   query11  0.15    0.11    0.11
   query12  0.14    0.11    0.11
   query13  0.61    0.60    0.60
   query14  2.85    2.76    2.76
   query15  0.90    0.82    0.82
   query16  0.38    0.37    0.40
   query17  1.09    1.02    1.02
   query18  0.24    0.22    0.21
   query19  1.95    1.75    2.00
   query20  0.01    0.02    0.02
   query21  15.37   0.94    0.59
   query22  0.75    0.79    0.72
   query23  15.27   1.41    0.55
   query24  3.03    1.00    1.66
   query25  0.26    0.16    0.15
   query26  0.24    0.14    0.13
   query27  0.05    0.04    0.05
   query28  14.23   1.01    0.43
   query29  12.60   4.02    3.30
   query30  0.25    0.09    0.08
   query31  2.81    0.62    0.39
   query32  3.25    0.55    0.46
   query33  3.08    3.01    3.07
   query34  16.62   5.24    4.58
   query35  4.57    4.55    4.56
   query36  0.68    0.49    0.48
   query37  0.09    0.06    0.06
   query38  0.05    0.04    0.03
   query39  0.04    0.03    0.03
   query40  0.18    0.14    0.12
   query41  0.08    0.03    0.03
   query42  0.04    0.03    0.02
   query43  0.03    0.03    0.03
   Total cold run time: 106.62 s
   Total hot run time: 30.91 s
   ```
   
   





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-02-05 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2636363153

   
   
   TPC-H: Total hot run time: 32403 ms
   
   ```
   machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
   scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
   Tpch sf100 test result on commit c342b3f574b8d17b32c536ba5f2dac60186868be, data reload: false
   
   -- Round 1 --
   q1   17597   5480    5449    5449
   q2   2051    299     179     179
   q3   10501   1305    717     717
   q4   10218   983     522     522
   q5   7596    2490    2148    2148
   q6   195     169     134     134
   q7   905     748     604     604
   q8   9230    1387    1217    1217
   q9   5192    4877    4942    4877
   q10  6877    2331    1915    1915
   q11  468     282     254     254
   q12  353     365     227     227
   q13  17783   3743    3139    3139
   q14  227     231     220     220
   q15  523     477     496     477
   q16  627     621     576     576
   q17  576     883     334     334
   q18  7153    6372    6475    6372
   q19  1331    972     534     534
   q20  316     333     201     201
   q21  2887    2196    1990    1990
   q22  374     332     317     317
   Total cold run time: 102980 ms
   Total hot run time: 32403 ms
   
   -- Round 2, with runtime_filter_mode=off --
   q1   5517    5540    5521    5521
   q2   238     331     237     237
   q3   2306    2632    2330    2330
   q4   1392    1877    1365    1365
   q5   4363    4773    4701    4701
   q6   170     157     128     128
   q7   2093    2026    1852    1852
   q8   2663    2839    2676    2676
   q9   7293    7319    7344    7319
   q10  3063    3303    2776    2776
   q11  599     546     507     507
   q12  663     794     660     660
   q13  3521    3885    3240    3240
   q14  274     298     269     269
   q15  506     472     463     463
   q16  640     689     653     653
   q17  1229    1730    1245    1245
   q18  7613    7416    7445    7416
   q19  820     1185    1070    1070
   q20  1952    2036    1848    1848
   q21  5663    5306    5097    5097
   q22  599     591     567     567
   Total cold run time: 53177 ms
   Total hot run time: 51940 ms
   ```
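
The two totals in this report can be reproduced from the per-query rows. The sketch below assumes the columns are a cold run, two hot runs, and the faster of the two hot runs; that reading is inferred from the numbers, not stated in the message. The names `QueryTiming`, `Totals`, `summarize`, and `tpch_round1` are hypothetical and used only for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One benchmark row: cold run plus two hot runs, in milliseconds.
// (Column meaning is an assumption inferred from the report.)
struct QueryTiming {
    std::int64_t cold_ms;
    std::int64_t hot1_ms;
    std::int64_t hot2_ms;
};

struct Totals {
    std::int64_t cold_ms;
    std::int64_t hot_ms;
};

// Cold total is the sum of the cold runs; hot total is the sum of the
// faster hot run per query.
Totals summarize(const std::vector<QueryTiming>& rows) {
    Totals totals{0, 0};
    for (const auto& row : rows) {
        totals.cold_ms += row.cold_ms;
        totals.hot_ms += std::min(row.hot1_ms, row.hot2_ms);
    }
    return totals;
}

// TPC-H Round 1 rows (q1..q22) from the report on commit c342b3f5...
std::vector<QueryTiming> tpch_round1() {
    return {
        {17597, 5480, 5449}, {2051, 299, 179},    {10501, 1305, 717},
        {10218, 983, 522},   {7596, 2490, 2148},  {195, 169, 134},
        {905, 748, 604},     {9230, 1387, 1217},  {5192, 4877, 4942},
        {6877, 2331, 1915},  {468, 282, 254},     {353, 365, 227},
        {17783, 3743, 3139}, {227, 231, 220},     {523, 477, 496},
        {627, 621, 576},     {576, 883, 334},     {7153, 6372, 6475},
        {1331, 972, 534},    {316, 333, 201},     {2887, 2196, 1990},
        {374, 332, 317},
    };
}
```

Under that assumption, `summarize(tpch_round1())` gives 102980 ms cold and 32403 ms hot, matching the Round 1 totals reported above.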
   
   





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-02-05 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2636390530

   
   
   TPC-DS: Total hot run time: 184657 ms
   
   ```
   machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
   scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
   TPC-DS sf100 test result on commit c342b3f574b8d17b32c536ba5f2dac60186868be, data reload: false
   
   query1   976     373     372     372
   query2   7398    2080    2034    2034
   query3   6790    214     209     209
   query4   33096   23638   23248   23248
   query5   4398    629     468     468
   query6   297     193     200     193
   query7   4599    497     306     306
   query8   281     233     220     220
   query9   9332    2654    2660    2654
   query10  473     323     247     247
   query11  17755   15147   14955   14955
   query12  164     113     108     108
   query13  1666    528     412     412
   query14  9675    6969    6390    6390
   query15  229     198     197     197
   query16  7892    661     413     413
   query17  1606    725     574     574
   query18  2028    423     316     316
   query19  226     191     166     166
   query20  120     119     113     113
   query21  211     129     106     106
   query22  4107    4191    4179    4179
   query23  34204   32999   33101   32999
   query24  6653    2234    2233    2233
   query25  487     446     382     382
   query26  1221    274     152     152
   query27  2005    471     330     330
   query28  5286    2452    2449    2449
   query29  714     550     417     417
   query30  225     187     162     162
   query31  978     843     791     791
   query32  82      68      61      61
   query33  508     354     284     284
   query34  742     873     524     524
   query35  792     817     749     749
   query36  972     1074    941     941
   query37  121     100     75      75
   query38  4110    4212    4072    4072
   query39  1452    1385    1377    1377
   query40  203     112     102     102
   query41  53      60      55      55
   query42  120     98      103     98
   query43  499     526     492     492
   query44  1322    799     797     797
   query45  176     169     172     169
   query46  847     1049    640     640
   query47  1820    1855    1773    1773
   query48  384     404     328     328
   query49  781     484     437     437
   query50  616     671     387     387
   query51  4187    4214    4131    4131
   query52  108     102     100     100
   query53  226     253     188     188
   query54  484     497     406     406
   query55  83      80      78      78
   query56  265     261     246     246
   query57  1169    1167    1068    1068
   query58  255     230     232     230
   query59  2929    3071    2832    2832
   query60  265     273     252     252
   query61  118     111     113     111
   query62  811     700     656     656
   query63  228     187     192     187
   query64  4292    1021    651     651
   query65  3216    3161    3175    3161
   query66  1079    420     299     299
   query67  16028   15716   15575   15575
   query68  3208    842     544     544
   query69  459     296     266     266
   query70  1209    1155    1169    1155
   query71  380     297     261     261
   query72  5758    3816    4012    3816
   query73  746     760     372     372
   query74  9958    9050    8811    8811
   query75  3163    3167    2654    2654
   query76  3070    1174    787     787
   query77  455     367     286     286
   query78  10015   10014   9283    9283
   query79  2674    812     681     681
   query80  1679    533     454     454
   query81  565     272     240     240
   query82  362     145     116     116
   query83  270     176     157     157
   query84  243     102     74      74
   query85  781     344     299     299
   query86  471     316     308     308
   query87  4520    4492    4347    4347
   query88  4328    2192    2149    2149
   query89  391     334     302     302
   query90  1833    188     194     188
   query91  143     141     109     109
   query92  66      56      55      55
   query93  2761    877     537     537
   query94  749     424     306     306
   query95  343     266     259     259
   query96  499     633     283     283
   query97  2766    2900    2771    2771
   query98  239     210     206     206
   query99  1285    1373    1254    1254
   Total cold run time: 281824 ms
   Total hot run time: 184657 ms
   ```
   
   



Re: [PR] [Enhancement] Support some compress functions [doris]

2025-02-05 Thread via GitHub


lzyy2024 commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2636278328

   run buildall





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-02-04 Thread via GitHub


lzyy2024 commented on code in PR #47307:
URL: https://github.com/apache/doris/pull/47307#discussion_r1942316823


##
be/src/vec/functions/function_compress.cpp:
##
@@ -0,0 +1,240 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "common/status.h"
+#include "util/block_compression.h"
+#include "util/faststring.h"
+#include "vec/aggregate_functions/aggregate_function.h"
+#include "vec/columns/column.h"
+#include "vec/columns/column_nullable.h"
+#include "vec/columns/column_string.h"
+#include "vec/columns/column_vector.h"
+#include "vec/columns/columns_number.h"
+#include "vec/common/assert_cast.h"
+#include "vec/core/block.h"
+#include "vec/core/column_numbers.h"
+#include "vec/core/column_with_type_and_name.h"
+#include "vec/core/types.h"
+#include "vec/data_types/data_type.h"
+#include "vec/data_types/data_type_nullable.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/data_types/data_type_string.h"
+#include "vec/functions/function.h"
+#include "vec/functions/simple_function_factory.h"
+
+namespace doris {
+class FunctionContext;
+} // namespace doris
+
+namespace doris::vectorized {
+
+class FunctionCompress : public IFunction {
+    static constexpr std::array HEX_ITOC = {'0', '1', '2', '3', '4', '5', '6', '7',
+                                            '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};
+
+public:
+    static constexpr auto name = "compress";
+    static FunctionPtr create() { return std::make_shared(); }
+
+    String get_name() const override { return name; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        return std::make_shared();
+    }
+
+    Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+                        uint32_t result, size_t input_rows_count) const override {
+        // Get the compression algorithm object
+        BlockCompressionCodec* compression_codec;
+        RETURN_IF_ERROR(get_block_compression_codec(segment_v2::CompressionTypePB::ZLIB,
+                                                    &compression_codec));
+
+        const auto& arg_column =
+                assert_cast(*block.get_by_position(arguments[0]).column);
+        auto result_column = ColumnString::create();
+
+        auto& arg_data = arg_column.get_chars();
+        auto& arg_offset = arg_column.get_offsets();
+        const char* arg_begin = reinterpret_cast(arg_data.data());
+
+        auto& col_data = result_column->get_chars();
+        auto& col_offset = result_column->get_offsets();
+        col_offset.resize(input_rows_count);
+
+        faststring compressed_str;
+        Slice data;
+
+        // When the original string is large, the result is roughly this value
+        size_t total = arg_offset[input_rows_count - 1];
+        col_data.reserve(total / 1000);
+
+        for (size_t row = 0; row < input_rows_count; row++) {
+            size_t length = arg_offset[row] - arg_offset[row - 1];
+            data = Slice(arg_begin + arg_offset[row - 1], length);
+
+            size_t idx = col_data.size();
+            if (!length) { // data is ''
+                col_offset[row] = col_offset[row - 1];
+                continue;
+            }
+
+            // Z_MEM_ERROR and Z_BUF_ERROR are already handled in compress, making sure st is always Z_OK
+            auto st = compression_codec->compress(data, &compressed_str);
+            col_data.resize(col_data.size() + 10 + compressed_str.size());
+
+            // first ten digits represent the length of the uncompressed string

Review Comment:
   MySQL does it this way. Is there a better way?




Re: [PR] [Enhancement] Support some compress functions [doris]

2025-02-04 Thread via GitHub


HappenLee commented on code in PR #47307:
URL: https://github.com/apache/doris/pull/47307#discussion_r1942310992


##
be/src/vec/functions/function_compress.cpp:
##
@@ -0,0 +1,240 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "common/status.h"
+#include "util/block_compression.h"
+#include "util/faststring.h"
+#include "vec/aggregate_functions/aggregate_function.h"
+#include "vec/columns/column.h"
+#include "vec/columns/column_nullable.h"
+#include "vec/columns/column_string.h"
+#include "vec/columns/column_vector.h"
+#include "vec/columns/columns_number.h"
+#include "vec/common/assert_cast.h"
+#include "vec/core/block.h"
+#include "vec/core/column_numbers.h"
+#include "vec/core/column_with_type_and_name.h"
+#include "vec/core/types.h"
+#include "vec/data_types/data_type.h"
+#include "vec/data_types/data_type_nullable.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/data_types/data_type_string.h"
+#include "vec/functions/function.h"
+#include "vec/functions/simple_function_factory.h"
+
+namespace doris {
+class FunctionContext;
+} // namespace doris
+
+namespace doris::vectorized {
+
+class FunctionCompress : public IFunction {
+    static constexpr std::array HEX_ITOC = {'0', '1', '2', '3', '4', '5', '6', '7',
+                                            '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};
+
+public:
+    static constexpr auto name = "compress";
+    static FunctionPtr create() { return std::make_shared(); }
+
+    String get_name() const override { return name; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        return std::make_shared();
+    }
+
+    Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+                        uint32_t result, size_t input_rows_count) const override {
+        // Get the compression algorithm object
+        BlockCompressionCodec* compression_codec;
+        RETURN_IF_ERROR(get_block_compression_codec(segment_v2::CompressionTypePB::ZLIB,
+                                                    &compression_codec));
+
+        const auto& arg_column =
+                assert_cast(*block.get_by_position(arguments[0]).column);
+        auto result_column = ColumnString::create();
+
+        auto& arg_data = arg_column.get_chars();
+        auto& arg_offset = arg_column.get_offsets();
+        const char* arg_begin = reinterpret_cast(arg_data.data());
+
+        auto& col_data = result_column->get_chars();
+        auto& col_offset = result_column->get_offsets();
+        col_offset.resize(input_rows_count);
+
+        faststring compressed_str;
+        Slice data;
+
+        // When the original string is large, the result is roughly this value
+        size_t total = arg_offset[input_rows_count - 1];
+        col_data.reserve(total / 1000);
+
+        for (size_t row = 0; row < input_rows_count; row++) {
+            size_t length = arg_offset[row] - arg_offset[row - 1];
+            data = Slice(arg_begin + arg_offset[row - 1], length);
+
+            size_t idx = col_data.size();
+            if (!length) { // data is ''
+                col_offset[row] = col_offset[row - 1];
+                continue;
+            }
+
+            // Z_MEM_ERROR and Z_BUF_ERROR are already handled in compress, making sure st is always Z_OK
+            auto st = compression_codec->compress(data, &compressed_str);
+            col_data.resize(col_data.size() + 10 + compressed_str.size());
+
+            // first ten digits represent the length of the uncompressed string

Review Comment:
   Why not store the length directly as a uint32_t here? Is the hex encoding needed?
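
For context on the two options in this comment: MySQL documents that COMPRESS() stores a nonempty result as a 4-byte length of the uncompressed string, low byte first, followed by the compressed data. The sketch below contrasts that raw 4-byte prefix with a 10-character text prefix. The text layout ("0x" plus eight hex digits, low byte first) is only an assumption about what the "first ten digits" comment refers to, because the quoted hunk ends before the code that writes the prefix. All function names are hypothetical and are not taken from the PR.

```cpp
#include <array>
#include <cstdint>
#include <string>

// Raw prefix: 4 bytes, low byte first (the layout MySQL documents
// for COMPRESS()).
std::string encode_len_raw(std::uint32_t len) {
    std::string out(4, '\0');
    for (int i = 0; i < 4; ++i) {
        out[i] = static_cast<char>((len >> (8 * i)) & 0xFF);
    }
    return out;
}

// Inverse of encode_len_raw; the caller must pass at least 4 bytes.
std::uint32_t decode_len_raw(const std::string& header) {
    std::uint32_t len = 0;
    for (int i = 0; i < 4; ++i) {
        const auto byte = static_cast<unsigned char>(header[i]);
        len |= static_cast<std::uint32_t>(byte) << (8 * i);
    }
    return len;
}

// Text prefix (assumed layout): "0x" followed by the same four bytes
// written as eight uppercase hex digits, 10 characters in total.
std::string encode_len_hex(std::uint32_t len) {
    static constexpr std::array<char, 16> kHex = {'0', '1', '2', '3', '4', '5', '6', '7',
                                                  '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};
    std::string out = "0x";
    for (int i = 0; i < 4; ++i) {
        const unsigned byte = (len >> (8 * i)) & 0xFF;
        out.push_back(kHex[byte >> 4]);   // high four bits
        out.push_back(kHex[byte & 0x0F]); // low four bits
    }
    return out;
}
```

For a 5-byte input, the raw prefix is the four bytes 05 00 00 00, while the assumed text prefix is the ten characters `0x05000000`. The raw form is smaller and needs no parsing; the text form stays printable, which may matter if the result is displayed as a string.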




Re: [PR] [Enhancement] Support some compress functions [doris]

2025-02-04 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2634686328

   
   
   TPC-DS: Total hot run time: 192865 ms
   
   ```
   machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
   scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
   TPC-DS sf100 test result on commit 12facae2117299c27cfdc2dd328db4dabed78428, data reload: false
   
   query1   1304    924     906     906
   query2   6213    2074    2008    2008
   query3   10991   4477    4610    4477
   query4   61732   28570   23087   23087
   query5   5578    605     447     447
   query6   427     201     191     191
   query7   5564    500     294     294
   query8   330     246     231     231
   query9   8103    2683    2663    2663
   query10  458     318     256     256
   query11  17532   15169   15419   15169
   query12  165     111     113     111
   query13  1438    574     430     430
   query14  10425   7614    7718    7614
   query15  219     224     191     191
   query16  7172    613     480     480
   query17  1150    759     600     600
   query18  1776    426     353     353
   query19  209     178     165     165
   query20  117     114     115     114
   query21  210     124     112     112
   query22  4608    4827    4637    4637
   query23  34482   33471   33553   33471
   query24  5486    2272    2317    2272
   query25  464     457     396     396
   query26  668     282     158     158
   query27  2141    500     324     324
   query28  4523    2527    2511    2511
   query29  540     571     454     454
   query30  219     207     163     163
   query31  969     914     867     867
   query32  77      60      57      57
   query33  441     367     311     311
   query34  773     865     502     502
   query35  825     850     762     762
   query36  992     1074    974     974
   query37  117     98      77      77
   query38  4356    4327    4209    4209
   query39  1484    1418    1436    1418
   query40  208     111     99      99
   query41  52      48      52      48
   query42  126     103     104     103
   query43  513     543     500     500
   query44  1302    861     811     811
   query45  193     170     169     169
   query46  874     1055    651     651
   query47  1944    1936    1859    1859
   query48  410     440     333     333
   query49  732     486     390     390
   query50  646     666     385     385
   query51  4288    4274    4336    4274
   query52  108     104     93      93
   query53  235     254     190     190
   query54  494     505     420     420
   query55  81      74      79      74
   query56  269     286     263     263
   query57  1207    1216    1137    1137
   query58  233     234     230     230
   query59  3328    3307    3034    3034
   query60  264     252     243     243
   query61  121     114     122     114
   query62  719     728     655     655
   query63  219     183     182     182
   query64  1316    1021    662     662
   query65  3240    3124    3165    3124
   query66  655     386     305     305
   query67  16395   15696   15652   15652
   query68  4373    815     534     534
   query69  526     300     252     252
   query70  1218    1136    1124    1124
   query71  425     291     249     249
   query72  6092    3917    3834    3834
   query73  694     790     359     359
   query74  10128   9016    8887    8887
   query75  3176    3173    2685    2685
   query76  3286    1159    742     742
   query77  494     338     320     320
   query78  10193   10072   9350    9350
   query79  2592    800     605     605
   query80  670     528     455     455
   query81  497     272     243     243
   query82  222     155     116     116
   query83  177     174     154     154
   query84  294     96      68      68
   query85  749     342     304     304
   query86  388     316     277     277
   query87  4410    4495    4464    4464
   query88  4006    2143    2137    2137
   query89  385     320     282     282
   query90  1670    186     189     186
   query91  130     134     108     108
   query92  65      56      50      50
   query93  2180    845     533     533
   query94  717     414     308     308
   query95  329     266     252     252
   query96  485     608     279     279
   query97  2777    2862    2760    2760
   query98  216     195     196     195
   query99  1289    1365    1291    1291
   Total cold run time: 309303 ms
   Total hot run time: 192865 ms
   ```
   
   



Re: [PR] [Enhancement] Support some compress functions [doris]

2025-02-04 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2634704565

   TeamCity be ut coverage result:
Function Coverage: 42.06% (10994/26139) 
Line Coverage: 32.34% (92827/287074)
Region Coverage: 31.48% (47583/151142)
Branch Coverage: 27.51% (24085/87536)
Coverage Report: 
http://coverage.selectdb-in.cc/coverage/12facae2117299c27cfdc2dd328db4dabed78428_12facae2117299c27cfdc2dd328db4dabed78428/report/index.html





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-02-04 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2634697893

   
   
   ClickBench: Total hot run time: 30.9 s
   
   ```
   machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
   scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
   ClickBench test result on commit 12facae2117299c27cfdc2dd328db4dabed78428, data reload: false
   
   query1   0.03    0.05    0.04
   query2   0.07    0.04    0.04
   query3   0.24    0.06    0.07
   query4   1.63    0.11    0.10
   query5   0.44    0.41    0.41
   query6   1.13    0.66    0.66
   query7   0.02    0.01    0.01
   query8   0.04    0.03    0.03
   query9   0.58    0.50    0.51
   query10  0.55    0.55    0.55
   query11  0.13    0.10    0.11
   query12  0.14    0.10    0.10
   query13  0.60    0.60    0.61
   query14  2.72    2.76    2.75
   query15  0.88    0.83    0.82
   query16  0.39    0.37    0.39
   query17  0.96    0.96    0.98
   query18  0.23    0.21    0.20
   query19  1.86    1.76    2.01
   query20  0.02    0.02    0.01
   query21  15.36   0.95    0.58
   query22  0.75    0.84    0.61
   query23  15.30   1.38    0.54
   query24  2.61    2.07    1.38
   query25  0.23    0.05    0.19
   query26  0.26    0.15    0.14
   query27  0.08    0.06    0.04
   query28  14.45   0.99    0.42
   query29  12.83   4.01    3.29
   query30  0.24    0.08    0.06
   query31  2.84    0.60    0.38
   query32  3.24    0.54    0.45
   query33  2.96    3.05    3.06
   query34  16.52   5.12    4.53
   query35  4.51    4.55    4.52
   query36  0.67    0.49    0.48
   query37  0.10    0.06    0.06
   query38  0.04    0.03    0.03
   query39  0.04    0.02    0.03
   query40  0.16    0.13    0.13
   query41  0.09    0.03    0.02
   query42  0.04    0.03    0.02
   query43  0.04    0.03    0.03
   Total cold run time: 106.02 s
   Total hot run time: 30.9 s
   ```
   
   





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-02-04 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2634657028

   
   
   TPC-H: Total hot run time: 31960 ms
   
   ```
   machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
   scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
   Tpch sf100 test result on commit 12facae2117299c27cfdc2dd328db4dabed78428, data reload: false
   
   -- Round 1 --
   q1   17562  5435   5364   5364
   q2   2047   304    163    163
   q3   10574  1206   725    725
   q4   10206  955    535    535
   q5   7518   2363   2146   2146
   q6   187    164    135    135
   q7   897    762    602    602
   q8   9262   1322   1148   1148
   q9   5172   4858   4930   4858
   q10  6840   2327   1899   1899
   q11  482    272    260    260
   q12  350    357    228    228
   q13  17947  3718   3087   3087
   q14  228    231    220    220
   q15  528    482    466    466
   q16  642    608    578    578
   q17  547    858    315    315
   q18  6857   6271   6388   6271
   q19  1673   944    522    522
   q20  305    321    199    199
   q21  2944   2117   1935   1935
   q22  364    335    304    304
   Total cold run time: 103132 ms
   Total hot run time: 31960 ms
   
   - Round 2, with runtime_filter_mode=off -
   q1   5521   5393   5447   5393
   q2   239    320    232    232
   q3   2218   2631   2335   2335
   q4   1411   1842   1400   1400
   q5   4312   4719   4642   4642
   q6   163    154    126    126
   q7   1961   1953   1833   1833
   q8   2605   2750   2723   2723
   q9   7256   7201   7224   7201
   q10  3023   3279   2785   2785
   q11  557    518    495    495
   q12  643    698    623    623
   q13  3564   3977   3287   3287
   q14  285    285    283    283
   q15  504    485    478    478
   q16  630    700    642    642
   q17  1208   1735   1275   1275
   q18  7549   7588   7308   7308
   q19  789    1043   1127   1043
   q20  2008   2048   1905   1905
   q21  5808   5010   4987   4987
   q22  602    602    559    559
   Total cold run time: 52856 ms
   Total hot run time: 51555 ms
   ```
   
   





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-02-04 Thread via GitHub


lzyy2024 commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2634566958

   run buildall





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-02-04 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2634357589

   TeamCity be ut coverage result:
Function Coverage: 42.06% (10995/26139) 
Line Coverage: 32.34% (92829/287074)
Region Coverage: 31.49% (47598/151142)
Branch Coverage: 27.52% (24088/87536)
Coverage Report: 
http://coverage.selectdb-in.cc/coverage/52d222d5db3f76a84e5406c1f11294bb31156192_52d222d5db3f76a84e5406c1f11294bb31156192/report/index.html





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-02-04 Thread via GitHub


lzyy2024 commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2634185450

   run buildall





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-02-04 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2634108455

   TeamCity be ut coverage result:
Function Coverage: 42.06% (10993/26139) 
Line Coverage: 32.33% (92807/287076)
Region Coverage: 31.48% (47578/151142)
Branch Coverage: 27.51% (24080/87536)
Coverage Report: 
http://coverage.selectdb-in.cc/coverage/42df82b0be1d0ed027e05062f36c438e3bf32308_42df82b0be1d0ed027e05062f36c438e3bf32308/report/index.html





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-02-04 Thread via GitHub


lzyy2024 commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2633892159

   run buildall





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-02-03 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2632928217

   TeamCity be ut coverage result:
Function Coverage: 42.06% (10995/26139) 
Line Coverage: 32.33% (92807/287076)
Region Coverage: 31.49% (47594/151142)
Branch Coverage: 27.52% (24088/87536)
Coverage Report: 
http://coverage.selectdb-in.cc/coverage/42df82b0be1d0ed027e05062f36c438e3bf32308_42df82b0be1d0ed027e05062f36c438e3bf32308/report/index.html





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-02-03 Thread via GitHub


lzyy2024 commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2632830880

   run buildall





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-02-02 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2630194193

   TeamCity be ut coverage result:
Function Coverage: 42.08% (11000/26139) 
Line Coverage: 32.37% (92931/287083)
Region Coverage: 31.52% (47645/151150)
Branch Coverage: 27.55% (24120/87544)
Coverage Report: 
http://coverage.selectdb-in.cc/coverage/376422f094b5ed32dcc058cd1f75940d1dd30081_376422f094b5ed32dcc058cd1f75940d1dd30081/report/index.html





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-02-02 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2630166689

   
   
   ClickBench: Total hot run time: 31.38 s
   
   ```
   machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
   scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
   ClickBench test result on commit 376422f094b5ed32dcc058cd1f75940d1dd30081, data reload: false
   
   query1   0.04   0.04   0.03
   query2   0.07   0.03   0.03
   query3   0.24   0.07   0.07
   query4   1.63   0.10   0.10
   query5   0.42   0.41   0.40
   query6   1.16   0.65   0.65
   query7   0.02   0.02   0.02
   query8   0.04   0.03   0.03
   query9   0.58   0.49   0.50
   query10  0.56   0.56   0.56
   query11  0.15   0.10   0.10
   query12  0.14   0.11   0.10
   query13  0.61   0.59   0.59
   query14  2.88   2.74   2.77
   query15  0.88   0.85   0.83
   query16  0.38   0.38   0.38
   query17  1.01   1.04   1.05
   query18  0.23   0.20   0.20
   query19  1.83   1.75   2.03
   query20  0.02   0.01   0.02
   query21  15.38  0.92   0.57
   query22  0.74   0.82   0.71
   query23  15.15  1.42   0.62
   query24  2.93   1.82   1.73
   query25  0.13   0.10   0.09
   query26  0.29   0.16   0.15
   query27  0.07   0.06   0.04
   query28  14.47  0.99   0.43
   query29  12.58  3.92   3.24
   query30  0.24   0.08   0.06
   query31  2.84   0.58   0.39
   query32  3.24   0.55   0.46
   query33  2.99   2.97   3.01
   query34  16.50  5.16   4.50
   query35  4.55   4.53   4.57
   query36  0.68   0.48   0.47
   query37  0.09   0.06   0.05
   query38  0.05   0.04   0.03
   query39  0.04   0.02   0.03
   query40  0.16   0.13   0.13
   query41  0.08   0.02   0.02
   query42  0.04   0.03   0.02
   query43  0.03   0.03   0.03
   Total cold run time: 106.16 s
   Total hot run time: 31.38 s
   ```
   
   





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-02-02 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2630158640

   
   
   TPC-DS: Total hot run time: 191844 ms
   
   ```
   machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
   scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
   TPC-DS sf100 test result on commit 376422f094b5ed32dcc058cd1f75940d1dd30081, data reload: false
   
   query1   1304   934    923    923
   query2   6204   2079   2067   2067
   query3   10972  4380   4367   4367
   query4   61069  29284  23235  23235
   query5   5532   589    437    437
   query6   414    214    193    193
   query7   5469   515    297    297
   query8   332    247    224    224
   query9   7705   2664   2656   2656
   query10  459    301    253    253
   query11  17717  15342  15375  15342
   query12  161    107    105    105
   query13  1387   550    396    396
   query14  11778  6928   6877   6877
   query15  210    188    186    186
   query16  6789   640    462    462
   query17  1114   742    583    583
   query18  1728   421    300    300
   query19  201    182    164    164
   query20  126    115    115    115
   query21  212    126    108    108
   query22  4646   4783   4441   4441
   query23  34279  33373  33404  33373
   query24  5515   2277   2330   2277
   query25  462    481    403    403
   query26  640    274    153    153
   query27  1556   471    327    327
   query28  3952   2523   2484   2484
   query29  575    578    447    447
   query30  218    193    156    156
   query31  899    879    824    824
   query32  70     61     59     59
   query33  441    399    309    309
   query34  727    864    517    517
   query35  865    847    755    755
   query36  1017   1039   980    980
   query37  124    106    85     85
   query38  4313   4336   4217   4217
   query39  1551   1462   1437   1437
   query40  205    117    108    108
   query41  56     54     53     53
   query42  121    109    113    109
   query43  535    536    503    503
   query44  1319   833    857    833
   query45  189    175    171    171
   query46  895    1051   678    678
   query47  1898   1908   1870   1870
   query48  387    410    334    334
   query49  728    481    405    405
   query50  650    673    400    400
   query51  4225   4258   4290   4258
   query52  138    101    90     90
   query53  239    257    196    196
   query54  507    491    424    424
   query55  83     80     77     77
   query56  278    269    256    256
   query57  1187   1217   1166   1166
   query58  237    233    239    233
   query59  3138   3198   2996   2996
   query60  277    257    268    257
   query61  116    119    115    115
   query62  746    705    656    656
   query63  217    189    182    182
   query64  1248   1014   678    678
   query65  3352   3143   3166   3143
   query66  746    388    295    295
   query67  16208  15807  15507  15507
   query68  5020   822    525    525
   query69  486    293    264    264
   query70  1224   1118   1143   1118
   query71  412    277    252    252
   query72  6422   3912   3873   3873
   query73  782    744    361    361
   query74  9833   9302   8668   8668
   query75  3320   3125   2695   2695
   query76  3803   1176   770    770
   query77  480    359    361    359
   query78  10156  10155  9334   9334
   query79  2892   795    603    603
   query80  1700   525    445    445
   query81  547    275    237    237
   query82  355    149    132    132
   query83  267    166    146    146
   query84  298    89     71     71
   query85  765    346    347    346
   query86  423    333    304    304
   query87  4401   4750   4415   4415
   query88  3650   2163   2135   2135
   query89  394    321    287    287
   query90  1649   192    188    188
   query91  135    139    114    114
   query92  67     57     56     56
   query93  2134   853    530    530
   query94  756    405    300    300
   query95  317    262    248    248
   query96  486    616    287    287
   query97  2854   2863   2797   2797
   query98  227    196    200    196
   query99  1290   1377   1261   1261
   Total cold run time: 310203 ms
   Total hot run time: 191844 ms
   ```
   
   



Re: [PR] [Enhancement] Support some compress functions [doris]

2025-02-02 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2630139443

   
   
   TPC-H: Total hot run time: 32399 ms
   
   ```
   machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
   scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
   Tpch sf100 test result on commit 376422f094b5ed32dcc058cd1f75940d1dd30081, data reload: false
   
   -- Round 1 --
   q1   17649  5525   5426   5426
   q2   2065   317    191    191
   q3   10463  1219   744    744
   q4   10221  969    561    561
   q5   7602   2429   2164   2164
   q6   192    170    144    144
   q7   922    779    608    608
   q8   9240   1374   1175   1175
   q9   5305   4922   4915   4915
   q10  6845   2355   1894   1894
   q11  483    271    259    259
   q12  342    359    226    226
   q13  1      3654   3063   3063
   q14  229    236    214    214
   q15  517    471    472    471
   q16  631    627    586    586
   q17  568    877    328    328
   q18  7133   6523   6421   6421
   q19  1949   960    546    546
   q20  307    319    194    194
   q21  2793   2151   1956   1956
   q22  368    333    313    313
   Total cold run time: 103601 ms
   Total hot run time: 32399 ms
   
   - Round 2, with runtime_filter_mode=off -
   q1   5548   5460   5501   5460
   q2   243    338    230    230
   q3   2279   2658   2302   2302
   q4   1437   1826   1400   1400
   q5   4312   4725   4637   4637
   q6   166    161    131    131
   q7   2014   1959   1866   1866
   q8   2627   2835   2707   2707
   q9   7364   7270   7250   7250
   q10  3050   3255   2786   2786
   q11  596    499    496    496
   q12  637    722    559    559
   q13  3476   3944   3257   3257
   q14  297    294    298    294
   q15  521    468    464    464
   q16  668    695    646    646
   q17  1259   1755   1263   1263
   q18  7607   7633   7340   7340
   q19  800    1160   1089   1089
   q20  2016   2053   1891   1891
   q21  5848   5202   4891   4891
   q22  634    614    590    590
   Total cold run time: 53399 ms
   Total hot run time: 51549 ms
   ```
   
   





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-02-02 Thread via GitHub


lzyy2024 commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2629991731

   run buildall





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-02-02 Thread via GitHub


HappenLee commented on code in PR #47307:
URL: https://github.com/apache/doris/pull/47307#discussion_r1938767620


##
be/src/vec/functions/function_compress.cpp:
##
@@ -0,0 +1,249 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "common/status.h"
+#include "util/block_compression.h"
+#include "util/faststring.h"
+#include "vec/aggregate_functions/aggregate_function.h"
+#include "vec/columns/column.h"
+#include "vec/columns/column_nullable.h"
+#include "vec/columns/column_string.h"
+#include "vec/columns/column_vector.h"
+#include "vec/columns/columns_number.h"
+#include "vec/common/assert_cast.h"
+#include "vec/core/block.h"
+#include "vec/core/column_numbers.h"
+#include "vec/core/column_with_type_and_name.h"
+#include "vec/core/types.h"
+#include "vec/data_types/data_type.h"
+#include "vec/data_types/data_type_nullable.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/data_types/data_type_string.h"
+#include "vec/functions/function.h"
+#include "vec/functions/simple_function_factory.h"
+
+namespace doris {
+class FunctionContext;
+} // namespace doris
+
+namespace doris::vectorized {
+
+class FunctionCompress : public IFunction {
+    static constexpr std::array hex_itoc = {'0', '1', '2', '3', '4', '5', '6', '7',

Review Comment:
   please keep constexpr UPPER CASE
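
   As a rough illustration of the naming request above, the lookup table could be written as below. This is only a sketch: the name `HEX_ITOC`, the explicit `std::array<char, 16>` type, and the free-standing helper `hex_digit` are assumptions made for illustration, not the code that was merged.

```cpp
#include <array>
#include <cstdint>

// Hypothetical upper-case name for the constexpr hex lookup table.
static constexpr std::array<char, 16> HEX_ITOC = {'0', '1', '2', '3', '4', '5', '6', '7',
                                                  '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};

// Hypothetical helper: maps the low 4 bits of a value to its hex digit.
constexpr char hex_digit(std::uint8_t nibble) {
    return HEX_ITOC[nibble & 0x0F];
}
```

   With this naming, a call such as `hex_digit(10)` yields `'A'`.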






Re: [PR] [Enhancement] Support some compress functions [doris]

2025-02-02 Thread via GitHub


zclllyybb commented on code in PR #47307:
URL: https://github.com/apache/doris/pull/47307#discussion_r1938448935


##
be/src/vec/functions/function_compress.cpp:
##
@@ -0,0 +1,247 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "common/status.h"
+#include "util/block_compression.h"
+#include "util/faststring.h"
+#include "vec/aggregate_functions/aggregate_function.h"
+#include "vec/columns/column.h"
+#include "vec/columns/column_nullable.h"
+#include "vec/columns/column_string.h"
+#include "vec/columns/column_vector.h"
+#include "vec/columns/columns_number.h"
+#include "vec/common/assert_cast.h"
+#include "vec/core/block.h"
+#include "vec/core/column_numbers.h"
+#include "vec/core/column_with_type_and_name.h"
+#include "vec/core/types.h"
+#include "vec/data_types/data_type.h"
+#include "vec/data_types/data_type_nullable.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/data_types/data_type_string.h"
+#include "vec/functions/function.h"
+#include "vec/functions/simple_function_factory.h"
+
+namespace doris {
+class FunctionContext;
+} // namespace doris
+
+namespace doris::vectorized {
+
+class FunctionCompress : public IFunction {
+    string hex_itoc = "0123456789ABCDEF";
+
+public:
+    static constexpr auto name = "compress";
+    static FunctionPtr create() { return std::make_shared<FunctionCompress>(); }
+
+    String get_name() const override { return name; }
+
+    size_t get_number_of_arguments() const override { return 1; }
+
+    DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+        return std::make_shared<DataTypeString>();
+    }
+
+    Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+                        uint32_t result, size_t input_rows_count) const override {
+        // Get the compression algorithm object
+        BlockCompressionCodec* compression_codec;
+        RETURN_IF_ERROR(get_block_compression_codec(segment_v2::CompressionTypePB::ZLIB,
+                                                    &compression_codec));
+
+        const auto& arg_column =
+                assert_cast<const ColumnString&>(*block.get_by_position(arguments[0]).column);
+        auto result_column = ColumnString::create();
+
+        auto& arg_data = arg_column.get_chars();
+        auto& arg_offset = arg_column.get_offsets();
+        const char* arg_begin = reinterpret_cast<const char*>(arg_data.data());
+
+        auto& col_data = result_column->get_chars();
+        auto& col_offset = result_column->get_offsets();
+        col_offset.resize(input_rows_count);
+
+        faststring compressed_str;
+        Slice data;
+
+        // When the original string is large, the result is roughly this value
+        size_t total = arg_offset[input_rows_count - 1];
+        col_data.reserve(total / 1000);
+
+        for (size_t row = 0; row < input_rows_count; row++) {
+            size_t length = arg_offset[row] - arg_offset[row - 1];
+            data = Slice(arg_begin + arg_offset[row - 1], length);
+
+            // Z_MEM_ERROR and Z_BUF_ERROR are already handled in compress, making sure st is always Z_OK
+            auto st = compression_codec->compress(data, &compressed_str);
+
+            size_t idx = col_data.size();
+            if (!length) { // data is ''
+                col_data.resize(col_data.size() + 2);
+                col_data[idx] = '0', col_data[idx + 1] = 'x';
+                col_offset[row] = col_offset[row - 1] + 2;
+                continue;
+            }
+
+            // first ten digits represent the length of the uncompressed string
+            col_data.resize(col_data.size() + 10 + 2 * compressed_str.size());

Review Comment:
   It may not be very reasonable. There is a reason MySQL behaves like this.
   1. After compression, the corresponding memory holds just a stream of bytes, 
so any value is possible. Simply interpreting it as chars doesn’t keep 
consistency. Consider a memory region of “a\b”: after printing, it’s “” because 
‘\b’ deletes ‘a’.
   2. As for the compression ratio, it’s guaranteed by the compression algorithm, 
which has a very large ratio, so even we print it as chars w
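
   The first point can be illustrated with a small sketch. Hex-encoding every byte produces output made only of printable characters, so a control byte such as '\b' can no longer change what is displayed. The helper name `to_hex`, the `HEX_DIGITS` table, and the plain `0x` prefix are assumptions made for illustration; this is not the PR's actual code and it omits the length header discussed in the diff.

```cpp
#include <string>

// Hypothetical helper: renders arbitrary bytes as "0x" followed by two
// upper-case hex digits per input byte.
std::string to_hex(const std::string& bytes) {
    static constexpr char HEX_DIGITS[] = "0123456789ABCDEF";
    std::string out = "0x";
    out.reserve(2 + 2 * bytes.size());
    for (unsigned char c : bytes) {
        out.push_back(HEX_DIGITS[c >> 4]);   // high nibble
        out.push_back(HEX_DIGITS[c & 0x0F]); // low nibble
    }
    return out;
}
```

   For the two-byte input "a\b" from the comment, `to_hex` returns `0x6108`, and an empty input gives just `0x`.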

Re: [PR] [Enhancement] Support some compress functions [doris]

2025-02-01 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2629278644

   TeamCity be ut coverage result:
Function Coverage: 42.08% (10998/26138) 
Line Coverage: 32.37% (92919/287059)
Region Coverage: 31.52% (47635/151146)
Branch Coverage: 27.55% (24114/87544)
Coverage Report: 
http://coverage.selectdb-in.cc/coverage/b702390db852ea9772db6d961cd374efc0e1148d_b702390db852ea9772db6d961cd374efc0e1148d/report/index.html





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-02-01 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2629277567

   
   
   ClickBench: Total hot run time: 30.65 s
   
   ```
   machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
   scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
   ClickBench test result on commit b702390db852ea9772db6d961cd374efc0e1148d, data reload: false
   
   query1   0.03   0.03   0.05
   query2   0.07   0.04   0.03
   query3   0.24   0.07   0.06
   query4   1.61   0.10   0.10
   query5   0.42   0.42   0.41
   query6   1.15   0.65   0.66
   query7   0.02   0.02   0.01
   query8   0.04   0.03   0.03
   query9   0.59   0.51   0.51
   query10  0.56   0.57   0.55
   query11  0.14   0.10   0.10
   query12  0.14   0.10   0.12
   query13  0.60   0.59   0.59
   query14  2.86   2.88   2.76
   query15  0.90   0.84   0.85
   query16  0.40   0.38   0.38
   query17  1.01   1.00   1.06
   query18  0.24   0.20   0.21
   query19  1.88   1.78   1.98
   query20  0.01   0.01   0.01
   query21  15.36  0.95   0.57
   query22  0.74   1.07   0.78
   query23  14.93  1.37   0.52
   query24  2.60   1.40   0.77
   query25  0.28   0.10   0.14
   query26  0.21   0.14   0.14
   query27  0.08   0.06   0.04
   query28  14.02  1.06   0.42
   query29  12.64  3.99   3.26
   query30  0.25   0.10   0.07
   query31  2.84   0.59   0.38
   query32  3.23   0.55   0.46
   query33  3.08   3.08   3.06
   query34  16.61  5.16   4.57
   query35  4.58   4.58   4.58
   query36  0.67   0.48   0.50
   query37  0.10   0.06   0.06
   query38  0.05   0.04   0.03
   query39  0.03   0.02   0.02
   query40  0.17   0.13   0.12
   query41  0.08   0.03   0.03
   query42  0.03   0.02   0.03
   query43  0.03   0.04   0.03
   Total cold run time: 105.52 s
   Total hot run time: 30.65 s
   ```
   
   





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-02-01 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2629276181

   
   
   TPC-DS: Total hot run time: 190795 ms
   
   ```
   machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
   scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
   TPC-DS sf100 test result on commit b702390db852ea9772db6d961cd374efc0e1148d, data reload: false
   
   query1   1297   975    940    940
   query2   6141   2024   2029   2024
   query3   11107  4731   4655   4655
   query4   32350  23169  22881  22881
   query5   3580   611    437    437
   query6   285    198    183    183
   query7   3980   486    306    306
   query8   299    244    246    244
   query9   9497   2619   2603   2603
   query10  459    305    255    255
   query11  17585  15238  14891  14891
   query12  155    109    101    101
   query13  1575   523    420    420
   query14  8863   6441   7457   6441
   query15  242    189    189    189
   query16  8175   672    516    516
   query17  1670   775    595    595
   query18  2125   406    313    313
   query19  211    191    166    166
   query20  121    122    114    114
   query21  206    124    109    109
   query22  4590   4538   4527   4527
   query23  34342  33421  33389  33389
   query24  6708   2245   2343   2245
   query25  510    458    401    401
   query26  982    279    150    150
   query27  2362   475    320    320
   query28  5385   2511   2416   2416
   query29  730    566    442    442
   query30  213    188    159    159
   query31  934    867    837    837
   query32  92     61     58     58
   query33  486    357    326    326
   query34  755    883    518    518
   query35  811    817    756    756
   query36  990    1068   955    955
   query37  124    106    80     80
   query38  4280   4331   4225   4225
   query39  1490   1437   1431   1431
   query40  203    110    102    102
   query41  49     54     50     50
   query42  124    103    102    102
   query43  519    538    507    507
   query44  1353   812    805    805
   query45  198    175    170    170
   query46  854    1030   639    639
   query47  1902   1944   1845   1845
   query48  386    411    336    336
   query49  743    486    396    396
   query50  634    659    393    393
   query51  4299   4261   4267   4261
   query52  101    103    95     95
   query53  223    257    183    183
   query54  505    486    404    404
   query55  82     79     81     79
   query56  263    279    253    253
   query57  1243   1204   1157   1157
   query58  250    234    238    234
   query59  3105   3196   3059   3059
   query60  285    266    250    250
   query61  114    119    115    115
   query62  789    742    705    705
   query63  233    196    199    196
   query64  4253   1028   642    642
   query65  3299   3277   3257   3257
   query66  973    395    302    302
   query67  16048  15696  15286  15286
   query68  4949   823    519    519
   query69  470    294    267    267
   query70  1188   1154   1128   1128
   query71  388    282    278    278
   query72  5837   3964   3812   3812
   query73  651    753    356    356
   query74  10122  8763   9005   8763
   query75  3158   3133   2658   2658
   query76  3106   1179   760    760
   query77  464    370    281    281
   query78  9951   9994   9371   9371
   query79  3119   797    600    600
   query80  689    526    445    445
   query81  501    277    245    245
   query82  442    157    119    119
   query83  167    174    147    147
   query84  236    94     79     79
   query85  784    348    307    307
   query86  390    336    301    301
   query87  4422   4615   4491   4491
   query88  4783   2172   2157   2157
   query89  400    326    293    293
   query90  1843   247    189    189
   query91  132    138    107    107
   query92  73     57     51     51
   query93  2388   869    540    540
   query94  656    379    286    286
   query95  343    264    259    259
   query96  492    604    297    297
   query97  2870   2870   2752   2752
   query98  233    202    204    202
   query99  1280   1377   1294   1294
   Total cold run time: 285264 ms
   Total hot run time: 190795 ms
   ```
   
   



Re: [PR] [Enhancement] Support some compress functions [doris]

2025-02-01 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2629272993

   
   
   TPC-H: Total hot run time: 32141 ms
   
   ```
   machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
   scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
   Tpch sf100 test result on commit b702390db852ea9772db6d961cd374efc0e1148d, data reload: false
   
   -- Round 1 --
   q1   17579  5518   5389   5389
   q2   2047   322    182    182
   q3   10386  1225   746    746
   q4   10204  975    542    542
   q5   7539   2347   2134   2134
   q6   190    168    137    137
   q7   891    752    598    598
   q8   9236   1355   1165   1165
   q9   5147   4841   4905   4841
   q10  6875   2388   1923   1923
   q11  481    282    254    254
   q12  350    368    229    229
   q13  17840  3739   3095   3095
   q14  224    220    214    214
   q15  521    474    478    474
   q16  638    616    598    598
   q17  555    854    314    314
   q18  6846   6285   6405   6285
   q19  1729   939    531    531
   q20  317    312    190    190
   q21  2781   2226   1992   1992
   q22  366    332    308    308
   Total cold run time: 102742 ms
   Total hot run time: 32141 ms
   
   - Round 2, with runtime_filter_mode=off -
   q1   5643   5487   5516   5487
   q2   233    327    227    227
   q3   2283   2639   2308   2308
   q4   1466   1833   1381   1381
   q5   4278   4727   4627   4627
   q6   166    156    126    126
   q7   2025   2023   1823   1823
   q8   2608   2803   2687   2687
   q9   7280   7115   7205   7115
   q10  3053   3278   2803   2803
   q11  581    532    508    508
   q12  654    715    566    566
   q13  3458   4004   3326   3326
   q14  277    297    270    270
   q15  520    478    473    473
   q16  681    673    630    630
   q17  1216   1757   1258   1258
   q18  7663   7491   7348   7348
   q19  804    1175   1043   1043
   q20  2035   2046   1952   1952
   q21  5888   5287   4919   4919
   q22  650    664    573    573
   Total cold run time: 53462 ms
   Total hot run time: 51450 ms
   ```
   
   





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-02-01 Thread via GitHub


lzyy2024 commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2629263274

   run buildall





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-02-01 Thread via GitHub


HappenLee commented on code in PR #47307:
URL: https://github.com/apache/doris/pull/47307#discussion_r1938278948


##
be/src/vec/functions/function_compress.cpp:
##
@@ -0,0 +1,249 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "common/status.h"
+#include "util/block_compression.h"
+#include "util/faststring.h"
+#include "vec/aggregate_functions/aggregate_function.h"
+#include "vec/columns/column.h"
+#include "vec/columns/column_nullable.h"
+#include "vec/columns/column_string.h"
+#include "vec/columns/column_vector.h"
+#include "vec/columns/columns_number.h"
+#include "vec/common/assert_cast.h"
+#include "vec/core/block.h"
+#include "vec/core/column_numbers.h"
+#include "vec/core/column_with_type_and_name.h"
+#include "vec/core/types.h"
+#include "vec/data_types/data_type.h"
+#include "vec/data_types/data_type_nullable.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/data_types/data_type_string.h"
+#include "vec/functions/function.h"
+#include "vec/functions/simple_function_factory.h"
+
+namespace doris {
+class FunctionContext;
+} // namespace doris
+
+namespace doris::vectorized {
+
+class FunctionCompress : public IFunction {
+std::array hex_itoc = {'0', '1', '2', '3', '4', '5', '6', '7',

Review Comment:
   Rename this to `HEX_ITOC` and make it `static constexpr`.
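
A minimal sketch of what this suggestion could look like, assuming the table only needs to map a nibble to its uppercase hex digit. The namespace-scope placement and the `hex_encode` helper are illustrative assumptions, not code from the PR; inside the class the table would be a `static constexpr` member instead.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Compile-time lookup table: index is a nibble (0..15), value is its hex digit.
// Being static and constexpr, it is built once and never copied per object.
static constexpr std::array<char, 16> HEX_ITOC = {'0', '1', '2', '3', '4', '5', '6', '7',
                                                  '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};

// Hypothetical helper (not from the PR): hex-encode `len` raw bytes.
inline std::string hex_encode(const uint8_t* bytes, size_t len) {
    std::string out;
    out.reserve(2 * len);
    for (size_t i = 0; i < len; ++i) {
        out.push_back(HEX_ITOC[bytes[i] >> 4]);   // high nibble
        out.push_back(HEX_ITOC[bytes[i] & 0x0F]); // low nibble
    }
    assert(out.size() == 2 * len); // every input byte yields exactly two characters
    return out;
}
```

Compared with a per-object `std::string` member, a `static constexpr` table avoids allocating and copying the same 16 characters for every function instance.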






Re: [PR] [Enhancement] Support some compress functions [doris]

2025-02-01 Thread via GitHub


HappenLee commented on code in PR #47307:
URL: https://github.com/apache/doris/pull/47307#discussion_r1938278732


##
be/src/vec/functions/function_compress.cpp:
##
@@ -0,0 +1,247 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "common/status.h"
+#include "util/block_compression.h"
+#include "util/faststring.h"
+#include "vec/aggregate_functions/aggregate_function.h"
+#include "vec/columns/column.h"
+#include "vec/columns/column_nullable.h"
+#include "vec/columns/column_string.h"
+#include "vec/columns/column_vector.h"
+#include "vec/columns/columns_number.h"
+#include "vec/common/assert_cast.h"
+#include "vec/core/block.h"
+#include "vec/core/column_numbers.h"
+#include "vec/core/column_with_type_and_name.h"
+#include "vec/core/types.h"
+#include "vec/data_types/data_type.h"
+#include "vec/data_types/data_type_nullable.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/data_types/data_type_string.h"
+#include "vec/functions/function.h"
+#include "vec/functions/simple_function_factory.h"
+
+namespace doris {
+class FunctionContext;
+} // namespace doris
+
+namespace doris::vectorized {
+
+class FunctionCompress : public IFunction {
+string hex_itoc = "0123456789ABCDEF";
+
+public:
+static constexpr auto name = "compress";
+static FunctionPtr create() { return std::make_shared<FunctionCompress>(); }
+
+String get_name() const override { return name; }
+
+size_t get_number_of_arguments() const override { return 1; }
+
+DataTypePtr get_return_type_impl(const DataTypes& arguments) const 
override {
+return std::make_shared<DataTypeString>();
+}
+
+Status execute_impl(FunctionContext* context, Block& block, const 
ColumnNumbers& arguments,
+uint32_t result, size_t input_rows_count) const 
override {
+// Get the compression algorithm object
+BlockCompressionCodec* compression_codec;
+
RETURN_IF_ERROR(get_block_compression_codec(segment_v2::CompressionTypePB::ZLIB,
+&compression_codec));
+
+const auto& arg_column =
+assert_cast<const ColumnString&>(*block.get_by_position(arguments[0]).column);
+auto result_column = ColumnString::create();
+
+auto& arg_data = arg_column.get_chars();
+auto& arg_offset = arg_column.get_offsets();
+const char* arg_begin = reinterpret_cast<const char*>(arg_data.data());
+
+auto& col_data = result_column->get_chars();
+auto& col_offset = result_column->get_offsets();
+col_offset.resize(input_rows_count);
+
+faststring compressed_str;
+Slice data;
+
+// When the original string is large, the result is roughly this value
+size_t total = arg_offset[input_rows_count - 1];
+col_data.reserve(total / 1000);
+
+for (size_t row = 0; row < input_rows_count; row++) {
+size_t length = arg_offset[row] - arg_offset[row - 1];
+data = Slice(arg_begin + arg_offset[row - 1], length);
+
+// Z_MEM_ERROR and Z_BUF_ERROR are already handled in compress, 
making sure st is always Z_OK
+auto st = compression_codec->compress(data, &compressed_str);
+
+size_t idx = col_data.size();
+if (!length) { // data is ''
+col_data.resize(col_data.size() + 2);
+col_data[idx] = '0', col_data[idx + 1] = 'x';
+col_offset[row] = col_offset[row - 1] + 2;
+continue;
+}
+
+// first ten digits represent the length of the uncompressed string
+col_data.resize(col_data.size() + 10 + 2 * compressed_str.size());

Review Comment:
   I don't think you need to convert the compressed bytes into a visible
hexadecimal string.
   1. The hex encoding may make the result bigger than the data was before
compression.
   2. Nobody cares about the content of the compressed bytes; people only care
that compress really compresses the data and that decompress returns the same
data as before compression.
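
To make the size argument concrete, here is a hedged sketch comparing the hex form in the diff (10 header characters plus two characters per compressed byte) with a raw binary layout. The binary layout shown, a 4-byte little-endian uncompressed length followed by the compressed bytes, mirrors what MySQL documents for `COMPRESS()`; the function names `hex_encoded_size`, `pack_binary`, and `read_uncompressed_len` are hypothetical and not taken from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Output size of the hex form in the diff: 10 header characters
// plus two hex characters for each compressed byte.
inline size_t hex_encoded_size(size_t compressed_size) {
    return 10 + 2 * compressed_size;
}

// Assumed binary layout: 4-byte little-endian uncompressed length,
// then the compressed bytes unchanged.
inline std::string pack_binary(uint32_t uncompressed_len, const std::string& compressed) {
    std::string out;
    out.reserve(4 + compressed.size());
    for (int shift = 0; shift < 32; shift += 8) {
        out.push_back(static_cast<char>((uncompressed_len >> shift) & 0xFF));
    }
    out.append(compressed);
    return out;
}

// Read the length prefix back; the caller must pass at least 4 bytes.
inline uint32_t read_uncompressed_len(const std::string& packed) {
    assert(packed.size() >= 4);
    uint32_t len = 0;
    for (int i = 0; i < 4; ++i) {
        len |= static_cast<uint32_t>(static_cast<uint8_t>(packed[i])) << (8 * i);
    }
    return len;
}
```

For a 100-byte compressed payload the hex form takes 210 characters while the binary form takes 104 bytes, which is the overhead the comment is pointing at.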




Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-29 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2623424626

   TeamCity be ut coverage result:
Function Coverage: 42.07% (10997/26138) 
Line Coverage: 32.36% (92890/287059)
Region Coverage: 31.51% (47633/151146)
Branch Coverage: 27.54% (24107/87544)
Coverage Report: 
http://coverage.selectdb-in.cc/coverage/23e089b95f2a690fe4e2f913b1ec7550fceabdd3_23e089b95f2a690fe4e2f913b1ec7550fceabdd3/report/index.html





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-29 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2623419535

   
   
   ClickBench: Total hot run time: 30.96 s
   
   ```
   machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
   scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
   ClickBench test result on commit 23e089b95f2a690fe4e2f913b1ec7550fceabdd3, 
data reload: false
   
   query1   0.04   0.04   0.03
   query2   0.07   0.03   0.04
   query3   0.24   0.06   0.07
   query4   1.61   0.11   0.10
   query5   0.43   0.41   0.40
   query6   1.14   0.65   0.65
   query7   0.02   0.02   0.02
   query8   0.04   0.03   0.03
   query9   0.58   0.50   0.51
   query10  0.57   0.55   0.56
   query11  0.14   0.10   0.12
   query12  0.14   0.11   0.11
   query13  0.62   0.61   0.60
   query14  2.84   2.86   2.90
   query15  0.90   0.82   0.83
   query16  0.37   0.39   0.38
   query17  1.04   1.06   1.09
   query18  0.22   0.22   0.22
   query19  1.86   1.89   1.99
   query20  0.02   0.01   0.01
   query21  15.35  0.88   0.59
   query22  0.76   0.86   0.65
   query23  15.23  1.47   0.62
   query24  3.01   0.99   1.16
   query25  0.14   0.14   0.10
   query26  0.37   0.17   0.14
   query27  0.05   0.06   0.05
   query28  13.36  1.02   0.42
   query29  12.65  3.97   3.32
   query30  0.25   0.10   0.06
   query31  2.82   0.60   0.38
   query32  3.22   0.56   0.46
   query33  3.00   3.02   3.04
   query34  16.52  5.12   4.50
   query35  4.51   4.45   4.50
   query36  0.64   0.52   0.49
   query37  0.10   0.06   0.06
   query38  0.05   0.04   0.04
   query39  0.03   0.02   0.03
   query40  0.17   0.12   0.12
   query41  0.07   0.03   0.02
   query42  0.03   0.02   0.02
   query43  0.04   0.02   0.02
   Total cold run time: 105.26 s
   Total hot run time: 30.96 s
   ```
   
   





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-29 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2623414208

   
   
   TPC-DS: Total hot run time: 190787 ms
   
   ```
   machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
   scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
   TPC-DS sf100 test result on commit 23e089b95f2a690fe4e2f913b1ec7550fceabdd3, 
data reload: false
   
   query1   1313   946    942    942
   query2   6103   2107   2037   2037
   query3   10970  4393   4500   4393
   query4   60657  29230  23228  23228
   query5   5571   599    463    463
   query6   440    203    175    175
   query7   5543   507    293    293
   query8   330    240    229    229
   query9   8401   2659   2634   2634
   query10  462    303    252    252
   query11  17194  14930  15369  14930
   query12  157    112    109    109
   query13  1413   560    440    440
   query14  10384  7112   6468   6468
   query15  216    214    199    199
   query16  7329   650    481    481
   query17  1131   745    618    618
   query18  1899   419    322    322
   query19  228    196    164    164
   query20  121    118    117    117
   query21  215    125    106    106
   query22  4377   4666   4599   4599
   query23  34466  33134  33559  33134
   query24  5719   2304   2365   2304
   query25  465    462    389    389
   query26  644    278    154    154
   query27  1661   457    334    334
   query28  4026   2490   2433   2433
   query29  527    574    429    429
   query30  210    194    154    154
   query31  942    909    810    810
   query32  75     54     57     54
   query33  457    364    307    307
   query34  747    873    511    511
   query35  806    831    736    736
   query36  1021   1010   948    948
   query37  120    109    73     73
   query38  4384   4349   4310   4310
   query39  1487   1443   1447   1443
   query40  205    114    102    102
   query41  52     50     56     50
   query42  125    109    102    102
   query43  526    543    507    507
   query44  1374   850    818    818
   query45  185    179    166    166
   query46  870    1071   658    658
   query47  1908   1875   1846   1846
   query48  389    425    326    326
   query49  743    483    381    381
   query50  649    654    392    392
   query51  4334   4259   4251   4251
   query52  114    99     94     94
   query53  225    252    187    187
   query54  502    506    432    432
   query55  87     83     79     79
   query56  246    271    245    245
   query57  1200   1201   1141   1141
   query58  240    227    237    227
   query59  3147   3203   3131   3131
   query60  302    268    259    259
   query61  113    125    115    115
   query62  748    741    661    661
   query63  221    183    184    183
   query64  1280   1000   645    645
   query65  3228   3163   3188   3163
   query66  722    395    291    291
   query67  15918  15522  15416  15416
   query68  3911   813    572    572
   query69  480    309    258    258
   query70  1162   1154   1149   1149
   query71  411    288    255    255
   query72  5928   4035   3821   3821
   query73  658    767    370    370
   query74  9969   8949   8745   8745
   query75  3243   3133   2642   2642
   query76  3065   1178   767    767
   query77  485    362    275    275
   query78  9974   10069  9362   9362
   query79  2676   797    592    592
   query80  1623   533    441    441
   query81  555    275    237    237
   query82  352    147    116    116
   query83  267    166    147    147
   query84  291    94     79     79
   query85  768    341    303    303
   query86  414    312    307    307
   query87  4506   4496   4456   4456
   query88  3689   2170   2209   2170
   query89  393    324    284    284
   query90  1583   185    189    185
   query91  131    141    104    104
   query92  59     58     55     55
   query93  2363   897    543    543
   query94  695    398    307    307
   query95  328    264    310    264
   query96  488    611    280    280
   query97  2864   2881   2700   2700
   query98  224    212    198    198
   query99  1276   1401   1213   1213
   Total cold run time: 306695 ms
   Total hot run time: 190787 ms
   ```
   
   



Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-29 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2623402130

   
   
   TPC-H: Total hot run time: 32303 ms
   
   ```
   machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
   scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
   Tpch sf100 test result on commit 23e089b95f2a690fe4e2f913b1ec7550fceabdd3, 
data reload: false
   
   -- Round 1 --
   q1   17578  5496   5373   5373
   q2   2047   308    164    164
   q3   10425  1238   769    769
   q4   10214  991    547    547
   q5   7944   2416   2160   2160
   q6   201    171    132    132
   q7   904    762    596    596
   q8   9228   1371   1173   1173
   q9   5320   5065   4890   4890
   q10  6842   2342   1902   1902
   q11  458    277    250    250
   q12  341    359    217    217
   q13  17750  3697   3118   3118
   q14  240    239    219    219
   q15  519    479    474    474
   q16  641    621    584    584
   q17  584    869    333    333
   q18  6978   6336   6500   6336
   q19  1875   974    556    556
   q20  317    322    195    195
   q21  2850   2295   2005   2005
   q22  366    340    310    310
   Total cold run time: 103622 ms
   Total hot run time: 32303 ms
   
   - Round 2, with runtime_filter_mode=off -
   q1   5621   5527   5489   5489
   q2   251    326    238    238
   q3   2247   2709   2351   2351
   q4   1376   1855   1373   1373
   q5   4350   4792   4871   4792
   q6   182    164    128    128
   q7   2078   1972   1847   1847
   q8   2620   2874   2757   2757
   q9   7344   7220   7301   7220
   q10  3036   3279   2807   2807
   q11  574    505    488    488
   q12  632    723    607    607
   q13  3795   3914   3382   3382
   q14  288    294    274    274
   q15  519    491    454    454
   q16  657    704    648    648
   q17  1257   1750   1255   1255
   q18  7770   7681   7387   7387
   q19  831    1214   1076   1076
   q20  2000   2041   1893   1893
   q21  5802   5259   5102   5102
   q22  633    597    568    568
   Total cold run time: 53863 ms
   Total hot run time: 52136 ms
   ```
   
   





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-29 Thread via GitHub


lzyy2024 commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2623345894

   run buildall





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-29 Thread via GitHub


lzyy2024 commented on code in PR #47307:
URL: https://github.com/apache/doris/pull/47307#discussion_r1934155126


##
be/src/vec/functions/function_compress.cpp:
##
@@ -0,0 +1,247 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "common/status.h"
+#include "util/block_compression.h"
+#include "util/faststring.h"
+#include "vec/aggregate_functions/aggregate_function.h"
+#include "vec/columns/column.h"
+#include "vec/columns/column_nullable.h"
+#include "vec/columns/column_string.h"
+#include "vec/columns/column_vector.h"
+#include "vec/columns/columns_number.h"
+#include "vec/common/assert_cast.h"
+#include "vec/core/block.h"
+#include "vec/core/column_numbers.h"
+#include "vec/core/column_with_type_and_name.h"
+#include "vec/core/types.h"
+#include "vec/data_types/data_type.h"
+#include "vec/data_types/data_type_nullable.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/data_types/data_type_string.h"
+#include "vec/functions/function.h"
+#include "vec/functions/simple_function_factory.h"
+
+namespace doris {
+class FunctionContext;
+} // namespace doris
+
+namespace doris::vectorized {
+
+class FunctionCompress : public IFunction {
+string hex_itoc = "0123456789ABCDEF";
+
+public:
+static constexpr auto name = "compress";
+static FunctionPtr create() { return std::make_shared<FunctionCompress>(); }
+
+String get_name() const override { return name; }
+
+size_t get_number_of_arguments() const override { return 1; }
+
+DataTypePtr get_return_type_impl(const DataTypes& arguments) const 
override {
+return std::make_shared<DataTypeString>();
+}
+
+Status execute_impl(FunctionContext* context, Block& block, const 
ColumnNumbers& arguments,
+uint32_t result, size_t input_rows_count) const 
override {
+// Get the compression algorithm object
+BlockCompressionCodec* compression_codec;
+
RETURN_IF_ERROR(get_block_compression_codec(segment_v2::CompressionTypePB::ZLIB,
+&compression_codec));
+
+const auto& arg_column =
+assert_cast<const ColumnString&>(*block.get_by_position(arguments[0]).column);
+auto result_column = ColumnString::create();
+
+auto& arg_data = arg_column.get_chars();
+auto& arg_offset = arg_column.get_offsets();
+const char* arg_begin = reinterpret_cast<const char*>(arg_data.data());
+
+auto& col_data = result_column->get_chars();
+auto& col_offset = result_column->get_offsets();
+col_offset.resize(input_rows_count);
+
+faststring compressed_str;
+Slice data;
+
+// When the original string is large, the result is roughly this value
+size_t total = arg_offset[input_rows_count - 1];
+col_data.reserve(total / 1000);
+
+for (size_t row = 0; row < input_rows_count; row++) {
+size_t length = arg_offset[row] - arg_offset[row - 1];
+data = Slice(arg_begin + arg_offset[row - 1], length);
+
+// Z_MEM_ERROR and Z_BUF_ERROR are already handled in compress, 
making sure st is always Z_OK
+auto st = compression_codec->compress(data, &compressed_str);
+
+size_t idx = col_data.size();
+if (!length) { // data is ''
+col_data.resize(col_data.size() + 2);
+col_data[idx] = '0', col_data[idx + 1] = 'x';
+col_offset[row] = col_offset[row - 1] + 2;
+continue;
+}
+
+// first ten digits represent the length of the uncompressed string
+col_data.resize(col_data.size() + 10 + 2 * compressed_str.size());

Review Comment:
   Yes, MySQL does the same thing. What I do is write the compressed bytes
out as a visible hexadecimal string.
   
   






Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-29 Thread via GitHub


HappenLee commented on code in PR #47307:
URL: https://github.com/apache/doris/pull/47307#discussion_r1934135609


##
be/src/vec/functions/function_compress.cpp:
##
@@ -0,0 +1,247 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "common/status.h"
+#include "util/block_compression.h"
+#include "util/faststring.h"
+#include "vec/aggregate_functions/aggregate_function.h"
+#include "vec/columns/column.h"
+#include "vec/columns/column_nullable.h"
+#include "vec/columns/column_string.h"
+#include "vec/columns/column_vector.h"
+#include "vec/columns/columns_number.h"
+#include "vec/common/assert_cast.h"
+#include "vec/core/block.h"
+#include "vec/core/column_numbers.h"
+#include "vec/core/column_with_type_and_name.h"
+#include "vec/core/types.h"
+#include "vec/data_types/data_type.h"
+#include "vec/data_types/data_type_nullable.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/data_types/data_type_string.h"
+#include "vec/functions/function.h"
+#include "vec/functions/simple_function_factory.h"
+
+namespace doris {
+class FunctionContext;
+} // namespace doris
+
+namespace doris::vectorized {
+
+class FunctionCompress : public IFunction {
+string hex_itoc = "0123456789ABCDEF";
+
+public:
+static constexpr auto name = "compress";
+static FunctionPtr create() { return std::make_shared<FunctionCompress>(); }
+
+String get_name() const override { return name; }
+
+size_t get_number_of_arguments() const override { return 1; }
+
+DataTypePtr get_return_type_impl(const DataTypes& arguments) const 
override {
+return std::make_shared<DataTypeString>();
+}
+
+Status execute_impl(FunctionContext* context, Block& block, const 
ColumnNumbers& arguments,
+uint32_t result, size_t input_rows_count) const 
override {
+// Get the compression algorithm object
+BlockCompressionCodec* compression_codec;
+
RETURN_IF_ERROR(get_block_compression_codec(segment_v2::CompressionTypePB::ZLIB,
+&compression_codec));
+
+const auto& arg_column =
+assert_cast<const ColumnString&>(*block.get_by_position(arguments[0]).column);
+auto result_column = ColumnString::create();
+
+auto& arg_data = arg_column.get_chars();
+auto& arg_offset = arg_column.get_offsets();
+const char* arg_begin = reinterpret_cast<const char*>(arg_data.data());
+
+auto& col_data = result_column->get_chars();
+auto& col_offset = result_column->get_offsets();
+col_offset.resize(input_rows_count);
+
+faststring compressed_str;
+Slice data;
+
+// When the original string is large, the result is roughly this value
+size_t total = arg_offset[input_rows_count - 1];
+col_data.reserve(total / 1000);
+
+for (size_t row = 0; row < input_rows_count; row++) {
+size_t length = arg_offset[row] - arg_offset[row - 1];
+data = Slice(arg_begin + arg_offset[row - 1], length);
+
+// Z_MEM_ERROR and Z_BUF_ERROR are already handled in compress, 
making sure st is always Z_OK
+auto st = compression_codec->compress(data, &compressed_str);
+
+size_t idx = col_data.size();
+if (!length) { // data is ''
+col_data.resize(col_data.size() + 2);
+col_data[idx] = '0', col_data[idx + 1] = 'x';
+col_offset[row] = col_offset[row - 1] + 2;
+continue;
+}
+
+// first ten digits represent the length of the uncompressed string
+col_data.resize(col_data.size() + 10 + 2 * compressed_str.size());

Review Comment:
   So what does the function do? It seems the result may be bigger than the
data was before compression. Does MySQL do the same thing?




Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-29 Thread via GitHub


HappenLee commented on code in PR #47307:
URL: https://github.com/apache/doris/pull/47307#discussion_r1934135609


##
be/src/vec/functions/function_compress.cpp:
##
@@ -0,0 +1,247 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "common/status.h"
+#include "util/block_compression.h"
+#include "util/faststring.h"
+#include "vec/aggregate_functions/aggregate_function.h"
+#include "vec/columns/column.h"
+#include "vec/columns/column_nullable.h"
+#include "vec/columns/column_string.h"
+#include "vec/columns/column_vector.h"
+#include "vec/columns/columns_number.h"
+#include "vec/common/assert_cast.h"
+#include "vec/core/block.h"
+#include "vec/core/column_numbers.h"
+#include "vec/core/column_with_type_and_name.h"
+#include "vec/core/types.h"
+#include "vec/data_types/data_type.h"
+#include "vec/data_types/data_type_nullable.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/data_types/data_type_string.h"
+#include "vec/functions/function.h"
+#include "vec/functions/simple_function_factory.h"
+
+namespace doris {
+class FunctionContext;
+} // namespace doris
+
+namespace doris::vectorized {
+
+class FunctionCompress : public IFunction {
+string hex_itoc = "0123456789ABCDEF";
+
+public:
+static constexpr auto name = "compress";
+static FunctionPtr create() { return std::make_shared(); 
}
+
+String get_name() const override { return name; }
+
+size_t get_number_of_arguments() const override { return 1; }
+
+DataTypePtr get_return_type_impl(const DataTypes& arguments) const 
override {
+return std::make_shared();
+}
+
+Status execute_impl(FunctionContext* context, Block& block, const 
ColumnNumbers& arguments,
+uint32_t result, size_t input_rows_count) const 
override {
+// Get the compression algorithm object
+BlockCompressionCodec* compression_codec;
+
RETURN_IF_ERROR(get_block_compression_codec(segment_v2::CompressionTypePB::ZLIB,
+&compression_codec));
+
+const auto& arg_column =
+assert_cast(*block.get_by_position(arguments[0]).column);
+auto result_column = ColumnString::create();
+
+auto& arg_data = arg_column.get_chars();
+auto& arg_offset = arg_column.get_offsets();
+const char* arg_begin = reinterpret_cast(arg_data.data());
+
+auto& col_data = result_column->get_chars();
+auto& col_offset = result_column->get_offsets();
+col_offset.resize(input_rows_count);
+
+faststring compressed_str;
+Slice data;
+
+// When the original string is large, the result is roughly this value
+size_t total = arg_offset[input_rows_count - 1];
+col_data.reserve(total / 1000);
+
+for (size_t row = 0; row < input_rows_count; row++) {
+size_t length = arg_offset[row] - arg_offset[row - 1];
+data = Slice(arg_begin + arg_offset[row - 1], length);
+
+// Z_MEM_ERROR and Z_BUF_ERROR are already handled in compress, 
making sure st is always Z_OK
+auto st = compression_codec->compress(data, &compressed_str);
+
+size_t idx = col_data.size();
+if (!length) { // data is ''
+col_data.resize(col_data.size() + 2);
+col_data[idx] = '0', col_data[idx + 1] = 'x';
+col_offset[row] = col_offset[row - 1] + 2;
+continue;
+}
+
+// first ten digits represent the length of the uncompressed string
+col_data.resize(col_data.size() + 10 + 2 * compressed_str.size());

Review Comment:
   So what does this function do? It seems the result may be larger than the input before compression.
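
On the size question: a minimal standalone sketch, not taken from the PR, of the text layout the quoted hunks appear to produce. It assumes the layout visible in the hunks quoted in this thread (a "0x" prefix, eight hex digits carrying the uncompressed length low byte first, then two hex digits per compressed byte). `encode_compressed` and `kHex` are hypothetical names, and the compressed bytes are passed in rather than produced by zlib.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper, not part of the PR: hex-encodes an already-compressed
// payload as "0x" + 8 hex digits (uncompressed length, low byte first)
// + 2 hex digits per compressed byte.
std::string encode_compressed(std::uint32_t uncompressed_len,
                              const std::vector<unsigned char>& compressed) {
    static constexpr char kHex[] = "0123456789ABCDEF";
    if (uncompressed_len == 0) {
        return "0x"; // the quoted hunk emits a bare "0x" for an empty input
    }
    std::string out = "0x";
    out.reserve(10 + 2 * compressed.size());
    // Length header: four bytes, least significant byte first.
    for (int i = 0; i < 4; ++i) {
        unsigned char byte = (uncompressed_len >> (i * 8)) & 0xFF;
        out.push_back(kHex[byte >> 4]);
        out.push_back(kHex[byte & 0x0F]);
    }
    // Payload: each compressed byte becomes two hex characters.
    for (unsigned char byte : compressed) {
        out.push_back(kHex[byte >> 4]);
        out.push_back(kHex[byte & 0x0F]);
    }
    return out;
}
```

Under these assumptions the encoded text is 10 + 2n characters for n compressed bytes, so short or poorly compressible inputs can come out larger than the original. A 5-byte input with a hypothetical 2-byte compressed form would occupy 14 characters.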



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


--

Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-29 Thread via GitHub


HappenLee commented on code in PR #47307:
URL: https://github.com/apache/doris/pull/47307#discussion_r1934130870


##
be/src/vec/functions/function_compress.cpp:
##
@@ -0,0 +1,247 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "common/status.h"
+#include "util/block_compression.h"
+#include "util/faststring.h"
+#include "vec/aggregate_functions/aggregate_function.h"
+#include "vec/columns/column.h"
+#include "vec/columns/column_nullable.h"
+#include "vec/columns/column_string.h"
+#include "vec/columns/column_vector.h"
+#include "vec/columns/columns_number.h"
+#include "vec/common/assert_cast.h"
+#include "vec/core/block.h"
+#include "vec/core/column_numbers.h"
+#include "vec/core/column_with_type_and_name.h"
+#include "vec/core/types.h"
+#include "vec/data_types/data_type.h"
+#include "vec/data_types/data_type_nullable.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/data_types/data_type_string.h"
+#include "vec/functions/function.h"
+#include "vec/functions/simple_function_factory.h"
+
+namespace doris {
+class FunctionContext;
+} // namespace doris
+
+namespace doris::vectorized {
+
+class FunctionCompress : public IFunction {
+string hex_itoc = "0123456789ABCDEF";
+
+public:
+static constexpr auto name = "compress";
+static FunctionPtr create() { return std::make_shared<FunctionCompress>(); }
+
+String get_name() const override { return name; }
+
+size_t get_number_of_arguments() const override { return 1; }
+
+DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+return std::make_shared<DataTypeString>();
+}
+
+Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+uint32_t result, size_t input_rows_count) const override {
+// Get the compression algorithm object
+BlockCompressionCodec* compression_codec;
+RETURN_IF_ERROR(get_block_compression_codec(segment_v2::CompressionTypePB::ZLIB,
+&compression_codec));
+
+const auto& arg_column =
+assert_cast<const ColumnString&>(*block.get_by_position(arguments[0]).column);
+auto result_column = ColumnString::create();
+
+auto& arg_data = arg_column.get_chars();
+auto& arg_offset = arg_column.get_offsets();
+const char* arg_begin = reinterpret_cast<const char*>(arg_data.data());
+
+auto& col_data = result_column->get_chars();
+auto& col_offset = result_column->get_offsets();
+col_offset.resize(input_rows_count);
+
+faststring compressed_str;
+Slice data;
+
+// When the original string is large, the result is roughly this value
+size_t total = arg_offset[input_rows_count - 1];
+col_data.reserve(total / 1000);
+
+for (size_t row = 0; row < input_rows_count; row++) {
+size_t length = arg_offset[row] - arg_offset[row - 1];
+data = Slice(arg_begin + arg_offset[row - 1], length);
+
+// Z_MEM_ERROR and Z_BUF_ERROR are already handled in compress, making sure st is always Z_OK
+auto st = compression_codec->compress(data, &compressed_str);
+
+size_t idx = col_data.size();
+if (!length) { // data is ''

Review Comment:
   Better to do this check before calling `compress`.
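
A minimal sketch of the reordering suggested here, assuming nothing about the real codec: `CountingCodec` and `compress_rows` are hypothetical stand-ins (the stub returns its input unchanged), used only to show that an empty row can be handled before any compression call is made.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the codec; it counts invocations so the effect
// of the early exit is observable. This is not the Doris API.
struct CountingCodec {
    std::size_t calls = 0;
    std::string compress(const std::string& in) {
        ++calls;
        return in; // placeholder: a real codec would return zlib output
    }
};

// Processes each row and skips the codec entirely for empty inputs.
std::vector<std::string> compress_rows(const std::vector<std::string>& rows,
                                       CountingCodec& codec) {
    std::vector<std::string> out;
    out.reserve(rows.size());
    for (const std::string& row : rows) {
        if (row.empty()) {
            out.emplace_back("0x"); // empty-input marker used by the quoted hunk
            continue;               // no compress call for this row
        }
        out.push_back(codec.compress(row));
    }
    return out;
}
```

With the check placed first, a batch containing one empty row out of three triggers two codec calls instead of three.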




Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-29 Thread via GitHub


HappenLee commented on code in PR #47307:
URL: https://github.com/apache/doris/pull/47307#discussion_r1934123275


##
be/src/vec/functions/function_compress.cpp:
##
@@ -0,0 +1,247 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "common/status.h"
+#include "util/block_compression.h"
+#include "util/faststring.h"
+#include "vec/aggregate_functions/aggregate_function.h"
+#include "vec/columns/column.h"
+#include "vec/columns/column_nullable.h"
+#include "vec/columns/column_string.h"
+#include "vec/columns/column_vector.h"
+#include "vec/columns/columns_number.h"
+#include "vec/common/assert_cast.h"
+#include "vec/core/block.h"
+#include "vec/core/column_numbers.h"
+#include "vec/core/column_with_type_and_name.h"
+#include "vec/core/types.h"
+#include "vec/data_types/data_type.h"
+#include "vec/data_types/data_type_nullable.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/data_types/data_type_string.h"
+#include "vec/functions/function.h"
+#include "vec/functions/simple_function_factory.h"
+
+namespace doris {
+class FunctionContext;
+} // namespace doris
+
+namespace doris::vectorized {
+
+class FunctionCompress : public IFunction {
+string hex_itoc = "0123456789ABCDEF";

Review Comment:
   `HEX_ITOC` is constant data, so it needs `constexpr`; better to make it a `std::array`.
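
One way this suggestion could look, as a standalone sketch rather than the code that was eventually merged: the table becomes a compile-time `std::array`, so no `std::string` member is constructed per function object. `byte_to_hex` is a hypothetical helper added only to exercise the table.

```cpp
#include <array>
#include <string>

// Compile-time lookup table for upper-case hex digits.
static constexpr std::array<char, 16> HEX_ITOC = {
        '0', '1', '2', '3', '4', '5', '6', '7',
        '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};

// The table is usable in constant expressions.
static_assert(HEX_ITOC[10] == 'A', "index 10 maps to 'A'");

// Hypothetical helper: two hex digits for one byte, high nibble first.
inline std::string byte_to_hex(unsigned char byte) {
    return std::string{HEX_ITOC[byte >> 4], HEX_ITOC[byte & 0x0F]};
}
```

The `static_assert` shows the lookup being evaluated at compile time, which a non-`constexpr` `std::string` member would not allow.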






Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-27 Thread via GitHub


zclllyybb commented on code in PR #47307:
URL: https://github.com/apache/doris/pull/47307#discussion_r1930829499


##
regression-test/suites/query_p0/sql_functions/string_functions/test_compress_uncompress.groovy:
##
@@ -0,0 +1,83 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+suite("test_compress_uncompress") {
+// Drop the existing table
+sql "DROP TABLE IF EXISTS test_compression"
+
+// Create the test table
+sql """
+CREATE TABLE test_compression (
+k0 INT,  -- Primary key
+text_col STRING, -- String column for input data
+binary_col STRING-- Binary column for compressed data
+)
+DISTRIBUTED BY HASH(k0)
+PROPERTIES (
+"replication_num" = "1"
+);
+"""
+
+// Insert test data with various cases (removing special characters)
+sql """
+INSERT INTO test_compression VALUES
+(1, 'Hello, world!', NULL),-- Plain string

Review Comment:
   The `binary_col` should contain some valid values to uncompress.






Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-27 Thread via GitHub


github-actions[bot] commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2616316739

   PR approved by anyone and no changes requested.





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-27 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2615833260

   TeamCity be ut coverage result:
Function Coverage: 42.08% (10980/26093) 
Line Coverage: 32.35% (92834/286927)
Region Coverage: 31.50% (47593/151083)
Branch Coverage: 27.54% (24108/87524)
Coverage Report: 
http://coverage.selectdb-in.cc/coverage/72fafec758b44f421c4d53d99040500d328b9f5a_72fafec758b44f421c4d53d99040500d328b9f5a/report/index.html





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-27 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2615822511

   
   
   ClickBench: Total hot run time: 29.92 s
   
   ```
   machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
   scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
   ClickBench test result on commit 72fafec758b44f421c4d53d99040500d328b9f5a, 
data reload: false
   
   query1   0.03    0.03    0.06
   query2   0.08    0.03    0.03
   query3   0.24    0.07    0.06
   query4   1.62    0.11    0.10
   query5   0.40    0.42    0.42
   query6   1.17    0.66    0.66
   query7   0.02    0.02    0.01
   query8   0.04    0.03    0.04
   query9   0.58    0.52    0.51
   query10  0.56    0.56    0.55
   query11  0.15    0.10    0.10
   query12  0.14    0.11    0.11
   query13  0.61    0.59    0.60
   query14  2.84    2.81    2.72
   query15  0.90    0.82    0.82
   query16  0.37    0.39    0.37
   query17  1.06    1.03    1.05
   query18  0.23    0.21    0.21
   query19  1.92    1.80    2.02
   query20  0.01    0.02    0.01
   query21  15.36   0.90    0.57
   query22  0.76    0.72    0.60
   query23  15.43   1.41    0.49
   query24  3.26    1.00    0.55
   query25  0.12    0.30    0.19
   query26  0.33    0.14    0.14
   query27  0.05    0.06    0.07
   query28  13.69   1.07    0.42
   query29  12.64   3.91    3.22
   query30  0.25    0.09    0.06
   query31  2.82    0.59    0.37
   query32  3.22    0.53    0.45
   query33  3.01    2.94    3.02
   query34  16.50   5.13    4.43
   query35  4.58    4.51    4.49
   query36  0.65    0.48    0.48
   query37  0.09    0.06    0.06
   query38  0.04    0.04    0.03
   query39  0.03    0.02    0.03
   query40  0.17    0.15    0.12
   query41  0.08    0.03    0.02
   query42  0.04    0.02    0.02
   query43  0.03    0.03    0.03
   Total cold run time: 106.12 s
   Total hot run time: 29.92 s
   ```
   
   





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-27 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2615810154

   
   
   TPC-DS: Total hot run time: 190970 ms
   
   ```
   machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
   scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
   TPC-DS sf100 test result on commit 72fafec758b44f421c4d53d99040500d328b9f5a, 
data reload: false
   
   query1   1313    947     924     924
   query2   6408    1975    1994    1975
   query3   10961   4448    4318    4318
   query4   60965   28798   23186   23186
   query5   5553    610     461     461
   query6   417     175     189     175
   query7   5502    500     291     291
   query8   313     229     222     222
   query9   8521    2679    2679    2679
   query10  440     300     254     254
   query11  17834   15025   15430   15025
   query12  169     108     109     108
   query13  1447    551     434     434
   query14  10443   6952    6439    6439
   query15  214     187     188     187
   query16  7230    627     477     477
   query17  1137    752     565     565
   query18  1849    425     309     309
   query19  198     181     155     155
   query20  113     113     108     108
   query21  215     121     106     106
   query22  4591    4616    4277    4277
   query23  34677   32988   33312   32988
   query24  6146    2410    2319    2319
   query25  471     484     414     414
   query26  648     249     172     172
   query27  1765    467     328     328
   query28  4275    2516    2470    2470
   query29  531     542     441     441
   query30  211     195     163     163
   query31  931     890     820     820
   query32  67      57      56      56
   query33  441     342     308     308
   query34  737     862     515     515
   query35  787     877     784     784
   query36  997     1051    986     986
   query37  119     98      83      83
   query38  4313    4294    4325    4294
   query39  1484    1468    1438    1438
   query40  201     115     100     100
   query41  55      53      50      50
   query42  133     110     104     104
   query43  521     533     490     490
   query44  1297    847     843     843
   query45  180     176     167     167
   query46  873     1074    657     657
   query47  1915    1889    1847    1847
   query48  394     387     323     323
   query49  709     505     424     424
   query50  642     678     408     408
   query51  4259    4347    4264    4264
   query52  107     106     98      98
   query53  237     255     191     191
   query54  479     488     430     430
   query55  83      79      81      79
   query56  251     265     242     242
   query57  1123    1185    1087    1087
   query58  256     244     236     236
   query59  3095    3339    3059    3059
   query60  280     273     257     257
   query61  120     113     131     113
   query62  735     711     670     670
   query63  220     189     186     186
   query64  1254    1033    644     644
   query65  3296    3157    3161    3157
   query66  719     404     313     313
   query67  15951   15728   15644   15644
   query68  2710    796     589     589
   query69  432     359     263     263
   query70  1230    1151    1126    1126
   query71  341     281     249     249
   query72  5026    3861    3827    3827
   query73  619     747     457     457
   query74  9566    9150    9112    9112
   query75  3156    3134    2673    2673
   query76  1911    1175    787     787
   query77  343     362     270     270
   query78  10106   10219   9337    9337
   query79  1242    895     597     597
   query80  1113    521     434     434
   query81  527     270     252     252
   query82  1178    153     125     125
   query83  239     164     157     157
   query84  283     97      72      72
   query85  776     344     300     300
   query86  340     329     262     262
   query87  4450    4550    4380    4380
   query88  3551    2189    2163    2163
   query89  394     335     283     283
   query90  1738    187     190     187
   query91  133     134     106     106
   query92  60      57      52      52
   query93  1101    846     535     535
   query94  617     377     303     303
   query95  329     263     254     254
   query96  485     626     284     284
   query97  2834    2914    2741    2741
   query98  214     192     192     192
   query99  1295    1362    1264    1264
   Total cold run time: 302157 ms
   Total hot run time: 190970 ms
   ```
   
   



Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-27 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2615781090

   
   
   TPC-H: Total hot run time: 32426 ms
   
   ```
   machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
   scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
   Tpch sf100 test result on commit 72fafec758b44f421c4d53d99040500d328b9f5a, 
data reload: false
   
   -- Round 1 --
   q1   17768   5516    5351    5351
   q2   2036    301     169     169
   q3   10431   1225    721     721
   q4   10478   985     525     525
   q5   8112    2394    2158    2158
   q6   192     167     135     135
   q7   909     779     609     609
   q8   9229    1328    1168    1168
   q9   5330    4883    4908    4883
   q10  6822    2342    1904    1904
   q11  478     269     250     250
   q12  345     353     223     223
   q13  18022   3722    3105    3105
   q14  230     246     215     215
   q15  529     491     478     478
   q16  637     610     593     593
   q17  550     858     325     325
   q18  6894    6521    6835    6521
   q19  4322    1342    537     537
   q20  309     319     194     194
   q21  2778    2263    2042    2042
   q22  372     351     320     320
   Total cold run time: 106773 ms
   Total hot run time: 32426 ms
   
   - Round 2, with runtime_filter_mode=off -
   q1   5731    5499    5556    5499
   q2   246     337     251     251
   q3   2460    2846    2395    2395
   q4   1404    1816    1467    1467
   q5   4431    4885    4997    4885
   q6   170     166     130     130
   q7   2152    2084    1870    1870
   q8   2680    2876    2691    2691
   q9   7203    7193    7246    7193
   q10  2976    3150    2721    2721
   q11  584     513     518     513
   q12  699     767     614     614
   q13  3498    3876    3259    3259
   q14  284     300     277     277
   q15  519     471     482     471
   q16  648     689     646     646
   q17  1236    1702    1265    1265
   q18  7753    7349    7281    7281
   q19  795     1148    1077    1077
   q20  2031    2043    1918    1918
   q21  5591    5064    5021    5021
   q22  600     618     565     565
   Total cold run time: 53691 ms
   Total hot run time: 52009 ms
   ```
   
   





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-27 Thread via GitHub


lzyy2024 commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2615692710

   run buildall





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-27 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2615454377

   
   
   ClickBench: Total hot run time: 30.47 s
   
   ```
   machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
   scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
   ClickBench test result on commit 99d8ddaef7e3fd486ef1ccd258f1225713513411, 
data reload: false
   
   query1   0.04    0.04    0.03
   query2   0.07    0.04    0.03
   query3   0.24    0.07    0.07
   query4   1.61    0.11    0.10
   query5   0.43    0.42    0.41
   query6   1.16    0.66    0.67
   query7   0.02    0.02    0.02
   query8   0.04    0.03    0.03
   query9   0.57    0.50    0.50
   query10  0.55    0.56    0.54
   query11  0.13    0.11    0.09
   query12  0.14    0.10    0.11
   query13  0.62    0.61    0.61
   query14  2.68    2.72    2.74
   query15  0.90    0.84    0.84
   query16  0.36    0.38    0.39
   query17  1.05    1.06    1.06
   query18  0.24    0.21    0.21
   query19  1.90    1.87    2.02
   query20  0.01    0.01    0.01
   query21  15.36   0.93    0.57
   query22  0.75    0.83    0.63
   query23  15.25   1.44    0.60
   query24  3.04    0.64    2.03
   query25  0.23    0.13    0.14
   query26  0.24    0.15    0.15
   query27  0.05    0.04    0.05
   query28  14.04   1.05    0.44
   query29  12.60   3.91    3.27
   query30  0.27    0.09    0.07
   query31  2.84    0.61    0.39
   query32  3.22    0.55    0.46
   query33  2.96    2.96    3.12
   query34  16.72   5.16    4.50
   query35  4.61    4.62    4.54
   query36  0.67    0.48    0.48
   query37  0.10    0.06    0.06
   query38  0.05    0.04    0.04
   query39  0.04    0.02    0.02
   query40  0.16    0.14    0.13
   query41  0.07    0.02    0.02
   query42  0.03    0.02    0.02
   query43  0.03    0.03    0.04
   Total cold run time: 106.09 s
   Total hot run time: 30.47 s
   ```
   
   





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-27 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2615442577

   
   
   TPC-DS: Total hot run time: 192137 ms
   
   ```
   machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
   scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
   TPC-DS sf100 test result on commit 99d8ddaef7e3fd486ef1ccd258f1225713513411, 
data reload: false
   
   query1   1330    946     914     914
   query2   6165    2025    2041    2025
   query3   11109   4537    4614    4537
   query4   61620   27953   23613   23613
   query5   5596    596     451     451
   query6   428     196     183     183
   query7   5506    510     306     306
   query8   335     251     246     246
   query9   8056    2703    2683    2683
   query10  439     298     260     260
   query11  17350   15120   15753   15120
   query12  160     108     104     104
   query13  1442    588     453     453
   query14  10910   7268    7188    7188
   query15  203     202     194     194
   query16  6594    625     491     491
   query17  1079    732     550     550
   query18  1149    387     293     293
   query19  196     183     161     161
   query20  122     117     110     110
   query21  214     128     106     106
   query22  4634    4875    4384    4384
   query23  34182   33362   33871   33362
   query24  5613    2287    2383    2287
   query25  470     456     381     381
   query26  642     226     157     157
   query27  1792    487     335     335
   query28  4293    2525    2462    2462
   query29  549     523     444     444
   query30  213     188     158     158
   query31  922     908     830     830
   query32  73      59      53      53
   query33  419     351     310     310
   query34  747     854     532     532
   query35  804     858     759     759
   query36  1020    1089    973     973
   query37  122     98      79      79
   query38  4330    4280    4276    4276
   query39  1497    1438    1472    1438
   query40  203     124     103     103
   query41  50      56      47      47
   query42  117     114     101     101
   query43  524     524     496     496
   query44  1279    808     840     808
   query45  184     168     167     167
   query46  911     1063    661     661
   query47  1960    1905    1864    1864
   query48  413     417     339     339
   query49  704     495     398     398
   query50  652     666     393     393
   query51  4357    4299    4310    4299
   query52  114     104     99      99
   query53  233     261     200     200
   query54  497     515     405     405
   query55  81      80      86      80
   query56  253     273     242     242
   query57  1145    1219    1147    1147
   query58  242     227     233     227
   query59  3045    3277    2982    2982
   query60  275     256     249     249
   query61  118     110     110     110
   query62  714     727     653     653
   query63  218     180     190     180
   query64  1299    1037    639     639
   query65  3237    3116    3151    3116
   query66  721     393     289     289
   query67  16002   15718   15433   15433
   query68  2886    870     571     571
   query69  455     312     265     265
   query70  1225    1150    1139    1139
   query71  384     280     257     257
   query72  6248    3942    3831    3831
   query73  649     740     356     356
   query74  9996    8806    8925    8806
   query75  3152    3148    2657    2657
   query76  3160    1182    776     776
   query77  478     353     269     269
   query78  10191   10028   9352    9352
   query79  2912    827     589     589
   query80  1681    519     432     432
   query81  533     279     234     234
   query82  372     148     117     117
   query83  255     166     173     166
   query84  291     95      74      74
   query85  796     344     316     316
   query86  406     322     262     262
   query87  4425    4487    4402    4402
   query88  3757    2197    2159    2159
   query89  388     320     290     290
   query90  1708    192     193     192
   query91  177     136     105     105
   query92  62      60      50      50
   query93  2463    871     537     537
   query94  747     403     299     299
   query95  333     260     255     255
   query96  482     630     278     278
   query97  2805    2886    2730    2730
   query98  218     197     192     192
   query99  1255    1379    1244    1244
   Total cold run time: 306763 ms
   Total hot run time: 192137 ms
   ```
   
   



Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-27 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2615415100

   
   
   TPC-H: Total hot run time: 32338 ms
   
   ```
   machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
   scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
   Tpch sf100 test result on commit 99d8ddaef7e3fd486ef1ccd258f1225713513411, 
data reload: false
   
   -- Round 1 --
   q1   17593   5517    5382    5382
   q2   2047    316     166     166
   q3   10413   1269    765     765
   q4   10201   961     532     532
   q5   7529    2381    2164    2164
   q6   194     170     136     136
   q7   898     763     622     622
   q8   9239    1379    1241    1241
   q9   5255    4913    4992    4913
   q10  6867    2342    1850    1850
   q11  464     274     257     257
   q12  353     367     226     226
   q13  17771   3700    3122    3122
   q14  233     236     209     209
   q15  530     487     484     484
   q16  624     618     577     577
   q17  571     875     332     332
   q18  6986    6364    6301    6301
   q19  1983    955     550     550
   q20  314     330     193     193
   q21  3029    2224    1993    1993
   q22  366     351     323     323
   Total cold run time: 103460 ms
   Total hot run time: 32338 ms
   
   - Round 2, with runtime_filter_mode=off -
   q1   5573    5512    5475    5475
   q2   232     332     238     238
   q3   2201    2640    2333    2333
   q4   1438    1795    1319    1319
   q5   4299    4697    4688    4688
   q6   166     159     125     125
   q7   2041    1945    1804    1804
   q8   2606    2852    2693    2693
   q9   7270    7133    7185    7133
   q10  3027    3289    2813    2813
   q11  581     519     486     486
   q12  673     736     602     602
   q13  3555    3922    3374    3374
   q14  296     292     281     281
   q15  518     491     462     462
   q16  674     687     641     641
   q17  1255    1760    1242    1242
   q18  7669    7377    7439    7377
   q19  822     1168    1087    1087
   q20  2003    2024    1865    1865
   q21  5813    5308    4856    4856
   q22  602     596     600     596
   Total cold run time: 53314 ms
   Total hot run time: 51490 ms
   ```
   
   





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-27 Thread via GitHub


lzyy2024 commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2615331930

   run buildall





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-27 Thread via GitHub


zclllyybb commented on code in PR #47307:
URL: https://github.com/apache/doris/pull/47307#discussion_r1930088671


##
be/src/vec/functions/function_compress.cpp:
##
@@ -0,0 +1,251 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "common/status.h"
+#include "util/block_compression.h"
+#include "util/faststring.h"
+#include "vec/aggregate_functions/aggregate_function.h"
+#include "vec/columns/column.h"
+#include "vec/columns/column_nullable.h"
+#include "vec/columns/column_string.h"
+#include "vec/columns/column_vector.h"
+#include "vec/columns/columns_number.h"
+#include "vec/common/assert_cast.h"
+#include "vec/core/block.h"
+#include "vec/core/column_numbers.h"
+#include "vec/core/column_with_type_and_name.h"
+#include "vec/core/types.h"
+#include "vec/data_types/data_type.h"
+#include "vec/data_types/data_type_nullable.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/data_types/data_type_string.h"
+#include "vec/functions/function.h"
+#include "vec/functions/simple_function_factory.h"
+
+namespace doris {
+class FunctionContext;
+} // namespace doris
+
+namespace doris::vectorized {
+
+class FunctionCompress : public IFunction {
+string hex_itoc = "0123456789ABCDEF";
+
+public:
+static constexpr auto name = "compress";
+static FunctionPtr create() { return std::make_shared<FunctionCompress>(); }
+
+String get_name() const override { return name; }
+
+size_t get_number_of_arguments() const override { return 1; }
+
+DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+return std::make_shared<DataTypeString>();
+}
+
+Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+uint32_t result, size_t input_rows_count) const override {
+// Get the compression algorithm object
+BlockCompressionCodec* compression_codec;
+RETURN_IF_ERROR(get_block_compression_codec(segment_v2::CompressionTypePB::ZLIB,
+&compression_codec));
+
+const auto& arg_column =
+assert_cast<const ColumnString&>(*block.get_by_position(arguments[0]).column);
+auto result_column = ColumnString::create();
+
+auto& arg_data = arg_column.get_chars();
+auto& arg_offset = arg_column.get_offsets();
+const char* arg_begin = reinterpret_cast<const char*>(arg_data.data());
+
+auto& col_data = result_column->get_chars();
+auto& col_offset = result_column->get_offsets();
+col_offset.resize(input_rows_count);
+
+faststring compressed_str;
+Slice data;
+
+// When the original string is large, the result is roughly this value
+size_t total = arg_offset[input_rows_count - 1];
+col_data.reserve(total / 1000);
+
+for (size_t row = 0; row < input_rows_count; row++) {
+size_t length = arg_offset[row] - arg_offset[row - 1];
+data = Slice(arg_begin + arg_offset[row - 1], length);
+
+// Z_MEM_ERROR and Z_BUF_ERROR are already handled in compress, making sure st is always Z_OK
+auto st = compression_codec->compress(data, &compressed_str);
+
+size_t idx = col_data.size();
+if (!length) { // data is ''
+col_data.resize(col_data.size() + 2);
+col_data[idx] = '0', col_data[idx + 1] = 'x';
+col_offset[row] = col_offset[row - 1] + 2;
+continue;
+}
+
+// first ten digits represent the length of the uncompressed string
+col_data.resize(col_data.size() + 10);
+col_data[idx] = '0', col_data[idx + 1] = 'x';
+for (size_t i = 0; i < 4; i++) {
+unsigned char byte = (length >> (i * 8)) & 0xFF;
+col_data[idx + 2 + i * 2] = hex_itoc[byte >> 4]; // higher four
+col_data[idx + 3 + i * 2] = hex_itoc[byte & 0x0F];
+}
+idx += 10;
+
+col_data.resize(col_data.size() + 2 * compressed_str.size());
+
+unsigned char* src = compressed_str.data();
+  
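
For readers following the truncated diff above: the loop appears to write each result as the text `0x`, then the uncompressed length as eight hex digits (lowest byte first), then the compressed bytes in hex. A minimal standalone sketch of just that framing follows, with the zlib step left out. The helper names `hex_length_prefix` and `hex_encode` are hypothetical and are not part of the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helpers for illustration only; not taken from the PR.
static const char kHexDigits[] = "0123456789ABCDEF";

// Returns "0x" followed by the four bytes of `length`, lowest byte first,
// each written as two uppercase hex digits (10 characters in total).
std::string hex_length_prefix(std::uint32_t length) {
    std::string out = "0x";
    for (std::size_t i = 0; i < 4; i++) {
        unsigned char byte = (length >> (i * 8)) & 0xFF;
        out.push_back(kHexDigits[byte >> 4]);   // high nibble
        out.push_back(kHexDigits[byte & 0x0F]); // low nibble
    }
    return out;
}

// Returns every byte of `payload` written as two uppercase hex digits.
std::string hex_encode(const std::string& payload) {
    std::string out;
    out.reserve(payload.size() * 2);
    for (unsigned char byte : payload) {
        out.push_back(kHexDigits[byte >> 4]);
        out.push_back(kHexDigits[byte & 0x0F]);
    }
    return out;
}
```

Under these assumptions a 5-byte input would get the prefix `0x05000000`, and the hex form of the compressed bytes would be appended after it.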

Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-26 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2614448303

   TeamCity be ut coverage result:
Function Coverage: 42.08% (10979/26093) 
Line Coverage: 32.35% (92831/286929)
Region Coverage: 31.50% (47590/151083)
Branch Coverage: 27.54% (24105/87524)
Coverage Report: 
http://coverage.selectdb-in.cc/coverage/9263dea5e49d60ca40fe41a6ec858405ae8202f9_9263dea5e49d60ca40fe41a6ec858405ae8202f9/report/index.html





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-26 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2614440849

   
   
   ClickBench: Total hot run time: 31.04 s
   
   ```
   machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
   scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
   ClickBench test result on commit 9263dea5e49d60ca40fe41a6ec858405ae8202f9, data reload: false
   
   query1   0.06    0.03    0.03
   query2   0.07    0.03    0.03
   query3   0.24    0.07    0.07
   query4   1.61    0.10    0.10
   query5   0.41    0.42    0.39
   query6   1.16    0.66    0.67
   query7   0.02    0.02    0.01
   query8   0.04    0.03    0.03
   query9   0.58    0.51    0.52
   query10  0.54    0.57    0.55
   query11  0.14    0.11    0.10
   query12  0.14    0.11    0.11
   query13  0.61    0.61    0.60
   query14  2.84    2.75    2.81
   query15  0.89    0.83    0.82
   query16  0.36    0.37    0.37
   query17  1.05    1.01    1.00
   query18  0.22    0.21    0.22
   query19  1.99    2.05    1.88
   query20  0.01    0.01    0.01
   query21  15.36   0.94    0.59
   query22  0.74    0.90    0.62
   query23  15.20   1.49    0.59
   query24  3.29    1.33    1.69
   query25  0.15    0.17    0.10
   query26  0.29    0.15    0.14
   query27  0.06    0.05    0.04
   query28  14.06   1.06    0.43
   query29  12.53   4.07    3.24
   query30  0.25    0.08    0.06
   query31  2.83    0.62    0.39
   query32  3.23    0.55    0.46
   query33  2.95    3.03    3.03
   query34  16.53   5.26    4.52
   query35  4.46    4.48    4.50
   query36  0.65    0.51    0.47
   query37  0.10    0.06    0.06
   query38  0.05    0.04    0.04
   query39  0.03    0.02    0.02
   query40  0.16    0.13    0.13
   query41  0.08    0.02    0.03
   query42  0.03    0.03    0.02
   query43  0.04    0.03    0.03
   Total cold run time: 106.05 s
   Total hot run time: 31.04 s
   ```
   
   





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-26 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2614438692

   
   
   TPC-DS: Total hot run time: 184635 ms
   
   ```
   machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
   scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
   TPC-DS sf100 test result on commit 9263dea5e49d60ca40fe41a6ec858405ae8202f9, data reload: false
   
   query1   972     376     371     371
   query2   6509    2041    1988    1988
   query3   6789    219     218     218
   query4   36458   23342   22916   22916
   query5   4386    602     450     450
   query6   290     198     188     188
   query7   4604    484     306     306
   query8   298     246     228     228
   query9   9471    2699    2692    2692
   query10  465     306     254     254
   query11  17964   15196   14913   14913
   query12  160     109     102     102
   query13  1642    523     385     385
   query14  9587    6895    7236    6895
   query15  234     191     183     183
   query16  7785    609     471     471
   query17  1575    702     540     540
   query18  2009    393     306     306
   query19  233     191     159     159
   query20  119     119     118     118
   query21  207     123     104     104
   query22  4130    4427    4092    4092
   query23  34663   32920   32906   32906
   query24  6636    2291    2267    2267
   query25  481     480     422     422
   query26  1040    279     158     158
   query27  1985    472     345     345
   query28  5057    2492    2452    2452
   query29  635     595     448     448
   query30  235     190     156     156
   query31  979     855     811     811
   query32  70      62      62      62
   query33  530     374     313     313
   query34  750     845     497     497
   query35  848     872     749     749
   query36  964     1009    955     955
   query37  126     97      89      89
   query38  4123    4125    4075    4075
   query39  1425    1389    1379    1379
   query40  203     121     104     104
   query41  52      55      50      50
   query42  119     102     107     102
   query43  504     518     476     476
   query44  1296    797     803     797
   query45  181     168     166     166
   query46  862     1027    651     651
   query47  1818    1866    1778    1778
   query48  393     410     314     314
   query49  778     494     390     390
   query50  666     654     409     409
   query51  4171    4153    4129    4129
   query52  105     104     90      90
   query53  223     252     185     185
   query54  486     494     405     405
   query55  82      80      85      80
   query56  255     270     235     235
   query57  1157    1163    1091    1091
   query58  254     235     246     235
   query59  3153    3217    2880    2880
   query60  276     280     260     260
   query61  117     120     121     120
   query62  832     726     644     644
   query63  225     191     210     191
   query64  3589    1020    677     677
   query65  3227    3155    3161    3155
   query66  934     416     315     315
   query67  15861   15984   15412   15412
   query68  4284    842     528     528
   query69  464     291     259     259
   query70  1208    1135    1149    1135
   query71  372     280     258     258
   query72  5864    3841    3977    3841
   query73  654     750     371     371
   query74  10124   8927    8995    8927
   query75  3185    3149    2621    2621
   query76  3217    1160    775     775
   query77  470     370     274     274
   query78  10087   10007   9497    9497
   query79  3152    814     585     585
   query80  1499    530     438     438
   query81  568     281     240     240
   query82  715     152     129     129
   query83  183     170     155     155
   query84  238     100     73      73
   query85  800     388     300     300
   query86  427     300     297     297
   query87  4543    4480    4275    4275
   query88  5047    2183    2163    2163
   query89  402     331     297     297
   query90  1786    191     186     186
   query91  138     135     111     111
   query92  66      58      54      54
   query93  2742    895     537     537
   query94  740     409     297     297
   query95  330     261     260     260
   query96  488     612     279     279
   query97  2758    2849    2715    2715
   query98  238     205     196     196
   query99  1327    1378    1258    1258
   Total cold run time: 286369 ms
   Total hot run time: 184635 ms
   ```
   
   



Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-26 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2614434365

   
   
   TPC-H: Total hot run time: 32060 ms
   
   ```
   machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
   scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
   Tpch sf100 test result on commit 9263dea5e49d60ca40fe41a6ec858405ae8202f9, data reload: false
   
   -- Round 1 --
   q1   17599   5513    5327    5327
   q2   2048    315     176     176
   q3   10559   1307    702     702
   q4   10239   974     529     529
   q5   8071    2343    2119    2119
   q6   197     165     132     132
   q7   893     758     619     619
   q8   9225    1340    1185    1185
   q9   5100    4824    4851    4824
   q10  6800    2333    1885    1885
   q11  458     275     267     267
   q12  347     356     218     218
   q13  17763   3681    3140    3140
   q14  221     225     206     206
   q15  506     462     465     462
   q16  635     609     595     595
   q17  549     850     325     325
   q18  7160    6336    6356    6336
   q19  2894    975     532     532
   q20  304     322     190     190
   q21  2803    2216    1979    1979
   q22  372     334     312     312
   Total cold run time: 104743 ms
   Total hot run time: 32060 ms
   
   - Round 2, with runtime_filter_mode=off -
   q1   5588    5452    5489    5452
   q2   239     330     235     235
   q3   2275    2687    2332    2332
   q4   1444    1828    1348    1348
   q5   4342    4696    4660    4660
   q6   180     164     129     129
   q7   2107    1957    1883    1883
   q8   2656    2824    2672    2672
   q9   7277    7174    7171    7171
   q10  2963    3246    2769    2769
   q11  583     532     494     494
   q12  690     779     682     682
   q13  3493    3895    3326    3326
   q14  283     293     274     274
   q15  518     473     459     459
   q16  640     669     620     620
   q17  1205    1713    1266    1266
   q18  7576    7316    7326    7316
   q19  782     1131    1033    1033
   q20  2051    2035    1871    1871
   q21  5615    5290    5118    5118
   q22  628     610     600     600
   Total cold run time: 53135 ms
   Total hot run time: 51710 ms
   ```
   
   





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-26 Thread via GitHub


lzyy2024 commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2614420034

   run buildall





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-26 Thread via GitHub


lzyy2024 commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2614411403

   run buildall





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-26 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2614377338

   
   
   ClickBench: Total hot run time: 30.3 s
   
   ```
   machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
   scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
   ClickBench test result on commit 13ebe672a083689491c631868d403d84b840cd3f, data reload: false
   
   query1   0.03    0.03    0.04
   query2   0.06    0.04    0.03
   query3   0.24    0.06    0.07
   query4   1.61    0.11    0.10
   query5   0.43    0.44    0.41
   query6   1.16    0.65    0.65
   query7   0.02    0.01    0.01
   query8   0.04    0.03    0.04
   query9   0.59    0.49    0.51
   query10  0.55    0.58    0.56
   query11  0.14    0.11    0.11
   query12  0.13    0.10    0.11
   query13  0.63    0.60    0.60
   query14  2.73    2.89    2.85
   query15  0.89    0.84    0.82
   query16  0.39    0.39    0.38
   query17  1.01    1.03    1.00
   query18  0.22    0.20    0.20
   query19  1.86    1.77    2.09
   query20  0.02    0.01    0.01
   query21  15.37   0.94    0.56
   query22  0.75    0.86    0.77
   query23  15.15   1.48    0.60
   query24  2.94    1.71    0.36
   query25  0.28    0.09    0.13
   query26  0.34    0.14    0.13
   query27  0.05    0.07    0.06
   query28  13.57   1.03    0.44
   query29  12.58   3.96    3.27
   query30  0.25    0.09    0.07
   query31  2.82    0.60    0.37
   query32  3.25    0.55    0.46
   query33  3.01    3.01    3.05
   query34  16.61   5.27    4.56
   query35  4.51    4.54    4.52
   query36  0.65    0.49    0.48
   query37  0.10    0.06    0.06
   query38  0.04    0.04    0.04
   query39  0.03    0.02    0.03
   query40  0.17    0.14    0.14
   query41  0.09    0.03    0.03
   query42  0.04    0.02    0.03
   query43  0.04    0.04    0.03
   Total cold run time: 105.39 s
   Total hot run time: 30.3 s
   ```
   
   





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-26 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2614363270

   
   
   TPC-DS: Total hot run time: 185720 ms
   
   ```
   machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
   scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
   TPC-DS sf100 test result on commit 13ebe672a083689491c631868d403d84b840cd3f, data reload: false
   
   query1   979     388     367     367
   query2   6520    2069    2002    2002
   query3   6799    218     219     218
   query4   33222   23366   23107   23107
   query5   4314    612     458     458
   query6   285     210     187     187
   query7   4590    486     314     314
   query8   302     245     229     229
   query9   9612    2703    2704    2703
   query10  466     306     249     249
   query11  17939   15268   15179   15179
   query12  157     103     102     102
   query13  1670    539     391     391
   query14  10394   7012    6984    6984
   query15  230     190     186     186
   query16  7215    619     477     477
   query17  1595    701     574     574
   query18  1740    393     312     312
   query19  237     190     171     171
   query20  122     117     111     111
   query21  213     123     103     103
   query22  4125    4424    4309    4309
   query23  34421   33045   33169   33045
   query24  6612    2295    2392    2295
   query25  506     475     398     398
   query26  1221    277     158     158
   query27  1968    461     347     347
   query28  5186    2468    2451    2451
   query29  607     571     453     453
   query30  232     186     168     168
   query31  964     891     814     814
   query32  73      64      62      62
   query33  524     419     307     307
   query34  741     838     519     519
   query35  794     804     762     762
   query36  1022    1063    941     941
   query37  121     105     80      80
   query38  4089    4163    4004    4004
   query39  1492    1380    1454    1380
   query40  205     111     103     103
   query41  55      52      63      52
   query42  120     103     109     103
   query43  517     506     484     484
   query44  1373    814     816     814
   query45  177     170     163     163
   query46  859     1031    647     647
   query47  1779    1835    1791    1791
   query48  388     401     327     327
   query49  758     478     397     397
   query50  633     670     392     392
   query51  4188    4212    4141    4141
   query52  101     106     93      93
   query53  236     251     196     196
   query54  488     496     404     404
   query55  83      77      79      77
   query56  263     267     242     242
   query57  1151    1168    1073    1073
   query58  241     227     246     227
   query59  3010    2995    2755    2755
   query60  277     265     251     251
   query61  117     109     113     109
   query62  792     720     664     664
   query63  217     192     194     192
   query64  4076    1017    637     637
   query65  3245    3205    3143    3143
   query66  906     414     311     311
   query67  15870   15809   15640   15640
   query68  5346    836     541     541
   query69  443     289     253     253
   query70  1195    1164    1083    1083
   query71  387     282     260     260
   query72  5798    3822    3776    3776
   query73  655     760     363     363
   query74  9923    8945    9249    8945
   query75  3187    3129    2656    2656
   query76  3199    1183    785     785
   query77  481     367     283     283
   query78  10003   10019   9345    9345
   query79  3024    829     604     604
   query80  682     529     446     446
   query81  498     277     282     277
   query82  423     155     124     124
   query83  165     174     153     153
   query84  239     89      76      76
   query85  787     337     301     301
   query86  390     323     305     305
   query87  4520    4423    4439    4423
   query88  5058    2177    2157    2157
   query89  386     327     294     294
   query90  1809    192     206     192
   query91  131     133     105     105
   query92  62      59      56      56
   query93  2315    876     541     541
   query94  664     421     308     308
   query95  334     278     262     262
   query96  491     619     290     290
   query97  2761    2875    2725    2725
   query98  229     205     195     195
   query99  1287    1394    1251    1251
   Total cold run time: 282296 ms
   Total hot run time: 185720 ms
   ```
   
   



Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-26 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2614350487

   
   
   TPC-H: Total hot run time: 32328 ms
   
   ```
   machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
   scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
   Tpch sf100 test result on commit 13ebe672a083689491c631868d403d84b840cd3f, data reload: false
   
   -- Round 1 --
   q1   17587   5520    5400    5400
   q2   2046    311     168     168
   q3   10541   1284    722     722
   q4   10240   962     540     540
   q5   8273    2482    2182    2182
   q6   195     165     135     135
   q7   904     774     641     641
   q8   9245    1366    1178    1178
   q9   5286    4870    4929    4870
   q10  6871    2353    1879    1879
   q11  456     280     259     259
   q12  352     358     216     216
   q13  17765   3713    3109    3109
   q14  232     240     206     206
   q15  536     483     459     459
   q16  634     616     600     600
   q17  567     876     320     320
   q18  7111    6386    6397    6386
   q19  1677    953     548     548
   q20  312     323     190     190
   q21  2862    2253    2005    2005
   q22  364     331     315     315
   Total cold run time: 104056 ms
   Total hot run time: 32328 ms
   
   - Round 2, with runtime_filter_mode=off -
   q1   5710    5507    5510    5507
   q2   237     324     254     254
   q3   2251    2600    2267    2267
   q4   1411    1801    1361    1361
   q5   4357    4730    4650    4650
   q6   173     163     129     129
   q7   2075    1965    1892    1892
   q8   2583    2829    2689    2689
   q9   7428    7143    7226    7143
   q10  3027    3345    2815    2815
   q11  592     521     496     496
   q12  672     778     609     609
   q13  3498    3913    3403    3403
   q14  290     305     283     283
   q15  524     479     464     464
   q16  631     694     636     636
   q17  1240    1724    1262    1262
   q18  7669    7556    7362    7362
   q19  761     1067    1128    1067
   q20  1975    2072    1898    1898
   q21  5701    5295    5131    5131
   q22  592     572     577     572
   Total cold run time: 53397 ms
   Total hot run time: 51890 ms
   ```
   
   





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-26 Thread via GitHub


lzyy2024 commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2614340331

   run buildall





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-25 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2614252615

   
   
   ClickBench: Total hot run time: 30.67 s
   
   ```
   machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
   scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
   ClickBench test result on commit 75e8e78802a7295e88f4b0d103064acfcd5e6e4d, data reload: false
   
   query1   0.04    0.04    0.03
   query2   0.07    0.03    0.03
   query3   0.25    0.06    0.07
   query4   1.61    0.11    0.10
   query5   0.42    0.42    0.40
   query6   1.17    0.65    0.66
   query7   0.02    0.02    0.02
   query8   0.04    0.04    0.03
   query9   0.58    0.49    0.50
   query10  0.56    0.56    0.54
   query11  0.15    0.10    0.10
   query12  0.14    0.11    0.11
   query13  0.60    0.60    0.61
   query14  2.85    2.74    2.72
   query15  0.90    0.83    0.82
   query16  0.39    0.38    0.36
   query17  1.05    1.01    1.00
   query18  0.24    0.20    0.20
   query19  1.86    1.88    2.01
   query20  0.01    0.01    0.01
   query21  15.36   0.99    0.58
   query22  0.77    0.82    0.75
   query23  15.21   1.49    0.53
   query24  3.25    0.92    0.84
   query25  0.17    0.26    0.12
   query26  0.18    0.15    0.14
   query27  0.05    0.04    0.04
   query28  13.60   1.09    0.44
   query29  12.60   3.98    3.33
   query30  0.26    0.08    0.06
   query31  2.84    0.62    0.39
   query32  3.23    0.54    0.46
   query33  2.97    3.06    2.99
   query34  16.60   5.17    4.52
   query35  4.61    4.63    4.54
   query36  0.65    0.48    0.50
   query37  0.09    0.06    0.05
   query38  0.05    0.04    0.04
   query39  0.03    0.02    0.03
   query40  0.16    0.13    0.14
   query41  0.08    0.03    0.03
   query42  0.04    0.03    0.02
   query43  0.04    0.03    0.02
   Total cold run time: 105.79 s
   Total hot run time: 30.67 s
   ```
   
   





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-25 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2614251279

   
   
   TPC-DS: Total hot run time: 191631 ms
   
   ```
   machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
   scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
   TPC-DS sf100 test result on commit 75e8e78802a7295e88f4b0d103064acfcd5e6e4d, data reload: false
   
   query1   1306    964     932     932
   query2   6184    2038    2033    2033
   query3   11103   4702    4399    4399
   query4   61069   29129   23025   23025
   query5   5535    611     458     458
   query6   432     204     183     183
   query7   5529    511     307     307
   query8   331     247     233     233
   query9   8032    2708    2701    2701
   query10  469     305     259     259
   query11  17709   15224   15513   15224
   query12  168     122     114     114
   query13  1465    546     409     409
   query14  11082   7040    6994    6994
   query15  210     206     197     197
   query16  7241    636     484     484
   query17  1201    730     591     591
   query18  1910    422     335     335
   query19  205     194     165     165
   query20  123     114     118     114
   query21  225     131     106     106
   query22  4449    4470    4543    4470
   query23  34433   33834   33260   33260
   query24  5996    2367    2297    2297
   query25  460     467     404     404
   query26  649     279     157     157
   query27  1809    459     333     333
   query28  4055    2489    2456    2456
   query29  525     545     431     431
   query30  214     192     158     158
   query31  929     915     837     837
   query32  64      60      57      57
   query33  438     366     306     306
   query34  742     872     503     503
   query35  816     867     758     758
   query36  1033    1051    950     950
   query37  115     107     78      78
   query38  4310    4362    4265    4265
   query39  1508    1448    1442    1442
   query40  217     113     103     103
   query41  51      51      50      50
   query42  124     109     102     102
   query43  507     516     494     494
   query44  1338    846     857     846
   query45  183     173     171     171
   query46  873     1054    654     654
   query47  1891    1979    1874    1874
   query48  396     407     342     342
   query49  718     493     409     409
   query50  649     707     400     400
   query51  4265    4313    4172    4172
   query52  111     105     99      99
   query53  228     254     200     200
   query54  485     513     426     426
   query55  81      82      82      82
   query56  260     266     244     244
   query57  1237    1210    1154    1154
   query58  233     231     236     231
   query59  3223    3364    3053    3053
   query60  279     271     266     266
   query61  139     112     117     112
   query62  736     720     663     663
   query63  225     184     185     184
   query64  1286    1034    656     656
   query65  3273    3124    3142    3124
   query66  689     435     332     332
   query67  16065   15658   15451   15451
   query68  5022    809     539     539
   query69  475     295     264     264
   query70  1178    1161    1126    1126
   query71  416     286     253     253
   query72  6050    3899    3797    3797
   query73  803     764     353     353
   query74  9860    8792    8698    8698
   query75  3220    3156    2703    2703
   query76  3796    1195    748     748
   query77  536     353     275     275
   query78  10087   10047   9345    9345
   query79  2453    805     603     603
   query80  1199    524     485     485
   query81  540     279     227     227
   query82  355     160     126     126
   query83  242     165     159     159
   query84  291     92      70      70
   query85  746     342     301     301
   query86  377     321     301     301
   query87  4546    4476    4482    4476
   query88  3486    2174    2136    2136
   query89  393     332     292     292
   query90  1576    182     188     182
   query91  135     133     110     110
   query92  61      56      55      55
   query93  2163    848     534     534
   query94  737     402     294     294
   query95  321     260     265     260
   query96  489     618     279     279
   query97  2815    2894    2811    2811
   query98  224     192     193     192
   query99  1302    1402    1318    1318
   Total cold run time: 309730 ms
   Total hot run time: 191631 ms
   ```
   
   



Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-25 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2614247633

   
   
   TPC-H: Total hot run time: 32971 ms
   
   ```
   machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
   scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
   Tpch sf100 test result on commit 75e8e78802a7295e88f4b0d103064acfcd5e6e4d, data reload: false
   
   -- Round 1 --
   q1   17843   6172    5422    5422
   q2   2040    300     178     178
   q3   10412   1224    728     728
   q4   10882   965     536     536
   q5   8400    2410    2141    2141
   q6   192     176     134     134
   q7   906     820     595     595
   q8   9228    1339    1150    1150
   q9   5785    5158    5032    5032
   q10  6988    2361    1956    1956
   q11  483     290     268     268
   q12  344     370     227     227
   q13  18216   3998    3387    3387
   q14  272     251     243     243
   q15  528     482     477     477
   q16  649     626     589     589
   q17  569     872     337     337
   q18  8233    6545    6461    6461
   q19  2878    984     543     543
   q20  303     310     192     192
   q21  2714    2218    2045    2045
   q22  362     332     330     330
   Total cold run time: 108227 ms
   Total hot run time: 32971 ms
   
   - Round 2, with runtime_filter_mode=off -
   q1   5710    5517    5469    5469
   q2   231     318     233     233
   q3   2257    2622    2340    2340
   q4   1412    1807    1395    1395
   q5   4328    4781    4883    4781
   q6   165     162     129     129
   q7   2091    1922    1827    1827
   q8   2687    2805    2659    2659
   q9   7273    7260    7262    7260
   q10  3020    3212    2759    2759
   q11  586     520     498     498
   q12  675     790     601     601
   q13  3504    3971    3293    3293
   q14  284     306     281     281
   q15  519     485     467     467
   q16  661     677     630     630
   q17  1209    1777    1246    1246
   q18  7790    7450    7409    7409
   q19  765     1154    1076    1076
   q20  2000    2050    1919    1919
   q21  5653    5124    5012    5012
   q22  631     607     571     571
   Total cold run time: 53451 ms
   Total hot run time: 51855 ms
   ```
   
   





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-25 Thread via GitHub


lzyy2024 commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2614234362

   run buildall





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-25 Thread via GitHub


lzyy2024 commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2614230328

   run buildall





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-25 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2614206532

   
   
   ClickBench: Total hot run time: 30.68 s
   
   ```
   machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
   scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
   ClickBench test result on commit 89c394274803c1107b4fbc5b37d8608ba3af107d, data reload: false
   
   query1   0.03    0.03    0.04
   query2   0.08    0.04    0.03
   query3   0.23    0.07    0.07
   query4   1.61    0.11    0.10
   query5   0.42    0.42    0.38
   query6   1.15    0.66    0.66
   query7   0.02    0.02    0.02
   query8   0.04    0.04    0.03
   query9   0.58    0.49    0.53
   query10  0.55    0.56    0.56
   query11  0.14    0.10    0.10
   query12  0.13    0.11    0.11
   query13  0.60    0.60    0.60
   query14  2.85    2.81    2.88
   query15  0.89    0.82    0.81
   query16  0.39    0.38    0.39
   query17  1.05    1.06    1.05
   query18  0.22    0.21    0.20
   query19  1.90    1.83    2.02
   query20  0.02    0.01    0.01
   query21  15.36   1.02    0.60
   query22  0.75    0.75    0.65
   query23  15.37   1.36    0.60
   query24  2.91    1.92    0.87
   query25  0.16    0.19    0.14
   query26  0.23    0.14    0.13
   query27  0.06    0.05    0.06
   query28  14.17   1.02    0.43
   query29  12.60   3.98    3.27
   query30  0.26    0.09    0.06
   query31  2.82    0.61    0.37
   query32  3.24    0.55    0.46
   query33  2.99    3.02    3.09
   query34  16.54   5.17    4.51
   query35  4.52    4.44    4.45
   query36  0.65    0.49    0.52
   query37  0.10    0.06    0.06
   query38  0.05    0.04    0.03
   query39  0.03    0.03    0.02
   query40  0.17    0.13    0.13
   query41  0.08    0.03    0.03
   query42  0.04    0.02    0.02
   query43  0.04    0.03    0.04
   Total cold run time: 106.04 s
   Total hot run time: 30.68 s
   ```
   
   





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-25 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2614205536

   
   
   TPC-DS: Total hot run time: 184954 ms
   
   ```
   machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
   scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
   TPC-DS sf100 test result on commit 89c394274803c1107b4fbc5b37d8608ba3af107d, data reload: false
   
   query1   962     376     368     368
   query2   6516    2097    2071    2071
   query3   6802    211     218     211
   query4   33731   23198   22991   22991
   query5   4407    575     438     438
   query6   270     185     173     173
   query7   4596    496     311     311
   query8   282     244     214     214
   query9   9565    2688    2704    2688
   query10  472     319     261     261
   query11  18192   15054   15022   15022
   query12  156     112     103     103
   query13  1649    517     409     409
   query14  9175    7286    6882    6882
   query15  251     188     182     182
   query16  8042    641     482     482
   query17  1621    744     564     564
   query18  2108    401     306     306
   query19  229     188     155     155
   query20  115     109     110     109
   query21  212     123     100     100
   query22  4110    4433    4267    4267
   query23  33827   33022   32880   32880
   query24  6450    2291    2288    2288
   query25  529     488     377     377
   query26  1198    265     156     156
   query27  1997    463     333     333
   query28  5369    2458    2448    2448
   query29  719     545     418     418
   query30  234     181     152     152
   query31  934     849     774     774
   query32  90      59      67      59
   query33  496     354     330     330
   query34  734     845     492     492
   query35  800     817     741     741
   query36  978     1063    968     968
   query37  120     104     75      75
   query38  4136    4246    4012    4012
   query39  1461    1381    1398    1381
   query40  221     112     100     100
   query41  53      49      55      49
   query42  118     97      101     97
   query43  511     505     477     477
   query44  1332    841     806     806
   query45  175     173     163     163
   query46  853     1032    637     637
   query47  1802    1810    1729    1729
   query48  380     404     305     305
   query49  783     495     390     390
   query50  621     641     390     390
   query51  4237    4228    4089    4089
   query52  111     103     90      90
   query53  223     255     184     184
   query54  472     480     413     413
   query55  81      82      79      79
   query56  258     257     245     245
   query57  1158    1141    1066    1066
   query58  243     237     245     237
   query59  3158    2999    2979    2979
   query60  275     266     260     260
   query61  119     119     115     115
   query62  777     729     637     637
   query63  240     199     182     182
   query64  4436    995     654     654
   query65  3214    3196    3159    3159
   query66  1064    407     313     313
   query67  15922   15579   15428   15428
   query68  4284    817     546     546
   query69  466     290     261     261
   query70  1212    1103    1118    1103
   query71  374     281     250     250
   query72  5796    3862    3799    3799
   query73  648     747     360     360
   query74  10488   8941    8914    8914
   query75  3156    3155    2682    2682
   query76  3139    1143    755     755
   query77  492     339     271     271
   query78  9923    10078   9404    9404
   query79  2446    797     608     608
   query80  788     528     465     465
   query81  538     316     244     244
   query82  348     151     125     125
   query83  170     173     154     154
   query84  237     88      77      77
   query85  754     360     304     304
   query86  440     321     306     306
   query87  4424    4481    4498    4481
   query88  4173    2165    2210    2165
   query89  398     321     302     302
   query90  1919    190     188     188
   query91  137     139     108     108
   query92  70      60      55      55
   query93  2644    879     535     535
   query94  746     408     294     294
   query95  338     262     262     262
   query96  484     603     277     277
   query97  2775    2886    2750    2750
   query98  237     198     191     191
   query99  1286    1372    1254    1254
   Total cold run time: 281702 ms
   Total hot run time: 184954 ms
   ```
   
   



Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-25 Thread via GitHub


doris-robot commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2614202865

   
   
   TPC-H: Total hot run time: 32100 ms
   
   ```
   machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
   scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
   Tpch sf100 test result on commit 89c394274803c1107b4fbc5b37d8608ba3af107d, data reload: false
   
   -- Round 1 --
   q1   17575   5496    5376    5376
   q2   2052    334     182     182
   q3   10468   1303    735     735
   q4   10229   969     517     517
   q5   7663    2384    2167    2167
   q6   191     165     136     136
   q7   925     764     608     608
   q8   9235    1394    1149    1149
   q9   5219    4920    4833    4833
   q10  6811    2314    1890    1890
   q11  476     280     258     258
   q12  341     358     214     214
   q13  17760   3661    3052    3052
   q14  228     244     206     206
   q15  512     471     459     459
   q16  646     619     598     598
   q17  558     860     317     317
   q18  7189    6475    6417    6417
   q19  1807    966     537     537
   q20  304     319     185     185
   q21  2804    2173    1957    1957
   q22  356     330     307     307
   Total cold run time: 103349 ms
   Total hot run time: 32100 ms
   
   -- Round 2, with runtime_filter_mode=off --
   q1   5511    5460    5437    5437
   q2   248     327     233     233
   q3   2242    2638    2307    2307
   q4   1439    1838    1365    1365
   q5   4323    4725    4681    4681
   q6   167     156     124     124
   q7   2080    1986    1810    1810
   q8   2656    2811    2662    2662
   q9   7293    7156    7173    7156
   q10  2932    3258    2769    2769
   q11  572     516     494     494
   q12  717     749     595     595
   q13  3494    3934    3293    3293
   q14  267     289     267     267
   q15  505     474     464     464
   q16  658     693     641     641
   q17  1207    1731    1256    1256
   q18  7613    7379    7400    7379
   q19  768     1157    1031    1031
   q20  2009    2029    1866    1866
   q21  5644    5218    4986    4986
   q22  597     652     555     555
   Total hot run time: 51371 ms
   ```
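
A note on reading the totals: the numbers in this report are consistent with "Total cold run time" being the sum of the first timing column, and "Total hot run time" being the sum of the faster of the two hot runs for each query (which the last column repeats). This is inferred from the figures themselves, not from the tpch-tools scripts. A minimal sketch of that arithmetic, where `QueryTiming`, `total_cold_ms` and `total_hot_ms` are hypothetical names used only for illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record for one report row: one cold run and two hot runs, in ms.
struct QueryTiming {
    int64_t cold_ms;
    int64_t hot1_ms;
    int64_t hot2_ms;
};

// Sum of the cold-run column.
int64_t total_cold_ms(const std::vector<QueryTiming>& rows) {
    int64_t total = 0;
    for (const QueryTiming& r : rows) {
        assert(r.cold_ms >= 0); // timings are never negative
        total += r.cold_ms;
    }
    return total;
}

// Sum of the faster of the two hot runs for each query.
int64_t total_hot_ms(const std::vector<QueryTiming>& rows) {
    int64_t total = 0;
    for (const QueryTiming& r : rows) {
        total += std::min(r.hot1_ms, r.hot2_ms);
    }
    return total;
}
```

Fed the first three Round 1 rows (q1 to q3), this gives 30095 ms cold and 6293 ms hot; over all 22 rows of Round 1 the same sums come to the reported 103349 ms and 32100 ms.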
   
   





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-25 Thread via GitHub


lzyy2024 commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2614196175

   run buildall





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-25 Thread via GitHub


lzyy2024 commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2614007660

   run buildall





Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-25 Thread via GitHub


zclllyybb commented on code in PR #47307:
URL: https://github.com/apache/doris/pull/47307#discussion_r1929542886


##
be/src/vec/functions/function_compress.cpp:
##
@@ -0,0 +1,242 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "common/status.h"
+#include "util/block_compression.h"
+#include "util/faststring.h"
+#include "vec/aggregate_functions/aggregate_function.h"
+#include "vec/columns/column.h"
+#include "vec/columns/column_nullable.h"
+#include "vec/columns/column_string.h"
+#include "vec/columns/column_vector.h"
+#include "vec/columns/columns_number.h"
+#include "vec/common/assert_cast.h"
+#include "vec/core/block.h"
+#include "vec/core/column_numbers.h"
+#include "vec/core/column_with_type_and_name.h"
+#include "vec/core/types.h"
+#include "vec/data_types/data_type.h"
+#include "vec/data_types/data_type_nullable.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/data_types/data_type_string.h"
+#include "vec/functions/function.h"
+#include "vec/functions/simple_function_factory.h"
+
+namespace doris {
+class FunctionContext;
+} // namespace doris
+
+namespace doris::vectorized {
+
+class FunctionCompress : public IFunction {
+string hex_itoc = "0123456789ABCDEF";
+
+public:
+static constexpr auto name = "compress";
+static FunctionPtr create() { return std::make_shared<FunctionCompress>(); }
+
+String get_name() const override { return name; }
+
+size_t get_number_of_arguments() const override { return 1; }
+
+DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+return std::make_shared<DataTypeString>();
+}
+
+Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+uint32_t result, size_t input_rows_count) const override {
+// Get the compression algorithm object
+BlockCompressionCodec* compression_codec;
+RETURN_IF_ERROR(get_block_compression_codec(segment_v2::CompressionTypePB::ZLIB,
+&compression_codec));
+
+const auto& arg_column =
+assert_cast<const ColumnString&>(*block.get_by_position(arguments[0]).column);
+auto result_column = ColumnString::create();
+
+auto& arg_data = arg_column.get_chars();
+auto& arg_offset = arg_column.get_offsets();
+const char* arg_begin = reinterpret_cast<const char*>(arg_data.data());
+
+auto& col_data = result_column->get_chars();
+auto& col_offset = result_column->get_offsets();
+col_offset.resize(input_rows_count);
+
+faststring compressed_str;
+Slice data;
+for (size_t row = 0; row < input_rows_count; row++) {
+size_t length = arg_offset[row] - arg_offset[row - 1];
+data = Slice(arg_begin, length);
+
+// Z_MEM_ERROR and Z_BUF_ERROR are already handled in compress, making sure st is always Z_OK
+auto st = compression_codec->compress(data, &compressed_str);
+
+size_t idx = col_data.size();
+if (!length) { // data is ''
+col_data.resize(col_data.size() + 2);
+col_data[idx] = '0', col_data[idx + 1] = 'x';
+col_offset[row] = col_offset[row - 1] + 2;
+continue;
+}
+
+// first ten digits represent the length of the uncompressed string
+col_data.resize(col_data.size() + 10);
+col_data[idx] = '0', col_data[idx + 1] = 'x';
+for (size_t i = 0; i < 4; i++) {
+unsigned char byte = (length >> (i * 8)) & 0xFF;
+col_data[idx + 2 + i * 2] = hex_itoc[byte >> 4]; // higher four
+col_data[idx + 3 + i * 2] = hex_itoc[byte & 0x0F];
+}
+idx += 10;
+
+col_data.resize(col_data.size() + 2 * compressed_str.size());
+
+unsigned char* src = compressed_str.data();
+for (size_t i = 0; i < compressed_str.size(); i++) {
+col_data[idx] = hex_itoc[(*src >> 4) & 0x0F];
+col_data[idx + 1] = hex_itoc[(*src & 0x0F)];
+idx += 2

Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-25 Thread via GitHub


zclllyybb commented on code in PR #47307:
URL: https://github.com/apache/doris/pull/47307#discussion_r1929542675


##
be/src/util/block_compression.cpp:
##
@@ -854,8 +854,13 @@ class ZlibBlockCompression : public BlockCompressionCodec {
 Slice s(*output);
 
 auto zres = ::compress((Bytef*)s.data, &s.size, (Bytef*)input.data, input.size);
-if (zres != Z_OK) {
-return Status::InvalidArgument("Fail to do ZLib compress, error={}", zError(zres));
+if (zres == Z_MEM_ERROR) {

Review Comment:
   Splitting these changes into a separate PR may be better.






Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-25 Thread via GitHub


zclllyybb commented on code in PR #47307:
URL: https://github.com/apache/doris/pull/47307#discussion_r1929542732


##
be/src/vec/functions/function_compress.cpp:
##
@@ -0,0 +1,242 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "common/status.h"
+#include "util/block_compression.h"
+#include "util/faststring.h"
+#include "vec/aggregate_functions/aggregate_function.h"
+#include "vec/columns/column.h"
+#include "vec/columns/column_nullable.h"
+#include "vec/columns/column_string.h"
+#include "vec/columns/column_vector.h"
+#include "vec/columns/columns_number.h"
+#include "vec/common/assert_cast.h"
+#include "vec/core/block.h"
+#include "vec/core/column_numbers.h"
+#include "vec/core/column_with_type_and_name.h"
+#include "vec/core/types.h"
+#include "vec/data_types/data_type.h"
+#include "vec/data_types/data_type_nullable.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/data_types/data_type_string.h"
+#include "vec/functions/function.h"
+#include "vec/functions/simple_function_factory.h"
+
+namespace doris {
+class FunctionContext;
+} // namespace doris
+
+namespace doris::vectorized {
+
+class FunctionCompress : public IFunction {
+string hex_itoc = "0123456789ABCDEF";
+
+public:
+static constexpr auto name = "compress";
+static FunctionPtr create() { return std::make_shared<FunctionCompress>(); }
+
+String get_name() const override { return name; }
+
+size_t get_number_of_arguments() const override { return 1; }
+
+DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+return std::make_shared<DataTypeString>();
+}
+
+Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+uint32_t result, size_t input_rows_count) const override {
+// Get the compression algorithm object
+BlockCompressionCodec* compression_codec;
+RETURN_IF_ERROR(get_block_compression_codec(segment_v2::CompressionTypePB::ZLIB,
+&compression_codec));
+
+const auto& arg_column =
+assert_cast<const ColumnString&>(*block.get_by_position(arguments[0]).column);
+auto result_column = ColumnString::create();
+
+auto& arg_data = arg_column.get_chars();
+auto& arg_offset = arg_column.get_offsets();
+const char* arg_begin = reinterpret_cast<const char*>(arg_data.data());
+
+auto& col_data = result_column->get_chars();
+auto& col_offset = result_column->get_offsets();
+col_offset.resize(input_rows_count);
+
+faststring compressed_str;
+Slice data;
+for (size_t row = 0; row < input_rows_count; row++) {
+size_t length = arg_offset[row] - arg_offset[row - 1];
+data = Slice(arg_begin, length);
+
+// Z_MEM_ERROR and Z_BUF_ERROR are already handled in compress, making sure st is always Z_OK
+auto st = compression_codec->compress(data, &compressed_str);
+
+size_t idx = col_data.size();
+if (!length) { // data is ''
+col_data.resize(col_data.size() + 2);
+col_data[idx] = '0', col_data[idx + 1] = 'x';
+col_offset[row] = col_offset[row - 1] + 2;
+continue;
+}
+
+// first ten digits represent the length of the uncompressed string
+col_data.resize(col_data.size() + 10);
+col_data[idx] = '0', col_data[idx + 1] = 'x';
+for (size_t i = 0; i < 4; i++) {
+unsigned char byte = (length >> (i * 8)) & 0xFF;
+col_data[idx + 2 + i * 2] = hex_itoc[byte >> 4]; // higher four
+col_data[idx + 3 + i * 2] = hex_itoc[byte & 0x0F];
+}
+idx += 10;
+
+col_data.resize(col_data.size() + 2 * compressed_str.size());

Review Comment:
   As with the comment on `uncompress`, adding a reserve operation before the for loop may be better.
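
To make the suggestion concrete, here is a minimal sketch of reserving the output once before the encoding loops. It uses `std::string` as a stand-in for the column's character buffer, and `hex_encode_with_length` is a hypothetical helper, not code from the PR. The layout follows the quoted diff: a `0x` prefix, the uncompressed length as four bytes written low byte first, then two hex digits per compressed byte. The sketch does not special-case empty input the way the quoted code does.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the hex output built in the quoted diff; not Doris code.
std::string hex_encode_with_length(std::size_t uncompressed_len,
                                   const std::vector<unsigned char>& compressed) {
    static constexpr char kHex[] = "0123456789ABCDEF";
    // The layout only has room for a 4-byte length.
    assert(uncompressed_len <= 0xFFFFFFFFu);

    std::string out;
    // Reserve the full size up front: "0x" + 8 length digits + 2 digits per byte.
    out.reserve(2 + 8 + 2 * compressed.size());

    out += "0x";
    // Length prefix, low byte first, as in the quoted loop.
    for (std::size_t i = 0; i < 4; ++i) {
        const auto byte = static_cast<unsigned char>((uncompressed_len >> (i * 8)) & 0xFF);
        out.push_back(kHex[byte >> 4]);
        out.push_back(kHex[byte & 0x0F]);
    }
    // Payload: two hex digits per compressed byte, with no further reallocation.
    for (unsigned char b : compressed) {
        out.push_back(kHex[b >> 4]);
        out.push_back(kHex[b & 0x0F]);
    }
    return out;
}
```

With a single reservation the loops append into memory that is already allocated, instead of growing the buffer in steps for every row.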




Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-25 Thread via GitHub


zclllyybb commented on code in PR #47307:
URL: https://github.com/apache/doris/pull/47307#discussion_r1929542483


##
be/src/vec/functions/function_compress.cpp:
##

Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-25 Thread via GitHub


lzyy2024 commented on code in PR #47307:
URL: https://github.com/apache/doris/pull/47307#discussion_r1928699139


##
be/src/vec/functions/function_compress.cpp:
##
@@ -0,0 +1,269 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "common/status.h"
+#include "util/block_compression.h"
+#include "util/faststring.h"
+#include "vec/aggregate_functions/aggregate_function.h"
+#include "vec/columns/column.h"
+#include "vec/columns/column_nullable.h"
+#include "vec/columns/column_string.h"
+#include "vec/columns/column_vector.h"
+#include "vec/columns/columns_number.h"
+#include "vec/common/assert_cast.h"
+#include "vec/core/block.h"
+#include "vec/core/column_numbers.h"
+#include "vec/core/column_with_type_and_name.h"
+#include "vec/core/types.h"
+#include "vec/data_types/data_type.h"
+#include "vec/data_types/data_type_nullable.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/data_types/data_type_string.h"
+#include "vec/functions/function.h"
+#include "vec/functions/simple_function_factory.h"
+
+namespace doris {
+class FunctionContext;
+} // namespace doris
+
+namespace doris::vectorized {
+
+class FunctionCompress : public IFunction {
+string hexadecimal = "0123456789ABCDEF";
+
+public:
+static constexpr auto name = "compress";
+static FunctionPtr create() { return std::make_shared<FunctionCompress>(); }
+
+String get_name() const override { return name; }
+
+size_t get_number_of_arguments() const override { return 1; }
+
+DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+return make_nullable(std::make_shared<DataTypeString>());
+}
+
+Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+uint32_t result, size_t input_rows_count) const override {
+// Get the compression algorithm object
+BlockCompressionCodec* compression_codec;
+RETURN_IF_ERROR(get_block_compression_codec(segment_v2::CompressionTypePB::ZLIB,
+&compression_codec));
+
+const auto& arg_column =
+assert_cast<const ColumnString&>(*block.get_by_position(arguments[0]).column);
+auto result_column = ColumnString::create();
+
+auto& col_data = result_column->get_chars();
+auto& col_offset = result_column->get_offsets();
+col_offset.resize(input_rows_count);
+
+auto null_column = ColumnUInt8::create(input_rows_count);
+auto& null_map = null_column->get_data();
+
+faststring compressed_str;
+Slice data;
+for (size_t row = 0; row < input_rows_count; row++) {
+null_map[row] = false;
+const auto& str = arg_column.get_data_at(row);
+data = Slice(str.data, str.size);
+
+auto st = compression_codec->compress(data, &compressed_str);

Review Comment:
   For example compress(abc) instead of compress('abc')






Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-25 Thread via GitHub


zclllyybb commented on code in PR #47307:
URL: https://github.com/apache/doris/pull/47307#discussion_r1929534888


##
be/src/vec/functions/function_compress.cpp:
##

Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-25 Thread via GitHub


zclllyybb commented on code in PR #47307:
URL: https://github.com/apache/doris/pull/47307#discussion_r1929534784


##
be/src/vec/functions/function_compress.cpp:
##
@@ -0,0 +1,242 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "common/status.h"
+#include "util/block_compression.h"
+#include "util/faststring.h"
+#include "vec/aggregate_functions/aggregate_function.h"
+#include "vec/columns/column.h"
+#include "vec/columns/column_nullable.h"
+#include "vec/columns/column_string.h"
+#include "vec/columns/column_vector.h"
+#include "vec/columns/columns_number.h"
+#include "vec/common/assert_cast.h"
+#include "vec/core/block.h"
+#include "vec/core/column_numbers.h"
+#include "vec/core/column_with_type_and_name.h"
+#include "vec/core/types.h"
+#include "vec/data_types/data_type.h"
+#include "vec/data_types/data_type_nullable.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/data_types/data_type_string.h"
+#include "vec/functions/function.h"
+#include "vec/functions/simple_function_factory.h"
+
+namespace doris {
+class FunctionContext;
+} // namespace doris
+
+namespace doris::vectorized {
+
+class FunctionCompress : public IFunction {
+string hex_itoc = "0123456789ABCDEF";
+
+public:
+static constexpr auto name = "compress";
+static FunctionPtr create() { return std::make_shared<FunctionCompress>(); }
+
+String get_name() const override { return name; }
+
+size_t get_number_of_arguments() const override { return 1; }
+
+DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+return std::make_shared<DataTypeString>();
+}
+
+Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+uint32_t result, size_t input_rows_count) const override {
+// Get the compression algorithm object
+BlockCompressionCodec* compression_codec;
+RETURN_IF_ERROR(get_block_compression_codec(segment_v2::CompressionTypePB::ZLIB,
+&compression_codec));
+
+const auto& arg_column =
+assert_cast<const ColumnString&>(*block.get_by_position(arguments[0]).column);
+auto result_column = ColumnString::create();
+
+auto& arg_data = arg_column.get_chars();
+auto& arg_offset = arg_column.get_offsets();
+const char* arg_begin = reinterpret_cast<const char*>(arg_data.data());
+
+auto& col_data = result_column->get_chars();
+auto& col_offset = result_column->get_offsets();
+col_offset.resize(input_rows_count);
+
+faststring compressed_str;
+Slice data;
+for (size_t row = 0; row < input_rows_count; row++) {
+size_t length = arg_offset[row] - arg_offset[row - 1];
+data = Slice(arg_begin, length);
+
+// Z_MEM_ERROR and Z_BUF_ERROR are already handled in compress, making sure st is always Z_OK
+auto st = compression_codec->compress(data, &compressed_str);
+
+size_t idx = col_data.size();
+if (!length) { // data is ''
+col_data.resize(col_data.size() + 2);
+col_data[idx] = '0', col_data[idx + 1] = 'x';
+col_offset[row] = col_offset[row - 1] + 2;
+continue;
+}
+
+// first ten digits represent the length of the uncompressed string
+col_data.resize(col_data.size() + 10);
+col_data[idx] = '0', col_data[idx + 1] = 'x';
+for (size_t i = 0; i < 4; i++) {
+unsigned char byte = (length >> (i * 8)) & 0xFF;
+col_data[idx + 2 + i * 2] = hex_itoc[byte >> 4]; // higher four
+col_data[idx + 3 + i * 2] = hex_itoc[byte & 0x0F];
+}
+idx += 10;
+
+col_data.resize(col_data.size() + 2 * compressed_str.size());
+
+unsigned char* src = compressed_str.data();
+for (size_t i = 0; i < compressed_str.size(); i++) {
+col_data[idx] = hex_itoc[(*src >> 4) & 0x0F];
+col_data[idx + 1] = hex_itoc[(*src & 0x0F)];
+idx += 2

Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-25 Thread via GitHub


zclllyybb commented on code in PR #47307:
URL: https://github.com/apache/doris/pull/47307#discussion_r1929533247


##
fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/scalar/Uncompress.java:
##
@@ -0,0 +1,67 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.nereids.trees.expressions.functions.scalar;
+
+import org.apache.doris.catalog.FunctionSignature;
+import org.apache.doris.nereids.trees.expressions.Expression;
+import org.apache.doris.nereids.trees.expressions.functions.AlwaysNullable;
+import org.apache.doris.nereids.trees.expressions.functions.ExplicitlyCastableSignature;
+import org.apache.doris.nereids.trees.expressions.shape.UnaryExpression;
+import org.apache.doris.nereids.trees.expressions.visitor.ExpressionVisitor;
+import org.apache.doris.nereids.types.StringType;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
+
+import java.util.List;
+
+/**
+ * ScalarFunction 'uncompress'.
+ */
+public class Uncompress extends ScalarFunction
+implements UnaryExpression, ExplicitlyCastableSignature, AlwaysNullable {
+
+public static final List<FunctionSignature> SIGNATURES = ImmutableList.of(
+FunctionSignature.ret(StringType.INSTANCE).args(StringType.INSTANCE));

Review Comment:
   Ditto: this should also accept `VarcharType`.






Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-25 Thread via GitHub


zclllyybb commented on code in PR #47307:
URL: https://github.com/apache/doris/pull/47307#discussion_r1929533192


##
fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/scalar/Compress.java:
##
@@ -0,0 +1,67 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.doris.nereids.trees.expressions.functions.scalar;
+
+import org.apache.doris.catalog.FunctionSignature;
+import org.apache.doris.nereids.trees.expressions.Expression;
+import org.apache.doris.nereids.trees.expressions.functions.ExplicitlyCastableSignature;
+import org.apache.doris.nereids.trees.expressions.functions.PropagateNullable;
+import org.apache.doris.nereids.trees.expressions.shape.UnaryExpression;
+import org.apache.doris.nereids.trees.expressions.visitor.ExpressionVisitor;
+import org.apache.doris.nereids.types.StringType;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.ImmutableList;
+
+import java.util.List;
+
+/**
+ * ScalarFunction 'compress'.
+ */
+public class Compress extends ScalarFunction
+implements UnaryExpression, ExplicitlyCastableSignature, PropagateNullable {
+
+public static final List<FunctionSignature> SIGNATURES = ImmutableList.of(
+FunctionSignature.ret(StringType.INSTANCE).args(StringType.INSTANCE));

Review Comment:
   This should also accept `VarcharType`.






Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-25 Thread via GitHub


zclllyybb commented on code in PR #47307:
URL: https://github.com/apache/doris/pull/47307#discussion_r1929533053


##
be/src/util/block_compression.cpp:
##
@@ -854,8 +854,13 @@ class ZlibBlockCompression : public BlockCompressionCodec {
 Slice s(*output);
 
 auto zres = ::compress((Bytef*)s.data, &s.size, (Bytef*)input.data, input.size);
-if (zres != Z_OK) {
-return Status::InvalidArgument("Fail to do ZLib compress, error={}", zError(zres));
+if (zres == Z_MEM_ERROR) {

Review Comment:
   Also apply this change to the other calls that use the same check.
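
   One way to keep every call site consistent, sketched here under assumptions: route the zlib return code through a single helper so that each `::compress` caller reports `Z_MEM_ERROR` and `Z_BUF_ERROR` in the same way. The `ZCode` enum mirrors the numeric values of zlib's `Z_OK`, `Z_MEM_ERROR`, and `Z_BUF_ERROR` so the sketch compiles without zlib. `SimpleStatus` and `status_from_zlib` are hypothetical names, not Doris APIs, and the messages are illustrative because the rest of the hunk is not visible here.

```cpp
#include <string>

// Numeric values mirror zlib.h so the sketch compiles without linking zlib.
enum ZCode : int { kZOk = 0, kZMemError = -4, kZBufError = -5 };

// Hypothetical stand-in for doris::Status; only what the sketch needs.
struct SimpleStatus {
    bool ok;
    std::string msg;
};

// Hypothetical helper: one place that turns a zlib return code into a status,
// so every compress call site distinguishes the same error cases.
inline SimpleStatus status_from_zlib(int zres) {
    switch (zres) {
    case kZOk:
        return {true, ""};
    case kZMemError:
        return {false, "ZLib compress failed: not enough memory"};
    case kZBufError:
        return {false, "ZLib compress failed: output buffer too small"};
    default:
        return {false, "ZLib compress failed: unexpected code " + std::to_string(zres)};
    }
}
```

   Each call site would then pass its `zres` to the helper instead of repeating the comparison chain.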






Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-24 Thread via GitHub


zclllyybb commented on code in PR #47307:
URL: https://github.com/apache/doris/pull/47307#discussion_r1928875714


##
be/src/vec/functions/function_compress.cpp:
##
@@ -0,0 +1,256 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "common/status.h"
+#include "util/block_compression.h"
+#include "util/faststring.h"
+#include "vec/aggregate_functions/aggregate_function.h"
+#include "vec/columns/column.h"
+#include "vec/columns/column_nullable.h"
+#include "vec/columns/column_string.h"
+#include "vec/columns/column_vector.h"
+#include "vec/columns/columns_number.h"
+#include "vec/common/assert_cast.h"
+#include "vec/core/block.h"
+#include "vec/core/column_numbers.h"
+#include "vec/core/column_with_type_and_name.h"
+#include "vec/core/types.h"
+#include "vec/data_types/data_type.h"
+#include "vec/data_types/data_type_nullable.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/data_types/data_type_string.h"
+#include "vec/functions/function.h"
+#include "vec/functions/simple_function_factory.h"
+
+namespace doris {
+class FunctionContext;
+} // namespace doris
+
+namespace doris::vectorized {
+
+class FunctionCompress : public IFunction {
+string hex_itoc = "0123456789ABCDEF";
+
+public:
+static constexpr auto name = "compress";
+static FunctionPtr create() { return std::make_shared<FunctionCompress>(); }
+
+String get_name() const override { return name; }
+
+size_t get_number_of_arguments() const override { return 1; }
+
+DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+return make_nullable(std::make_shared<DataTypeString>());
+}
+
+Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+uint32_t result, size_t input_rows_count) const override {
+// Get the compression algorithm object
+BlockCompressionCodec* compression_codec;
+RETURN_IF_ERROR(get_block_compression_codec(segment_v2::CompressionTypePB::ZLIB,
+&compression_codec));
+
+const auto& arg_column =
+assert_cast<const ColumnString&>(*block.get_by_position(arguments[0]).column);
+auto result_column = ColumnString::create();
+
+auto& arg_data = arg_column.get_chars();
+auto& arg_offset = arg_column.get_offsets();
+const char* arg_begin = reinterpret_cast<const char*>(arg_data.data());
+
+auto& col_data = result_column->get_chars();
+auto& col_offset = result_column->get_offsets();
+col_offset.resize(input_rows_count);
+
+auto null_column = ColumnUInt8::create(input_rows_count);
+auto& null_map = null_column->get_data();
+
+faststring compressed_str;
+Slice data;
+for (size_t row = 0; row < input_rows_count; row++) {
+null_map[row] = false;
+size_t length = arg_offset[row] - arg_offset[row - 1];
+data = Slice(arg_begin, length);
+
+auto st = compression_codec->compress(data, &compressed_str);
+
+if (!st.ok()) { // Failed to compress. The data should be a valid string or value.
+col_offset[row] = col_offset[row - 1];
+null_map[row] = true;
+continue;
+}
+
+size_t idx = col_data.size();
+if (!length) { // data is ''
+col_data.resize(col_data.size() + 2);
+col_data[idx] = '0', col_data[idx + 1] = 'x';
+col_offset[row] = col_offset[row - 1] + 2;
+continue;
+}
+
+// first ten digits represent the length of the uncompressed string
+col_data.resize(col_data.size() + 10);
+col_data[idx] = '0', col_data[idx + 1] = 'x';
+for (size_t i = 0; i < 4; i++) {
+unsigned char byte = (length >> (i * 8)) & 0xFF;
+col_data[idx + 2 + i * 2] = hex_itoc[byte >> 4]; // higher four
+col_data[idx + 3 + i * 2] = hex_itoc[byte & 0x0F];
+}
+idx += 10;
+
+col_data.resize(col_data.size() + 2 * 

Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-24 Thread via GitHub


zclllyybb commented on code in PR #47307:
URL: https://github.com/apache/doris/pull/47307#discussion_r1928336564


##
be/src/vec/functions/function_compress.cpp:
##
@@ -0,0 +1,269 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "common/status.h"
+#include "util/block_compression.h"
+#include "util/faststring.h"
+#include "vec/aggregate_functions/aggregate_function.h"
+#include "vec/columns/column.h"
+#include "vec/columns/column_nullable.h"
+#include "vec/columns/column_string.h"
+#include "vec/columns/column_vector.h"
+#include "vec/columns/columns_number.h"
+#include "vec/common/assert_cast.h"
+#include "vec/core/block.h"
+#include "vec/core/column_numbers.h"
+#include "vec/core/column_with_type_and_name.h"
+#include "vec/core/types.h"
+#include "vec/data_types/data_type.h"
+#include "vec/data_types/data_type_nullable.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/data_types/data_type_string.h"
+#include "vec/functions/function.h"
+#include "vec/functions/simple_function_factory.h"
+
+namespace doris {
+class FunctionContext;
+} // namespace doris
+
+namespace doris::vectorized {
+
+class FunctionCompress : public IFunction {
+string hexadecimal = "0123456789ABCDEF";
+
+public:
+static constexpr auto name = "compress";
+static FunctionPtr create() { return std::make_shared<FunctionCompress>(); }
+
+String get_name() const override { return name; }
+
+size_t get_number_of_arguments() const override { return 1; }
+
+DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+return make_nullable(std::make_shared<DataTypeString>());
+}
+
+Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+uint32_t result, size_t input_rows_count) const override {
+// Get the compression algorithm object
+BlockCompressionCodec* compression_codec;
+RETURN_IF_ERROR(get_block_compression_codec(segment_v2::CompressionTypePB::ZLIB,
+&compression_codec));
+
+const auto& arg_column =
+assert_cast<const ColumnString&>(*block.get_by_position(arguments[0]).column);
+auto result_column = ColumnString::create();
+
+auto& col_data = result_column->get_chars();
+auto& col_offset = result_column->get_offsets();
+col_offset.resize(input_rows_count);
+
+auto null_column = ColumnUInt8::create(input_rows_count);
+auto& null_map = null_column->get_data();
+
+faststring compressed_str;
+Slice data;
+for (size_t row = 0; row < input_rows_count; row++) {
+null_map[row] = false;
+const auto& str = arg_column.get_data_at(row);
+data = Slice(str.data, str.size);
+
+auto st = compression_codec->compress(data, &compressed_str);

Review Comment:
   When will `compress` fail?
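
   For context on this question: zlib's manual lists two failure results for the one-shot `compress`, namely `Z_MEM_ERROR` (not enough memory) and `Z_BUF_ERROR` (not enough room in the output buffer). The sketch below illustrates why the second should not occur when the output buffer is sized with `compressBound`. `compress_bound_sketch` and `output_buffer_is_sufficient` are hypothetical names; the formula is assumed to match zlib's `compressBound` and should be checked against the zlib version in use.

```cpp
#include <cstdint>

// Assumed to mirror zlib's compressBound(): an upper bound on the size of the
// compressed output for an input of source_len bytes.
constexpr std::uint64_t compress_bound_sketch(std::uint64_t source_len) {
    return source_len + (source_len >> 12) + (source_len >> 14) + (source_len >> 25) + 13;
}

// True when a buffer of out_capacity bytes is large enough that compress()
// should not report Z_BUF_ERROR for an input of source_len bytes.
constexpr bool output_buffer_is_sufficient(std::uint64_t source_len, std::uint64_t out_capacity) {
    return out_capacity >= compress_bound_sketch(source_len);
}
```

   Under that assumption the remaining failure is memory exhaustion, which does not depend on the content of the input row.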



##
be/src/vec/functions/function_compress.cpp:
##
@@ -0,0 +1,269 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "common/status.h"
+#include "util/block_compression.h"
+#include "util/faststring.h"
+#include "vec/aggregate_functions/aggregate_function.h"
+#include "vec/columns/column.h"
+#include "vec/columns/column_nullable.h"
+#include "vec/columns

Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-22 Thread via GitHub


lzyy2024 commented on code in PR #47307:
URL: https://github.com/apache/doris/pull/47307#discussion_r1924960735


##
be/src/vec/functions/function_compress.cpp:
##
@@ -0,0 +1,299 @@
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "common/status.h"
+#include "util/block_compression.h"
+#include "util/faststring.h"
+#include "vec/aggregate_functions/aggregate_function.h"
+#include "vec/columns/column.h"
+#include "vec/columns/column_nullable.h"
+#include "vec/columns/column_string.h"
+#include "vec/columns/column_vector.h"
+#include "vec/columns/columns_number.h"
+#include "vec/common/assert_cast.h"
+#include "vec/core/block.h"
+#include "vec/core/column_numbers.h"
+#include "vec/core/column_with_type_and_name.h"
+#include "vec/core/types.h"
+#include "vec/data_types/data_type.h"
+#include "vec/data_types/data_type_nullable.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/data_types/data_type_string.h"
+#include "vec/functions/function.h"
+#include "vec/functions/simple_function_factory.h"
+
+namespace doris {
+class FunctionContext;
+} // namespace doris
+
+namespace doris::vectorized {
+
+class FunctionCompress : public IFunction {
+public:
+static constexpr auto name = "compress";
+static FunctionPtr create() { return std::make_shared<FunctionCompress>(); }
+
+String get_name() const override { return name; }
+
+size_t get_number_of_arguments() const override { return 1; }
+
+DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+return make_nullable(std::make_shared<DataTypeString>());
+}
+
+Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+uint32_t result, size_t input_rows_count) const override {
+// LOG(INFO) << "Executing FunctionCompress with " << input_rows_count
+//   << " rows."; // Log the number of rows being processed
+
+// Get the compression algorithm object
+BlockCompressionCodec* compression_codec;
+RETURN_IF_ERROR(get_block_compression_codec(segment_v2::CompressionTypePB::ZLIB,
+&compression_codec));
+
+const auto& arg_column =
+assert_cast<const ColumnString&>(*block.get_by_position(arguments[0]).column);
+auto result_column = ColumnString::create();
+
+auto& col_data = result_column->get_chars();
+auto& col_offset = result_column->get_offsets();
+col_offset.resize(input_rows_count);
+
+auto null_column = ColumnUInt8::create(input_rows_count);
+auto& null_map = null_column->get_data();
+
+faststring compressed_str;
+Slice data;
+for (int row = 0; row < input_rows_count; row++) {
+null_map[row] = false;
+const auto& str = arg_column.get_data_at(row);
+data = Slice(str.data, str.size);
+
+// Print the original string (before compression)
+// LOG(INFO) << "Original string at row " << row << ": "
+//   << std::string(str.data, str.size);
+
+auto st = compression_codec->compress(data, &compressed_str);
+
+if (!st.ok()) {
+// LOG(INFO) << "Compression failed at row " << row
+//   << ", skipping this row."; // Log failure
+col_offset[row] = col_offset[row - 1];
+null_map[row] = true;
+continue;
+}
+
+size_t idx = col_data.size();
+if (!str.size) { // null -> 0x
+col_data.resize(col_data.size() + 2);
+col_data[idx] = '0', col_data[idx + 1] = 'x';
+col_offset[row] = col_offset[row - 1] + 2;
+continue;
+}
+
+// first ten digits represent the length of the uncompressed string
+int value = (int)str.size;
+col_data.resize(col_data.size() + 10);
+col_data[idx] = '0', col_data[idx + 1] = 'x';
+for (int i = 0; i < 4; i++) {
+unsigned char byte = (value >> (i * 8)) & 0xFF;
+col_data[idx + 2 + i * 2] = "0123456789ABCDEF"[byte >> 4];   // 高4位 (high 4 bits)
+col_data[idx + 3 + i * 2] = "0123456789ABCDEF"[byte & 0x0F]; // 低4位 (low 4 bits)
+}
+idx += 10;
+
+col_data.resize(col_data.size() + 2 * compressed_str.size());
+// memcpy(col_data.data() + col_data.size(), compressed_str.data(), compressed_str.size());
+
+unsigned char* src = compressed_str.data();
+{
+auto transform = [](char ch) -> unsigned char {
+char x;
+if (ch < 10) {
+x = ch + '0';
+} else {
+x = ch - 10 + 'A';
+}
+// LOG(INFO) << "transform" << (int)x << "->" << x;
+retu

Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-21 Thread via GitHub


zclllyybb commented on code in PR #47307:
URL: https://github.com/apache/doris/pull/47307#discussion_r1924819842


##
be/src/vec/functions/function_compress.cpp:
##
@@ -0,0 +1,299 @@
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "common/status.h"
+#include "util/block_compression.h"
+#include "util/faststring.h"
+#include "vec/aggregate_functions/aggregate_function.h"
+#include "vec/columns/column.h"
+#include "vec/columns/column_nullable.h"
+#include "vec/columns/column_string.h"
+#include "vec/columns/column_vector.h"
+#include "vec/columns/columns_number.h"
+#include "vec/common/assert_cast.h"
+#include "vec/core/block.h"
+#include "vec/core/column_numbers.h"
+#include "vec/core/column_with_type_and_name.h"
+#include "vec/core/types.h"
+#include "vec/data_types/data_type.h"
+#include "vec/data_types/data_type_nullable.h"
+#include "vec/data_types/data_type_number.h"
+#include "vec/data_types/data_type_string.h"
+#include "vec/functions/function.h"
+#include "vec/functions/simple_function_factory.h"
+
+namespace doris {
+class FunctionContext;
+} // namespace doris
+
+namespace doris::vectorized {
+
+class FunctionCompress : public IFunction {
+public:
+static constexpr auto name = "compress";
+static FunctionPtr create() { return std::make_shared<FunctionCompress>(); }
+
+String get_name() const override { return name; }
+
+size_t get_number_of_arguments() const override { return 1; }
+
+DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
+return make_nullable(std::make_shared<DataTypeString>());
+}
+
+Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
+uint32_t result, size_t input_rows_count) const override {
+// LOG(INFO) << "Executing FunctionCompress with " << input_rows_count
+//   << " rows."; // Log the number of rows being processed
+
+// Get the compression algorithm object
+BlockCompressionCodec* compression_codec;
+RETURN_IF_ERROR(get_block_compression_codec(segment_v2::CompressionTypePB::ZLIB,
+&compression_codec));
+
+const auto& arg_column =
+assert_cast<const ColumnString&>(*block.get_by_position(arguments[0]).column);
+auto result_column = ColumnString::create();
+
+auto& col_data = result_column->get_chars();
+auto& col_offset = result_column->get_offsets();
+col_offset.resize(input_rows_count);
+
+auto null_column = ColumnUInt8::create(input_rows_count);
+auto& null_map = null_column->get_data();
+
+faststring compressed_str;
+Slice data;
+for (int row = 0; row < input_rows_count; row++) {
+null_map[row] = false;
+const auto& str = arg_column.get_data_at(row);
+data = Slice(str.data, str.size);
+
+// Print the original string (before compression)
+// LOG(INFO) << "Original string at row " << row << ": "
+//   << std::string(str.data, str.size);
+
+auto st = compression_codec->compress(data, &compressed_str);
+
+if (!st.ok()) {
+// LOG(INFO) << "Compression failed at row " << row
+//   << ", skipping this row."; // Log failure
+col_offset[row] = col_offset[row - 1];
+null_map[row] = true;
+continue;
+}
+
+size_t idx = col_data.size();
+if (!str.size) { // null -> 0x
+col_data.resize(col_data.size() + 2);
+col_data[idx] = '0', col_data[idx + 1] = 'x';
+col_offset[row] = col_offset[row - 1] + 2;
+continue;
+}
+
+// first ten digits represent the length of the uncompressed string
+int value = (int)str.size;
+col_data.resize(col_data.size() + 10);
+col_data[idx] = '0', col_data[idx + 1] = 'x';
+for (int i = 0; i < 4; i++) {
+unsigned char byte = (value >> (i * 8)) & 0xFF;
+col_data[idx + 2 + i * 2] = "0123456789ABCDEF"[byte >> 4];   // 高4位 (high 4 bits)

Review Comment:
   Don't use Chinese in comments.
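
   As an illustration of the loop under review with English comments, here is a minimal standalone sketch of the length prefix: `0x` followed by the four low-order bytes of the length, least significant byte first, two uppercase hex digits per byte. `append_length_prefix` and `kHexDigits` are hypothetical names, and `std::string` stands in for the column's character buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Lookup table from a 4-bit value to its uppercase hex digit.
constexpr char kHexDigits[] = "0123456789ABCDEF";

// Appends "0x" plus 8 hex digits encoding `length` as 4 bytes,
// least significant byte first (the same order as the loop in the diff).
inline void append_length_prefix(std::string& out, std::uint32_t length) {
    out += "0x";
    for (std::size_t i = 0; i < 4; i++) {
        unsigned char byte = (length >> (i * 8)) & 0xFF;
        out += kHexDigits[byte >> 4];   // high four bits
        out += kHexDigits[byte & 0x0F]; // low four bits
    }
}
```

   For example, a length of 5 yields `0x05000000`.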



##
be/src/vec/functions/function_compress.cpp:
##
@@ -0,0 +1,299 @@
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "common/status.h"
+#include "util/block_compression.h"
+#include "util/faststring.h"
+#include "vec/aggregate_functions/aggregate_function.h"
+#include "vec/columns/column.h"
+#include "vec/columns/column_nullable.h"
+#include "vec/columns/column_string.h"
+#include "vec/columns/column_vector.h"
+#include "vec/columns/columns_number.h"
+#include "vec/common/assert_cast.h"
+#include "vec/core/block.h"
+#include "vec/core/column_numbers.h"
+#include "vec/core/column_with_type_and_name.h"
+#include "vec/core/types.h"
+#include 

Re: [PR] [Enhancement] Support some compress functions [doris]

2025-01-21 Thread via GitHub


hello-stephen commented on PR #47307:
URL: https://github.com/apache/doris/pull/47307#issuecomment-2606478336

   
   Thank you for your contribution to Apache Doris.
   Don't know what should be done next? See [How to process your PR](https://cwiki.apache.org/confluence/display/DORIS/How+to+process+your+PR).
   
   Please clearly describe your PR:
   1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
   2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
   3. What features were added. Why was this function added?
   4. Which code was refactored and why was this part of the code refactored?
   5. Which functions were optimized and what is the difference before and after the optimization?
   

