Re: [PR] [enhancement](filecache) fix default capacity and add reset_capacity validation [doris]

2025-06-23 Thread via GitHub


dataroaring merged PR #51711:
URL: https://github.com/apache/doris/pull/51711


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] [enhancement](filecache) fix default capacity and add reset_capacity validation [doris]

2025-06-20 Thread via GitHub


github-actions[bot] commented on PR #51711:
URL: https://github.com/apache/doris/pull/51711#issuecomment-2991404321

   PR approved by anyone and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] [enhancement](filecache) fix default capacity and add reset_capacity validation [doris]

2025-06-20 Thread via GitHub


github-actions[bot] commented on PR #51711:
URL: https://github.com/apache/doris/pull/51711#issuecomment-2991404207

   PR approved by at least one committer and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] [enhancement](filecache) fix default capacity and add reset_capacity validation [doris]

2025-06-19 Thread via GitHub


hello-stephen commented on PR #51711:
URL: https://github.com/apache/doris/pull/51711#issuecomment-2988559373

   # BE Regression && UT Coverage Report
   Increment line coverage `81.25% (39/48)` :tada:
   
   [Increment coverage 
report](http://coverage.selectdb-in.cc/coverage/51711_28601cef48a00fa941ea2fc851c3925838974d3b_merge/increment_report/index.html)
   [Complete coverage 
report](http://coverage.selectdb-in.cc/coverage/51711_28601cef48a00fa941ea2fc851c3925838974d3b_merge/report/index.html)
   | Category  | Coverage   |
   |---||
   | Function Coverage | 61.23% (16096/26287) |
   | Line Coverage | 50.73% (151318/298291) |
   | Region Coverage   | 48.13% (86623/179994) |
   | Branch Coverage   | 41.53% (42475/102266) |


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] [enhancement](filecache) fix default capacity and add reset_capacity validation [doris]

2025-06-19 Thread via GitHub


hello-stephen commented on PR #51711:
URL: https://github.com/apache/doris/pull/51711#issuecomment-2988576622

   # BE Regression && UT Coverage Report
   Increment line coverage `81.25% (39/48)` :tada:
   
   [Increment coverage 
report](http://coverage.selectdb-in.cc/coverage/51711_28601cef48a00fa941ea2fc851c3925838974d3b_merge/increment_report/index.html)
   [Complete coverage 
report](http://coverage.selectdb-in.cc/coverage/51711_28601cef48a00fa941ea2fc851c3925838974d3b_merge/report/index.html)
   | Category  | Coverage   |
   |---||
   | Function Coverage | 61.23% (16096/26287) |
   | Line Coverage | 50.73% (151318/298291) |
   | Region Coverage   | 48.13% (86623/179994) |
   | Branch Coverage   | 41.53% (42475/102266) |


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] [enhancement](filecache) fix default capacity and add reset_capacity validation [doris]

2025-06-19 Thread via GitHub


hello-stephen commented on PR #51711:
URL: https://github.com/apache/doris/pull/51711#issuecomment-2988528821

   # BE UT Coverage Report
   Increment line coverage `81.25% (39/48)` :tada:
   
   [Increment coverage 
report](http://coverage.selectdb-in.cc/coverage/28601cef48a00fa941ea2fc851c3925838974d3b_28601cef48a00fa941ea2fc851c3925838974d3b/increment_report/index.html)
   [Complete coverage 
report](http://coverage.selectdb-in.cc/coverage/28601cef48a00fa941ea2fc851c3925838974d3b_28601cef48a00fa941ea2fc851c3925838974d3b/report/index.html)
   | Category  | Coverage   |
   |---||
   | Function Coverage | 56.37% (15057/26712) |
   | Line Coverage | 45.15% (134747/298452) |
   | Region Coverage   | 44.29% (67764/153007) |
   | Branch Coverage   | 38.89% (34787/89454) |


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] [enhancement](filecache) fix default capacity and add reset_capacity validation [doris]

2025-06-19 Thread via GitHub


doris-robot commented on PR #51711:
URL: https://github.com/apache/doris/pull/51711#issuecomment-2988287382

   
   
   ClickBench: Total hot run time: 29.86 s
   
   ```
   machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
   scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
   ClickBench test result on commit 28601cef48a00fa941ea2fc851c3925838974d3b, 
data reload: false
   
   query1   0.040.030.03
   query2   0.070.040.04
   query3   0.240.070.06
   query4   1.610.100.10
   query5   0.440.440.43
   query6   1.160.670.68
   query7   0.020.020.02
   query8   0.040.040.04
   query9   0.580.520.51
   query10  0.560.580.57
   query11  0.150.110.11
   query12  0.160.120.12
   query13  0.630.620.61
   query14  0.820.810.82
   query15  0.910.880.88
   query16  0.410.380.40
   query17  1.121.061.06
   query18  0.230.200.21
   query19  1.951.911.90
   query20  0.010.010.01
   query21  15.39   0.900.56
   query22  0.731.250.72
   query23  14.81   1.380.63
   query24  7.310.850.81
   query25  0.490.180.10
   query26  0.690.170.16
   query27  0.050.050.05
   query28  9.320.880.46
   query29  12.55   4.023.34
   query30  0.250.090.07
   query31  2.820.610.40
   query32  3.230.560.47
   query33  3.073.073.11
   query34  16.26   5.454.78
   query35  4.834.864.93
   query36  0.700.510.48
   query37  0.090.070.07
   query38  0.050.040.03
   query39  0.030.030.03
   query40  0.170.140.14
   query41  0.080.030.02
   query42  0.040.030.03
   query43  0.040.030.03
   Total cold run time: 104.15 s
   Total hot run time: 29.86 s
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] [enhancement](filecache) fix default capacity and add reset_capacity validation [doris]

2025-06-19 Thread via GitHub


doris-robot commented on PR #51711:
URL: https://github.com/apache/doris/pull/51711#issuecomment-2988272427

   
   
   TPC-DS: Total hot run time: 186359 ms
   
   ```
   machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
   scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
   TPC-DS sf100 test result on commit 28601cef48a00fa941ea2fc851c3925838974d3b, 
data reload: false
   
   query1   983 418 383 383
   query2   6558181918061806
   query3   6740217 221 217
   query4   26367   23998   23448   23448
   query5   4388632 464 464
   query6   311 213 202 202
   query7   4619501 295 295
   query8   274 231 212 212
   query9   8615263926442639
   query10  472 338 275 275
   query11  15150   15122   14969   14969
   query12  172 113 109 109
   query13  1673547 412 412
   query14  9141617461126112
   query15  203 192 174 174
   query16  7294622 478 478
   query17  1170720 568 568
   query18  1976412 303 303
   query19  190 204 158 158
   query20  123 115 115 115
   query21  216 128 105 105
   query22  3953408340334033
   query23  33866   33033   33146   33033
   query24  8494234323622343
   query25  541 451 393 393
   query26  1229265 149 149
   query27  2777515 347 347
   query28  4318211521212115
   query29  783 544 469 469
   query30  284 215 191 191
   query31  912 841 753 753
   query32  74  62  63  62
   query33  565 367 330 330
   query34  796 850 535 535
   query35  764 808 722 722
   query36  937 961 904 904
   query37  115 96  80  80
   query38  4158413440844084
   query39  1513146714161416
   query40  212 117 106 106
   query41  92  60  61  60
   query42  129 109 108 108
   query43  489 485 453 453
   query44  1312826 815 815
   query45  190 183 168 168
   query46  848 1016628 628
   query47  1762178517151715
   query48  393 422 328 328
   query49  749 491 400 400
   query50  632 666 408 408
   query51  4082410840814081
   query52  114 111 103 103
   query53  231 255 180 180
   query54  570 564 506 506
   query55  91  91  85  85
   query56  330 347 298 298
   query57  1184120811101110
   query58  262 261 254 254
   query59  2457265725222522
   query60  325 319 322 319
   query61  126 121 125 121
   query62  803 732 661 661
   query63  230 186 187 186
   query64  43121007666 666
   query65  4319417641704170
   query66  1151424 336 336
   query67  15740   15602   15721   15602
   query68  6975892 534 534
   query69  491 327 280 280
   query70  1189114811261126
   query71  416 402 293 293
   query72  5720470546574657
   query73  634 604 352 352
   query74  8853912688398839
   query75  3153317527212721
   query76  31981179745 745
   query77  485 373 289 289
   query78  10030   10180   93889388
   query79  2303809 588 588
   query80  570 533 437 437
   query81  498 266 233 233
   query82  182 128 101 101
   query83  243 252 232 232
   query84  248 103 87  87
   query85  752 360 310 310
   query86  374 317 280 280
   query87  4325453544814481
   query88  3866229223002292
   query89  384 313 282 282
   query90  1968214 211 211
   query91  144 138 109 109
   query92  71  61  66  61
   query93  1911952 570 570
   query94  674 424 310 310
   query95  367 286 289 286
   query96  493 576 283 283
   query97  2740273527062706
   query98  229 208 213 208
   query99  1300138012761276
   Total cold run time: 270635 ms
   Total hot run time: 186359 ms
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go t

Re: [PR] [enhancement](filecache) fix default capacity and add reset_capacity validation [doris]

2025-06-19 Thread via GitHub


doris-robot commented on PR #51711:
URL: https://github.com/apache/doris/pull/51711#issuecomment-2988234928

   
   
   TPC-H: Total hot run time: 33997 ms
   
   ```
   machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
   scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
   Tpch sf100 test result on commit 28601cef48a00fa941ea2fc851c3925838974d3b, 
data reload: false
   
   -- Round 1 --
   q1   17579   521550135013
   q2   1927304 209 209
   q3   10270   1306712 712
   q4   10232   1034533 533
   q5   7690240323692369
   q6   179 162 133 133
   q7   891 745 601 601
   q8   9312134011051105
   q9   6852514251155115
   q10  6878240519561956
   q11  510 291 281 281
   q12  345 347 215 215
   q13  17798   368031083108
   q14  232 227 240 227
   q15  576 477 482 477
   q16  437 431 373 373
   q17  593 883 363 363
   q18  7748730171627162
   q19  1343950 541 541
   q20  338 343 225 225
   q21  3756256823072307
   q22  10351062972 972
   Total cold run time: 106521 ms
   Total hot run time: 33997 ms
   
   - Round 2, with runtime_filter_mode=off -
   q1   5110505450725054
   q2   254 328 225 225
   q3   2200271323052305
   q4   1399182313371337
   q5   4236413343814133
   q6   215 173 131 131
   q7   2021200717061706
   q8   2581260125372537
   q9   7157713272357132
   q10  3062325928312831
   q11  580 512 497 497
   q12  677 751 622 622
   q13  3498391232463246
   q14  283 300 263 263
   q15  519 489 468 468
   q16  453 470 441 441
   q17  1170151313881388
   q18  7390711772007117
   q19  768 819 932 819
   q20  1929198218411841
   q21  4754443042804280
   q22  10601043989 989
   Total cold run time: 51316 ms
   Total hot run time: 49362 ms
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] [enhancement](filecache) fix default capacity and add reset_capacity validation [doris]

2025-06-19 Thread via GitHub


freemandealer commented on PR #51711:
URL: https://github.com/apache/doris/pull/51711#issuecomment-2988172590

   run buildall


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] [enhancement](filecache) fix default capacity and add reset_capacity validation [doris]

2025-06-16 Thread via GitHub


freemandealer commented on code in PR #51711:
URL: https://github.com/apache/doris/pull/51711#discussion_r2149222784


##
be/src/io/cache/block_file_cache_factory.cpp:
##
@@ -174,16 +171,52 @@ std::vector 
FileCacheFactory::get_base_paths() {
 return paths;
 }
 
+std::string validate_capacity(const std::string& path, int64_t new_capacity,
+  int64_t& valid_capacity) {
+struct statfs stat;
+if (statfs(path.c_str(), &stat) < 0) {
+auto ret = fmt::format("reset capacity {} statfs error {}. ", path, 
strerror(errno));
+LOG_ERROR(ret);
+return ret;
+}
+size_t disk_capacity = 
static_cast(static_cast(stat.f_blocks) *
+   
static_cast(stat.f_bsize));
+if (new_capacity == 0 || disk_capacity < new_capacity) {
+auto ret = fmt::format(
+"The cache {} config size {} is larger than disk size {} or 
zero, recalc "
+"it to disk size. ",
+path, new_capacity, disk_capacity);
+valid_capacity = disk_capacity;
+LOG_WARNING(ret);
+return ret;
+}
+valid_capacity = new_capacity;
+return "";
+}
+
 std::string FileCacheFactory::reset_capacity(const std::string& path, int64_t 
new_capacity) {
+std::stringstream ss;
+size_t total_capacity = 0;
 if (path.empty()) {
-std::stringstream ss;
-for (auto& [_, cache] : _path_to_cache) {
-ss << cache->reset_capacity(new_capacity);
+for (auto& [p, cache] : _path_to_cache) {
+int64_t valid_capacity;
+ss << validate_capacity(p, new_capacity, valid_capacity);

Review Comment:
   not exactly. the operation will never fail. target capacity larger than the 
max disk capacity will be regulated to max disk capacity.



##
be/src/io/cache/block_file_cache_factory.cpp:
##
@@ -174,16 +171,52 @@ std::vector 
FileCacheFactory::get_base_paths() {
 return paths;
 }
 
+std::string validate_capacity(const std::string& path, int64_t new_capacity,
+  int64_t& valid_capacity) {
+struct statfs stat;
+if (statfs(path.c_str(), &stat) < 0) {
+auto ret = fmt::format("reset capacity {} statfs error {}. ", path, 
strerror(errno));
+LOG_ERROR(ret);
+return ret;
+}
+size_t disk_capacity = 
static_cast(static_cast(stat.f_blocks) *
+   
static_cast(stat.f_bsize));
+if (new_capacity == 0 || disk_capacity < new_capacity) {
+auto ret = fmt::format(
+"The cache {} config size {} is larger than disk size {} or 
zero, recalc "
+"it to disk size. ",
+path, new_capacity, disk_capacity);
+valid_capacity = disk_capacity;
+LOG_WARNING(ret);
+return ret;
+}
+valid_capacity = new_capacity;
+return "";
+}
+
 std::string FileCacheFactory::reset_capacity(const std::string& path, int64_t 
new_capacity) {
+std::stringstream ss;
+size_t total_capacity = 0;
 if (path.empty()) {
-std::stringstream ss;
-for (auto& [_, cache] : _path_to_cache) {
-ss << cache->reset_capacity(new_capacity);
+for (auto& [p, cache] : _path_to_cache) {
+int64_t valid_capacity;
+ss << validate_capacity(p, new_capacity, valid_capacity);
+ss << cache->reset_capacity(valid_capacity);
+total_capacity += cache->capacity();
 }
+_capacity = total_capacity;
 return ss.str();
 } else {
 if (auto iter = _path_to_cache.find(path); iter != 
_path_to_cache.end()) {
-return iter->second->reset_capacity(new_capacity);
+int64_t valid_capacity;
+ss << validate_capacity(path, new_capacity, valid_capacity);

Review Comment:
   explained above



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] [enhancement](filecache) fix default capacity and add reset_capacity validation [doris]

2025-06-16 Thread via GitHub


freemandealer commented on code in PR #51711:
URL: https://github.com/apache/doris/pull/51711#discussion_r2149216213


##
be/test/io/cache/block_file_cache_test.cpp:
##
@@ -7913,4 +7913,81 @@ TEST_F(BlockFileCacheTest, 
cached_remote_file_reader_normal_index) {
 FileCacheFactory::instance()->_capacity = 0;
 }
 
+TEST_F(BlockFileCacheTest, test_reset_capacity) {

Review Comment:
   validate_capacity is already tested in this case



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] [enhancement](filecache) fix default capacity and add reset_capacity validation [doris]

2025-06-13 Thread via GitHub


hello-stephen commented on PR #51711:
URL: https://github.com/apache/doris/pull/51711#issuecomment-2970630172

   # BE UT Coverage Report
   Increment line coverage `90.24% (37/41)` :tada:
   
   [Increment coverage 
report](http://coverage.selectdb-in.cc/coverage/c71e4cd92f0cde728375bd20cfe1e3d397bcce5a_c71e4cd92f0cde728375bd20cfe1e3d397bcce5a/increment_report/index.html)
   [Complete coverage 
report](http://coverage.selectdb-in.cc/coverage/c71e4cd92f0cde728375bd20cfe1e3d397bcce5a_c71e4cd92f0cde728375bd20cfe1e3d397bcce5a/report/index.html)
   | Category  | Coverage   |
   |---||
   | Function Coverage | 56.41% (15076/26726) |
   | Line Coverage | 45.18% (134826/298425) |
   | Region Coverage   | 44.30% (67856/153186) |
   | Branch Coverage   | 38.87% (34833/89606) |


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] [enhancement](filecache) fix default capacity and add reset_capacity validation [doris]

2025-06-13 Thread via GitHub


hello-stephen commented on PR #51711:
URL: https://github.com/apache/doris/pull/51711#issuecomment-2970654724

   # BE Regression && UT Coverage Report
   Increment line coverage `90.24% (37/41)` :tada:
   
   [Increment coverage 
report](http://coverage.selectdb-in.cc/coverage/51711_c71e4cd92f0cde728375bd20cfe1e3d397bcce5a_merge/increment_report/index.html)
   [Complete coverage 
report](http://coverage.selectdb-in.cc/coverage/51711_c71e4cd92f0cde728375bd20cfe1e3d397bcce5a_merge/report/index.html)
   | Category  | Coverage   |
   |---||
   | Function Coverage | 79.89% (21015/26304) |
   | Line Coverage | 72.77% (217050/298268) |
   | Region Coverage   | 71.01% (127928/180160) |
   | Branch Coverage   | 64.61% (66165/102412) |


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] [enhancement](filecache) fix default capacity and add reset_capacity validation [doris]

2025-06-13 Thread via GitHub


doris-robot commented on PR #51711:
URL: https://github.com/apache/doris/pull/51711#issuecomment-2970426013

   
   
   ClickBench: Total hot run time: 29.02 s
   
   ```
   machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
   scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
   ClickBench test result on commit c71e4cd92f0cde728375bd20cfe1e3d397bcce5a, 
data reload: false
   
   query1   0.040.040.03
   query2   0.080.040.04
   query3   0.240.070.07
   query4   1.600.110.10
   query5   0.460.430.42
   query6   1.180.680.68
   query7   0.020.020.02
   query8   0.050.040.04
   query9   0.600.550.54
   query10  0.580.590.58
   query11  0.160.120.12
   query12  0.160.130.12
   query13  0.620.610.62
   query14  0.820.830.82
   query15  0.920.890.88
   query16  0.400.380.40
   query17  1.121.061.05
   query18  0.240.210.23
   query19  2.001.831.88
   query20  0.020.020.02
   query21  15.40   0.950.57
   query22  0.781.260.67
   query23  14.82   1.430.64
   query24  7.330.780.28
   query25  0.280.150.18
   query26  0.600.160.16
   query27  0.060.050.05
   query28  9.181.020.48
   query29  12.62   4.283.51
   query30  0.250.100.08
   query31  2.820.630.42
   query32  3.240.590.48
   query33  3.263.123.14
   query34  15.73   5.164.51
   query35  4.574.544.51
   query36  0.670.510.49
   query37  0.100.070.06
   query38  0.060.050.04
   query39  0.040.030.03
   query40  0.170.140.13
   query41  0.080.020.03
   query42  0.040.030.03
   query43  0.050.030.03
   Total cold run time: 103.46 s
   Total hot run time: 29.02 s
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] [enhancement](filecache) fix default capacity and add reset_capacity validation [doris]

2025-06-13 Thread via GitHub


doris-robot commented on PR #51711:
URL: https://github.com/apache/doris/pull/51711#issuecomment-2970404119

   
   
   TPC-DS: Total hot run time: 197480 ms
   
   ```
   machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
   scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
   TPC-DS sf100 test result on commit c71e4cd92f0cde728375bd20cfe1e3d397bcce5a, 
data reload: false
   
   query1   1410102810341028
   query2   6186205120862051
   query3   11009   446745984467
   query4   53352   25616   23221   23221
   query5   5080582 495 495
   query6   350 215 210 210
   query7   4900531 319 319
   query8   268 241 236 236
   query9   5466297430102974
   query10  487 378 296 296
   query11  15819   15091   14929   14929
   query12  169 121 117 117
   query13  1054590 456 456
   query14  10295   661966756619
   query15  222 214 191 191
   query16  7181669 533 533
   query17  1147778 649 649
   query18  1596440 359 359
   query19  235 230 194 194
   query20  148 138 127 127
   query21  225 132 121 121
   query22  4597455745424542
   query23  34893   33936   33867   33867
   query24  6991254125192519
   query25  486 497 439 439
   query26  710 285 165 165
   query27  2256536 386 386
   query28  3089243024202420
   query29  617 609 468 468
   query30  281 240 203 203
   query31  894 871 789 789
   query32  78  72  66  66
   query33  455 411 337 337
   query34  836 911 574 574
   query35  846 867 781 781
   query36  10161063971 971
   query37  127 108 83  83
   query38  4419451143644364
   query39  1543148014811480
   query40  224 137 117 117
   query41  74  66  61  61
   query42  150 127 127 127
   query43  556 586 518 518
   query44  1463917 937 917
   query45  187 183 171 171
   query46  928 1065691 691
   query47  1906191818251825
   query48  425 467 357 357
   query49  636 510 440 440
   query50  724 767 449 449
   query51  4316439042854285
   query52  127 124 117 117
   query53  247 270 201 201
   query54  646 638 577 577
   query55  102 94  97  94
   query56  328 328 323 323
   query57  1245124112421241
   query58  288 289 290 289
   query59  3030312228992899
   query60  375 353 342 342
   query61  138 131 135 131
   query62  762 726 676 676
   query63  236 208 199 199
   query64  18591107734 734
   query65  4234417241844172
   query66  750 407 312 312
   query67  16422   15915   15627   15627
   query68  7648980 599 599
   query69  567 321 295 295
   query70  1301123012211221
   query71  532 340 347 340
   query72  5782480749654807
   query73  1481690 398 398
   query74  9433921687548754
   query75  3884323327762776
   query76  43401212779 779
   query77  629 399 316 316
   query78  10110   10279   93769376
   query79  2277868 645 645
   query80  675 544 478 478
   query81  485 262 234 234
   query82  433 143 105 105
   query83  370 275 266 266
   query84  303 103 94  94
   query85  807 439 346 346
   query86  372 340 287 287
   query87  458544624462
   query88  3439249224642464
   query89  423 327 296 296
   query90  1961219 212 212
   query91  152 149 121 121
   query92  83  65  61  61
   query93  13381024655 655
   query94  688 411 322 322
   query95  387 340 306 306
   query96  556 587 312 312
   query97  2768286227222722
   query98  240 227 217 217
   query99  1431140412761276
   Total cold run time: 302450 ms
   Total hot run time: 197480 ms
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go

Re: [PR] [enhancement](filecache) fix default capacity and add reset_capacity validation [doris]

2025-06-13 Thread via GitHub


doris-robot commented on PR #51711:
URL: https://github.com/apache/doris/pull/51711#issuecomment-2970370780

   
   
   TPC-H: Total hot run time: 34902 ms
   
   ```
   machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
   scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
   Tpch sf100 test result on commit c71e4cd92f0cde728375bd20cfe1e3d397bcce5a, 
data reload: false
   
   -- Round 1 --
   q1   17607   517349514951
   q2   1954317 188 188
   q3   10264   1266769 769
   q4   10297   1036550 550
   q5   8732235623472347
   q6   188 165 137 137
   q7   904 773 616 616
   q8   9334131211401140
   q9   6813512951005100
   q10  6865232318921892
   q11  488 299 296 296
   q12  355 355 227 227
   q13  17803   369731353135
   q14  234 230 222 222
   q15  566 489 492 489
   q16  448 460 391 391
   q17  656 855 398 398
   q18  7625717270827082
   q19  1805971 572 572
   q20  367 356 257 257
   q21  4239350431423142
   q22  1063103810011001
   Total cold run time: 108607 ms
   Total hot run time: 34902 ms
   
   - Round 2, with runtime_filter_mode=off -
   q1   5170506750175017
   q2   257 333 219 219
   q3   2212268424102410
   q4   1415187614191419
   q5   4233429843874298
   q6   223 172 128 128
   q7   2063197117891789
   q8   2611263425302530
   q9   7326729372257225
   q10  2926316427942794
   q11  596 507 500 500
   q12  729 794 632 632
   q13  3531397334073407
   q14  283 309 282 282
   q15  528 476 470 470
   q16  451 507 490 490
   q17  1181151414481448
   q18  7742762974877487
   q19  867 879 931 879
   q20  2093205719261926
   q21  5003440545724405
   q22  1121108010601060
   Total cold run time: 52561 ms
   Total hot run time: 50815 ms
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] [enhancement](filecache) fix default capacity and add reset_capacity validation [doris]

2025-06-13 Thread via GitHub


gavinchou commented on code in PR #51711:
URL: https://github.com/apache/doris/pull/51711#discussion_r2145051056


##
be/test/io/cache/block_file_cache_test.cpp:
##
@@ -7913,4 +7913,81 @@ TEST_F(BlockFileCacheTest, 
cached_remote_file_reader_normal_index) {
 FileCacheFactory::instance()->_capacity = 0;
 }
 
+TEST_F(BlockFileCacheTest, test_reset_capacity) {

Review Comment:
   should we also test `validate_capacity()`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] [enhancement](filecache) fix default capacity and add reset_capacity validation [doris]

2025-06-13 Thread via GitHub


gavinchou commented on code in PR #51711:
URL: https://github.com/apache/doris/pull/51711#discussion_r2145048060


##
be/src/io/cache/block_file_cache_factory.cpp:
##
@@ -174,16 +171,52 @@ std::vector 
FileCacheFactory::get_base_paths() {
 return paths;
 }
 
+std::string validate_capacity(const std::string& path, int64_t new_capacity,
+  int64_t& valid_capacity) {
+struct statfs stat;
+if (statfs(path.c_str(), &stat) < 0) {
+auto ret = fmt::format("reset capacity {} statfs error {}. ", path, 
strerror(errno));
+LOG_ERROR(ret);
+return ret;
+}
+size_t disk_capacity = 
static_cast(static_cast(stat.f_blocks) *
+   
static_cast(stat.f_bsize));
+if (new_capacity == 0 || disk_capacity < new_capacity) {
+auto ret = fmt::format(
+"The cache {} config size {} is larger than disk size {} or 
zero, recalc "
+"it to disk size. ",
+path, new_capacity, disk_capacity);
+valid_capacity = disk_capacity;
+LOG_WARNING(ret);
+return ret;
+}
+valid_capacity = new_capacity;
+return "";
+}
+
 std::string FileCacheFactory::reset_capacity(const std::string& path, int64_t 
new_capacity) {
+std::stringstream ss;
+size_t total_capacity = 0;
 if (path.empty()) {
-std::stringstream ss;
-for (auto& [_, cache] : _path_to_cache) {
-ss << cache->reset_capacity(new_capacity);
+for (auto& [p, cache] : _path_to_cache) {
+int64_t valid_capacity;
+ss << validate_capacity(p, new_capacity, valid_capacity);

Review Comment:
   calling to `reset_capacity()` should fail



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] [enhancement](filecache) fix default capacity and add reset_capacity validation [doris]

2025-06-13 Thread via GitHub


gavinchou commented on code in PR #51711:
URL: https://github.com/apache/doris/pull/51711#discussion_r2145047213


##
be/src/io/cache/block_file_cache_factory.cpp:
##
@@ -174,16 +171,52 @@ std::vector 
FileCacheFactory::get_base_paths() {
 return paths;
 }
 
+std::string validate_capacity(const std::string& path, int64_t new_capacity,
+  int64_t& valid_capacity) {
+struct statfs stat;
+if (statfs(path.c_str(), &stat) < 0) {
+auto ret = fmt::format("reset capacity {} statfs error {}. ", path, 
strerror(errno));
+LOG_ERROR(ret);
+return ret;
+}
+size_t disk_capacity = 
static_cast(static_cast(stat.f_blocks) *
+   
static_cast(stat.f_bsize));
+if (new_capacity == 0 || disk_capacity < new_capacity) {
+auto ret = fmt::format(
+"The cache {} config size {} is larger than disk size {} or 
zero, recalc "
+"it to disk size. ",
+path, new_capacity, disk_capacity);
+valid_capacity = disk_capacity;
+LOG_WARNING(ret);
+return ret;
+}
+valid_capacity = new_capacity;
+return "";
+}
+
 std::string FileCacheFactory::reset_capacity(const std::string& path, int64_t 
new_capacity) {
+std::stringstream ss;
+size_t total_capacity = 0;
 if (path.empty()) {
-std::stringstream ss;
-for (auto& [_, cache] : _path_to_cache) {
-ss << cache->reset_capacity(new_capacity);
+for (auto& [p, cache] : _path_to_cache) {
+int64_t valid_capacity;
+ss << validate_capacity(p, new_capacity, valid_capacity);

Review Comment:
   if `validate_capacity` return error like "statfs error", we should not 
proceed.



##
be/src/io/cache/block_file_cache_factory.cpp:
##
@@ -174,16 +171,52 @@ std::vector 
FileCacheFactory::get_base_paths() {
 return paths;
 }
 
+std::string validate_capacity(const std::string& path, int64_t new_capacity,
+  int64_t& valid_capacity) {
+struct statfs stat;
+if (statfs(path.c_str(), &stat) < 0) {
+auto ret = fmt::format("reset capacity {} statfs error {}. ", path, 
strerror(errno));
+LOG_ERROR(ret);
+return ret;
+}
+size_t disk_capacity = 
static_cast(static_cast(stat.f_blocks) *
+   
static_cast(stat.f_bsize));
+if (new_capacity == 0 || disk_capacity < new_capacity) {
+auto ret = fmt::format(
+"The cache {} config size {} is larger than disk size {} or 
zero, recalc "
+"it to disk size. ",
+path, new_capacity, disk_capacity);
+valid_capacity = disk_capacity;
+LOG_WARNING(ret);
+return ret;
+}
+valid_capacity = new_capacity;
+return "";
+}
+
 std::string FileCacheFactory::reset_capacity(const std::string& path, int64_t 
new_capacity) {
+std::stringstream ss;
+size_t total_capacity = 0;
 if (path.empty()) {
-std::stringstream ss;
-for (auto& [_, cache] : _path_to_cache) {
-ss << cache->reset_capacity(new_capacity);
+for (auto& [p, cache] : _path_to_cache) {
+int64_t valid_capacity;
+ss << validate_capacity(p, new_capacity, valid_capacity);
+ss << cache->reset_capacity(valid_capacity);
+total_capacity += cache->capacity();
 }
+_capacity = total_capacity;
 return ss.str();
 } else {
 if (auto iter = _path_to_cache.find(path); iter != 
_path_to_cache.end()) {
-return iter->second->reset_capacity(new_capacity);
+int64_t valid_capacity;
+ss << validate_capacity(path, new_capacity, valid_capacity);

Review Comment:
   ditto



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] [enhancement](filecache) fix default capacity and add reset_capacity validation [doris]

2025-06-13 Thread via GitHub


gavinchou commented on code in PR #51711:
URL: https://github.com/apache/doris/pull/51711#discussion_r2145035871


##
be/src/io/cache/block_file_cache_factory.cpp:
##
@@ -174,16 +171,52 @@ std::vector 
FileCacheFactory::get_base_paths() {
 return paths;
 }
 
+std::string validate_capacity(const std::string& path, int64_t new_capacity,

Review Comment:
   add comment what it does



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] [enhancement](filecache) fix default capacity and add reset_capacity validation [doris]

2025-06-13 Thread via GitHub


freemandealer commented on PR #51711:
URL: https://github.com/apache/doris/pull/51711#issuecomment-2970267871

   run buildall


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] [enhancement](filecache) fix default capacity and add reset_capacity validation [doris]

2025-06-13 Thread via GitHub


Thearas commented on PR #51711:
URL: https://github.com/apache/doris/pull/51711#issuecomment-2970267751

   
   Thank you for your contribution to Apache Doris.
   Don't know what should be done next? See [How to process your 
PR](https://cwiki.apache.org/confluence/display/DORIS/How+to+process+your+PR).
   
   Please clearly describe your PR:
   1. What problem was fixed (it's best to include specific error reporting 
information). How it was fixed.
   2. Which behaviors were modified. What was the previous behavior, what is it 
now, why was it modified, and what possible impacts might there be.
   3. What features were added. Why was this function added?
   4. Which code was refactored and why was this part of the code refactored?
   5. Which functions were optimized and what is the difference before and 
after the optimization?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]