Re: [PR] [Improvement](serialize) use streamvbyte_encode in DataTypeFixedLengthObject::serialize [doris]

2026-02-08 Thread via GitHub


zclllyybb merged PR #60526:
URL: https://github.com/apache/doris/pull/60526


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] [Improvement](serialize) use streamvbyte_encode in DataTypeFixedLengthObject::serialize [doris]

2026-02-08 Thread via GitHub


github-actions[bot] commented on PR #60526:
URL: https://github.com/apache/doris/pull/60526#issuecomment-3869671852

   PR approved by anyone and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] [Improvement](serialize) use streamvbyte_encode in DataTypeFixedLengthObject::serialize [doris]

2026-02-08 Thread via GitHub


github-actions[bot] commented on PR #60526:
URL: https://github.com/apache/doris/pull/60526#issuecomment-3869671737

   PR approved by at least one committer and no changes requested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] [Improvement](serialize) use streamvbyte_encode in DataTypeFixedLengthObject::serialize [doris]

2026-02-06 Thread via GitHub


hello-stephen commented on PR #60526:
URL: https://github.com/apache/doris/pull/60526#issuecomment-3860002769

   # BE Regression && UT Coverage Report
   Increment line coverage `96.61% (57/59)` :tada:
   
   [Increment coverage 
report](http://coverage.selectdb-in.cc/coverage/60526_430d5297faaa6070e97018f2e5caa758e54ec7f1_merge/increment_report/index.html)
   [Complete coverage 
report](http://coverage.selectdb-in.cc/coverage/60526_430d5297faaa6070e97018f2e5caa758e54ec7f1_merge/report/index.html)
   | Category  | Coverage   |
   |---||
   | Function Coverage | 71.64% (25902/36157) |
   | Line Coverage | 54.28% (270766/498804) |
   | Region Coverage   | 51.67% (225106/435687) |
   | Branch Coverage   | 53.15% (96643/181817) |


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] [Improvement](serialize) use streamvbyte_encode in DataTypeFixedLengthObject::serialize [doris]

2026-02-06 Thread via GitHub


hello-stephen commented on PR #60526:
URL: https://github.com/apache/doris/pull/60526#issuecomment-3859306368

   # BE Regression && UT Coverage Report
   Increment line coverage `96.61% (57/59)` :tada:
   
   [Increment coverage 
report](http://coverage.selectdb-in.cc/coverage/60526_430d5297faaa6070e97018f2e5caa758e54ec7f1_merge/increment_report/index.html)
   [Complete coverage 
report](http://coverage.selectdb-in.cc/coverage/60526_430d5297faaa6070e97018f2e5caa758e54ec7f1_merge/report/index.html)
   | Category  | Coverage   |
   |---||
   | Function Coverage | 71.63% (25900/36157) |
   | Line Coverage | 54.27% (270722/498804) |
   | Region Coverage   | 51.63% (224950/435687) |
   | Branch Coverage   | 53.14% (96610/181817) |


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] [Improvement](serialize) use streamvbyte_encode in DataTypeFixedLengthObject::serialize [doris]

2026-02-06 Thread via GitHub


hello-stephen commented on PR #60526:
URL: https://github.com/apache/doris/pull/60526#issuecomment-3858809881

   # BE UT Coverage Report
   Increment line coverage `94.92% (56/59)` :tada:
   
   [Increment coverage 
report](http://coverage.selectdb-in.cc/coverage/430d5297faaa6070e97018f2e5caa758e54ec7f1_430d5297faaa6070e97018f2e5caa758e54ec7f1/increment_report/index.html)
   [Complete coverage 
report](http://coverage.selectdb-in.cc/coverage/430d5297faaa6070e97018f2e5caa758e54ec7f1_430d5297faaa6070e97018f2e5caa758e54ec7f1/report/index.html)
   | Category  | Coverage   |
   |---||
   | Function Coverage | 52.60% (19409/36896) |
   | Line Coverage | 36.12% (180588/500028) |
   | Region Coverage   | 32.44% (139904/431303) |
   | Branch Coverage   | 33.46% (60607/181109) |


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] [Improvement](serialize) use streamvbyte_encode in DataTypeFixedLengthObject::serialize [doris]

2026-02-05 Thread via GitHub


hello-stephen commented on PR #60526:
URL: https://github.com/apache/doris/pull/60526#issuecomment-3858269582

   # FE UT Coverage Report
   Increment line coverage `` :tada:
   [Increment coverage 
report](http://coverage.selectdb-in.cc/coverage/60526_430d5297faaa6070e97018f2e5caa758e54ec7f1/fe_increment_report/index.html)
   [Complete coverage 
report](http://coverage.selectdb-in.cc/coverage/60526_430d5297faaa6070e97018f2e5caa758e54ec7f1/fe_report/index.html)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] [Improvement](serialize) use streamvbyte_encode in DataTypeFixedLengthObject::serialize [doris]

2026-02-05 Thread via GitHub


BiteThet commented on PR #60526:
URL: https://github.com/apache/doris/pull/60526#issuecomment-3858051637

   run buildall


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] [Improvement](serialize) use streamvbyte_encode in DataTypeFixedLengthObject::serialize [doris]

2026-02-05 Thread via GitHub


hello-stephen commented on PR #60526:
URL: https://github.com/apache/doris/pull/60526#issuecomment-3852502105

   # FE UT Coverage Report
   Increment line coverage `` :tada:
   [Increment coverage 
report](http://coverage.selectdb-in.cc/coverage/60526_8224ee7a1e1263ecf59bff98c091d69b9c3e1b45/fe_increment_report/index.html)
   [Complete coverage 
report](http://coverage.selectdb-in.cc/coverage/60526_8224ee7a1e1263ecf59bff98c091d69b9c3e1b45/fe_report/index.html)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] [Improvement](serialize) use streamvbyte_encode in DataTypeFixedLengthObject::serialize [doris]

2026-02-05 Thread via GitHub


BiteThet commented on PR #60526:
URL: https://github.com/apache/doris/pull/60526#issuecomment-3852075951

   run buildall


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] [Improvement](serialize) use streamvbyte_encode in DataTypeFixedLengthObject::serialize [doris]

2026-02-04 Thread via GitHub


hello-stephen commented on PR #60526:
URL: https://github.com/apache/doris/pull/60526#issuecomment-3851389500

   # FE UT Coverage Report
   Increment line coverage `` :tada:
   [Increment coverage 
report](http://coverage.selectdb-in.cc/coverage/60526_21cc719bbfbd17ffe3481317bd00f00173e5b1e3/fe_increment_report/index.html)
   [Complete coverage 
report](http://coverage.selectdb-in.cc/coverage/60526_21cc719bbfbd17ffe3481317bd00f00173e5b1e3/fe_report/index.html)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] [Improvement](serialize) use streamvbyte_encode in DataTypeFixedLengthObject::serialize [doris]

2026-02-04 Thread via GitHub


Copilot commented on code in PR #60526:
URL: https://github.com/apache/doris/pull/60526#discussion_r2767237937


##
be/src/vec/data_types/data_type_fixed_length_object.cpp:
##
@@ -53,26 +85,49 @@ char* DataTypeFixedLengthObject::serialize(const IColumn& 
column, char* buf,
 DCHECK(src_col.item_size() > 0)
 << "[serialize]item size of DataTypeFixedLengthObject should be 
greater than 0";
 
-// item size
 unaligned_store(buf, src_col.item_size());
 buf += sizeof(size_t);
-// column data
+
 const auto* origin_data = src_col.get_data().data();
 memcpy(buf, origin_data, real_need_copy_num * src_col.item_size());
 buf += real_need_copy_num * src_col.item_size();
-
 return buf;
 }
 
 const char* DataTypeFixedLengthObject::deserialize(const char* buf, 
MutableColumnPtr* column,
int be_exec_version) const {
-//const flag
+if (be_exec_version >= USE_NEW_FIXED_OBJECT_SERIALIZATION_VERSION) {
+// New deserialization with streamvbyte decoding for large data
+size_t real_have_saved_num = 0;
+buf = deserialize_const_flag_and_row_num(buf, column, 
&real_have_saved_num);
+
+auto& dst_col = assert_cast(*(column->get()));

Review Comment:
   The original column pointer must be saved before calling 
deserialize_const_flag_and_row_num, similar to how it's done in DataTypeDecimal 
at line 178. The deserialize_const_flag_and_row_num function may wrap the 
column in a ColumnConst, and subsequent operations need to work on the 
underlying data column, not the wrapper. The correct pattern is:
   1. Save the original column pointer: auto* origin_column = column->get();
   2. Call deserialize_const_flag_and_row_num
   3. Use origin_column for subsequent operations
   ```suggestion
   auto* origin_column = column->get();
   size_t real_have_saved_num = 0;
   buf = deserialize_const_flag_and_row_num(buf, column, 
&real_have_saved_num);
   
   auto& dst_col = assert_cast(*origin_column);
   ```



##
be/src/vec/data_types/data_type_fixed_length_object.cpp:
##
@@ -97,6 +153,23 @@ const char* DataTypeFixedLengthObject::deserialize(const 
char* buf, MutableColum
 // data  : item data1 | item data2...
 int64_t DataTypeFixedLengthObject::get_uncompressed_serialized_bytes(const 
IColumn& column,
  int 
be_exec_version) const {
+if (be_exec_version >= USE_NEW_FIXED_OBJECT_SERIALIZATION_VERSION) {
+// New format size calculation with streamvbyte
+auto size = sizeof(bool) + sizeof(size_t) + sizeof(size_t) + 
sizeof(size_t);
+auto real_need_copy_num = is_column_const(column) ? 1 : column.size();
+const auto& src_col = assert_cast(column);

Review Comment:
   When the column is a ColumnConst, directly casting it to ColumnType will 
fail. The code should first unwrap the const column to get the underlying data 
column before casting, similar to how the old format handles it at lines 
175-178. The serialize function correctly handles this at line 41 using 
serialize_const_flag_and_row_num which unwraps const columns, but 
get_uncompressed_serialized_bytes does not perform this unwrapping.
   ```suggestion
   const IColumn* data_column = &column;
   auto real_need_copy_num = column.size();
   if (is_column_const(column)) {
   const auto& const_column = assert_cast(column);
   data_column = &(const_column.get_data_column());
   real_need_copy_num = 1;
   }
   const auto& src_col = assert_cast(*data_column);
   ```



##
be/src/vec/data_types/data_type_fixed_length_object.cpp:
##
@@ -97,6 +153,23 @@ const char* DataTypeFixedLengthObject::deserialize(const 
char* buf, MutableColum
 // data  : item data1 | item data2...
 int64_t DataTypeFixedLengthObject::get_uncompressed_serialized_bytes(const 
IColumn& column,
  int 
be_exec_version) const {
+if (be_exec_version >= USE_NEW_FIXED_OBJECT_SERIALIZATION_VERSION) {
+// New format size calculation with streamvbyte
+auto size = sizeof(bool) + sizeof(size_t) + sizeof(size_t) + 
sizeof(size_t);
+auto real_need_copy_num = is_column_const(column) ? 1 : column.size();
+const auto& src_col = assert_cast(column);
+auto mem_size = src_col.item_size() * real_need_copy_num;
+if (mem_size <= SERIALIZED_MEM_SIZE_LIMIT) {
+return size + mem_size;
+} else {
+// Throw exception if mem_size is large than UINT32_MAX

Review Comment:
   The comment has a grammatical error: "large than" should be "larger than".



##
be/src/vec/data_types/data_type_fixed_length_object.cpp:
##
@@ -33,12 +34,43 @@ namespace doris::vectorized {
 
 char* DataTypeFixedLengthObject::serializ

Re: [PR] [Improvement](serialize) use streamvbyte_encode in DataTypeFixedLengthObject::serialize [doris]

2026-02-04 Thread via GitHub


BiteThet commented on PR #60526:
URL: https://github.com/apache/doris/pull/60526#issuecomment-3851196710

   run buildall


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] [Improvement](serialize) use streamvbyte_encode in DataTypeFixedLengthObject::serialize [doris]

2026-02-04 Thread via GitHub


BiteThet commented on PR #60526:
URL: https://github.com/apache/doris/pull/60526#issuecomment-3851193945

   run buildall


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] [Improvement](serialize) use streamvbyte_encode in DataTypeFixedLengthObject::serialize [doris]

2026-02-04 Thread via GitHub


Thearas commented on PR #60526:
URL: https://github.com/apache/doris/pull/60526#issuecomment-3851189025

   
   Thank you for your contribution to Apache Doris.
   Don't know what should be done next? See [How to process your 
PR](https://cwiki.apache.org/confluence/display/DORIS/How+to+process+your+PR).
   
   Please clearly describe your PR:
   1. What problem was fixed (it's best to include specific error reporting 
information). How it was fixed.
   2. Which behaviors were modified. What was the previous behavior, what is it 
now, why was it modified, and what possible impacts might there be.
   3. What features were added. Why was this function added?
   4. Which code was refactored and why was this part of the code refactored?
   5. Which functions were optimized and what is the difference before and 
after the optimization?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]