Re: [PR] [Improvement](serialize) use streamvbyte_encode in DataTypeFixedLengthObject::serialize [doris]
zclllyybb merged PR #60526: URL: https://github.com/apache/doris/pull/60526 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] [Improvement](serialize) use streamvbyte_encode in DataTypeFixedLengthObject::serialize [doris]
github-actions[bot] commented on PR #60526: URL: https://github.com/apache/doris/pull/60526#issuecomment-3869671852 PR approved by anyone and no changes requested. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] [Improvement](serialize) use streamvbyte_encode in DataTypeFixedLengthObject::serialize [doris]
github-actions[bot] commented on PR #60526: URL: https://github.com/apache/doris/pull/60526#issuecomment-3869671737 PR approved by at least one committer and no changes requested. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] [Improvement](serialize) use streamvbyte_encode in DataTypeFixedLengthObject::serialize [doris]
hello-stephen commented on PR #60526: URL: https://github.com/apache/doris/pull/60526#issuecomment-3860002769 # BE Regression && UT Coverage Report Increment line coverage `96.61% (57/59)` :tada: [Increment coverage report](http://coverage.selectdb-in.cc/coverage/60526_430d5297faaa6070e97018f2e5caa758e54ec7f1_merge/increment_report/index.html) [Complete coverage report](http://coverage.selectdb-in.cc/coverage/60526_430d5297faaa6070e97018f2e5caa758e54ec7f1_merge/report/index.html) | Category | Coverage | |---|| | Function Coverage | 71.64% (25902/36157) | | Line Coverage | 54.28% (270766/498804) | | Region Coverage | 51.67% (225106/435687) | | Branch Coverage | 53.15% (96643/181817) | -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] [Improvement](serialize) use streamvbyte_encode in DataTypeFixedLengthObject::serialize [doris]
hello-stephen commented on PR #60526: URL: https://github.com/apache/doris/pull/60526#issuecomment-3859306368 # BE Regression && UT Coverage Report Increment line coverage `96.61% (57/59)` :tada: [Increment coverage report](http://coverage.selectdb-in.cc/coverage/60526_430d5297faaa6070e97018f2e5caa758e54ec7f1_merge/increment_report/index.html) [Complete coverage report](http://coverage.selectdb-in.cc/coverage/60526_430d5297faaa6070e97018f2e5caa758e54ec7f1_merge/report/index.html) | Category | Coverage | |---|| | Function Coverage | 71.63% (25900/36157) | | Line Coverage | 54.27% (270722/498804) | | Region Coverage | 51.63% (224950/435687) | | Branch Coverage | 53.14% (96610/181817) | -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] [Improvement](serialize) use streamvbyte_encode in DataTypeFixedLengthObject::serialize [doris]
hello-stephen commented on PR #60526: URL: https://github.com/apache/doris/pull/60526#issuecomment-3858809881 # BE UT Coverage Report Increment line coverage `94.92% (56/59)` :tada: [Increment coverage report](http://coverage.selectdb-in.cc/coverage/430d5297faaa6070e97018f2e5caa758e54ec7f1_430d5297faaa6070e97018f2e5caa758e54ec7f1/increment_report/index.html) [Complete coverage report](http://coverage.selectdb-in.cc/coverage/430d5297faaa6070e97018f2e5caa758e54ec7f1_430d5297faaa6070e97018f2e5caa758e54ec7f1/report/index.html) | Category | Coverage | |---|| | Function Coverage | 52.60% (19409/36896) | | Line Coverage | 36.12% (180588/500028) | | Region Coverage | 32.44% (139904/431303) | | Branch Coverage | 33.46% (60607/181109) | -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] [Improvement](serialize) use streamvbyte_encode in DataTypeFixedLengthObject::serialize [doris]
hello-stephen commented on PR #60526: URL: https://github.com/apache/doris/pull/60526#issuecomment-3858269582 # FE UT Coverage Report Increment line coverage `` :tada: [Increment coverage report](http://coverage.selectdb-in.cc/coverage/60526_430d5297faaa6070e97018f2e5caa758e54ec7f1/fe_increment_report/index.html) [Complete coverage report](http://coverage.selectdb-in.cc/coverage/60526_430d5297faaa6070e97018f2e5caa758e54ec7f1/fe_report/index.html) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] [Improvement](serialize) use streamvbyte_encode in DataTypeFixedLengthObject::serialize [doris]
BiteThet commented on PR #60526: URL: https://github.com/apache/doris/pull/60526#issuecomment-3858051637 run buildall -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] [Improvement](serialize) use streamvbyte_encode in DataTypeFixedLengthObject::serialize [doris]
hello-stephen commented on PR #60526: URL: https://github.com/apache/doris/pull/60526#issuecomment-3852502105 # FE UT Coverage Report Increment line coverage `` :tada: [Increment coverage report](http://coverage.selectdb-in.cc/coverage/60526_8224ee7a1e1263ecf59bff98c091d69b9c3e1b45/fe_increment_report/index.html) [Complete coverage report](http://coverage.selectdb-in.cc/coverage/60526_8224ee7a1e1263ecf59bff98c091d69b9c3e1b45/fe_report/index.html) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] [Improvement](serialize) use streamvbyte_encode in DataTypeFixedLengthObject::serialize [doris]
BiteThet commented on PR #60526: URL: https://github.com/apache/doris/pull/60526#issuecomment-3852075951 run buildall -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] [Improvement](serialize) use streamvbyte_encode in DataTypeFixedLengthObject::serialize [doris]
hello-stephen commented on PR #60526: URL: https://github.com/apache/doris/pull/60526#issuecomment-3851389500 # FE UT Coverage Report Increment line coverage `` :tada: [Increment coverage report](http://coverage.selectdb-in.cc/coverage/60526_21cc719bbfbd17ffe3481317bd00f00173e5b1e3/fe_increment_report/index.html) [Complete coverage report](http://coverage.selectdb-in.cc/coverage/60526_21cc719bbfbd17ffe3481317bd00f00173e5b1e3/fe_report/index.html) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] [Improvement](serialize) use streamvbyte_encode in DataTypeFixedLengthObject::serialize [doris]
Copilot commented on code in PR #60526:
URL: https://github.com/apache/doris/pull/60526#discussion_r2767237937
##
be/src/vec/data_types/data_type_fixed_length_object.cpp:
##
@@ -53,26 +85,49 @@ char* DataTypeFixedLengthObject::serialize(const IColumn&
column, char* buf,
DCHECK(src_col.item_size() > 0)
<< "[serialize]item size of DataTypeFixedLengthObject should be
greater than 0";
-// item size
unaligned_store(buf, src_col.item_size());
buf += sizeof(size_t);
-// column data
+
const auto* origin_data = src_col.get_data().data();
memcpy(buf, origin_data, real_need_copy_num * src_col.item_size());
buf += real_need_copy_num * src_col.item_size();
-
return buf;
}
const char* DataTypeFixedLengthObject::deserialize(const char* buf,
MutableColumnPtr* column,
int be_exec_version) const {
-//const flag
+if (be_exec_version >= USE_NEW_FIXED_OBJECT_SERIALIZATION_VERSION) {
+// New deserialization with streamvbyte decoding for large data
+size_t real_have_saved_num = 0;
+buf = deserialize_const_flag_and_row_num(buf, column,
&real_have_saved_num);
+
+auto& dst_col = assert_cast(*(column->get()));
Review Comment:
The original column pointer must be saved before calling
deserialize_const_flag_and_row_num, similar to how it's done in DataTypeDecimal
at line 178. The deserialize_const_flag_and_row_num function may wrap the
column in a ColumnConst, and subsequent operations need to work on the
underlying data column, not the wrapper. The correct pattern is:
1. Save the original column pointer: auto* origin_column = column->get();
2. Call deserialize_const_flag_and_row_num
3. Use origin_column for subsequent operations
```suggestion
auto* origin_column = column->get();
size_t real_have_saved_num = 0;
buf = deserialize_const_flag_and_row_num(buf, column,
&real_have_saved_num);
auto& dst_col = assert_cast(*origin_column);
```
##
be/src/vec/data_types/data_type_fixed_length_object.cpp:
##
@@ -97,6 +153,23 @@ const char* DataTypeFixedLengthObject::deserialize(const
char* buf, MutableColum
// data : item data1 | item data2...
int64_t DataTypeFixedLengthObject::get_uncompressed_serialized_bytes(const
IColumn& column,
int
be_exec_version) const {
+if (be_exec_version >= USE_NEW_FIXED_OBJECT_SERIALIZATION_VERSION) {
+// New format size calculation with streamvbyte
+auto size = sizeof(bool) + sizeof(size_t) + sizeof(size_t) +
sizeof(size_t);
+auto real_need_copy_num = is_column_const(column) ? 1 : column.size();
+const auto& src_col = assert_cast(column);
Review Comment:
When the column is a ColumnConst, directly casting it to ColumnType will
fail. The code should first unwrap the const column to get the underlying data
column before casting, similar to how the old format handles it at lines
175-178. The serialize function correctly handles this at line 41 using
serialize_const_flag_and_row_num which unwraps const columns, but
get_uncompressed_serialized_bytes does not perform this unwrapping.
```suggestion
const IColumn* data_column = &column;
auto real_need_copy_num = column.size();
if (is_column_const(column)) {
const auto& const_column = assert_cast(column);
data_column = &(const_column.get_data_column());
real_need_copy_num = 1;
}
const auto& src_col = assert_cast(*data_column);
```
##
be/src/vec/data_types/data_type_fixed_length_object.cpp:
##
@@ -97,6 +153,23 @@ const char* DataTypeFixedLengthObject::deserialize(const
char* buf, MutableColum
// data : item data1 | item data2...
int64_t DataTypeFixedLengthObject::get_uncompressed_serialized_bytes(const
IColumn& column,
int
be_exec_version) const {
+if (be_exec_version >= USE_NEW_FIXED_OBJECT_SERIALIZATION_VERSION) {
+// New format size calculation with streamvbyte
+auto size = sizeof(bool) + sizeof(size_t) + sizeof(size_t) +
sizeof(size_t);
+auto real_need_copy_num = is_column_const(column) ? 1 : column.size();
+const auto& src_col = assert_cast(column);
+auto mem_size = src_col.item_size() * real_need_copy_num;
+if (mem_size <= SERIALIZED_MEM_SIZE_LIMIT) {
+return size + mem_size;
+} else {
+// Throw exception if mem_size is large than UINT32_MAX
Review Comment:
The comment has a grammatical error: "large than" should be "larger than".
##
be/src/vec/data_types/data_type_fixed_length_object.cpp:
##
@@ -33,12 +34,43 @@ namespace doris::vectorized {
char* DataTypeFixedLengthObject::serializ
Re: [PR] [Improvement](serialize) use streamvbyte_encode in DataTypeFixedLengthObject::serialize [doris]
BiteThet commented on PR #60526: URL: https://github.com/apache/doris/pull/60526#issuecomment-3851196710 run buildall -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] [Improvement](serialize) use streamvbyte_encode in DataTypeFixedLengthObject::serialize [doris]
BiteThet commented on PR #60526: URL: https://github.com/apache/doris/pull/60526#issuecomment-3851193945 run buildall -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] [Improvement](serialize) use streamvbyte_encode in DataTypeFixedLengthObject::serialize [doris]
Thearas commented on PR #60526: URL: https://github.com/apache/doris/pull/60526#issuecomment-3851189025 Thank you for your contribution to Apache Doris. Don't know what should be done next? See [How to process your PR](https://cwiki.apache.org/confluence/display/DORIS/How+to+process+your+PR). Please clearly describe your PR: 1. What problem was fixed (it's best to include specific error reporting information). How it was fixed. 2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be. 3. What features were added. Why was this function added? 4. Which code was refactored and why was this part of the code refactored? 5. Which functions were optimized and what is the difference before and after the optimization? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
