[GitHub] [orc] stiga-huang commented on a change in pull request #1056: ORC-1122: [C++] Add buffer to decode the whole run in RleDecoderV2

2022-03-04 Thread GitBox
stiga-huang commented on a change in pull request #1056: URL: https://github.com/apache/orc/pull/1056#discussion_r820005368 ## File path: c++/src/RLEv2.hh ## @@ -25,6 +25,7 @@ #include +#define MAX_LITERAL_SIZE 512 Review comment: Yeah, so the encoder and the

[GitHub] [orc] dongjoon-hyun commented on a change in pull request #1056: ORC-1122: [C++] Add buffer to decode the whole run in RleDecoderV2

2022-03-04 Thread GitBox
dongjoon-hyun commented on a change in pull request #1056: URL: https://github.com/apache/orc/pull/1056#discussion_r819926208 ## File path: c++/src/RLEv2.hh ## @@ -25,6 +25,7 @@ #include +#define MAX_LITERAL_SIZE 512 Review comment: This is moved here from

[GitHub] [orc] pgaref commented on a change in pull request #1055: ORC-1121: Fix column coversion check bug which causes column filters don't work

2022-03-04 Thread GitBox
pgaref commented on a change in pull request #1055: URL: https://github.com/apache/orc/pull/1055#discussion_r819900799 ## File path: java/core/src/java/org/apache/orc/impl/SchemaEvolution.java ## @@ -126,6 +128,11 @@ public SchemaEvolution(TypeDescription fileSchema, }

[GitHub] [orc] dongjoon-hyun commented on pull request #1055: ORC-1121: Fix column coversion check bug which causes column filters don't work

2022-03-04 Thread GitBox
dongjoon-hyun commented on pull request #1055: URL: https://github.com/apache/orc/pull/1055#issuecomment-1059475415 cc @stiga-huang , too -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [orc] dongjoon-hyun commented on pull request #1054: ORC-1120: Remove C++ library limitation about write version

2022-03-04 Thread GitBox
dongjoon-hyun commented on pull request #1054: URL: https://github.com/apache/orc/pull/1054#issuecomment-1059470438 This is backported to branch-1.7. cc @williamhyun

[GitHub] [orc] dongjoon-hyun commented on pull request #1054: ORC-1120: Remove C++ library limitation about write version

2022-03-04 Thread GitBox
dongjoon-hyun commented on pull request #1054: URL: https://github.com/apache/orc/pull/1054#issuecomment-1059469196 Welcome to the Apache ORC community, @XinyuZeng. I added you to the Apache ORC contributor group and assigned ORC-1120 to you.

[GitHub] [orc] dongjoon-hyun closed issue #1052: Readme still says cpp only writes version 0.11?

2022-03-04 Thread GitBox
dongjoon-hyun closed issue #1052: URL: https://github.com/apache/orc/issues/1052

[GitHub] [orc] dongjoon-hyun merged pull request #1054: ORC-1120: Remove C++ library limitation about write version

2022-03-04 Thread GitBox
dongjoon-hyun merged pull request #1054: URL: https://github.com/apache/orc/pull/1054

[GitHub] [orc] dongjoon-hyun commented on pull request #1057: ORC-1123: Add `estimationMemory` method for writer

2022-03-04 Thread GitBox
dongjoon-hyun commented on pull request #1057: URL: https://github.com/apache/orc/pull/1057#issuecomment-1059427822 +1 for doing that separately in a new ORC JIRA.

[GitHub] [orc] pgaref commented on pull request #1057: ORC-1123: Add `estimationMemory` method for writer

2022-03-04 Thread GitBox
pgaref commented on pull request #1057: URL: https://github.com/apache/orc/pull/1057#issuecomment-1059426831 Seems like we might be overestimating memory usage when taking unused buffers into account -- it would probably make sense to document and test these estimates.

Re: [DISCUSS] The correct approach to estimate the byte size for an unclosed ORC writer.

2022-03-04 Thread liwei li
Thanks to openinx for opening this discussion. One thing to note: the current approach has a problem. Because of some optimization mechanisms, when a large amount of duplicate data is written, the estimated size deviates from the actual size. However, when cached data is

[GitHub] [orc] dongjoon-hyun commented on pull request #1057: ORC-1123: Add `estimationMemory` method for writer

2022-03-04 Thread GitBox
dongjoon-hyun commented on pull request #1057: URL: https://github.com/apache/orc/pull/1057#issuecomment-1059364779 Also, cc @williamhyun since he is the release manager for Apache ORC 1.7.4.

[GitHub] [orc] dongjoon-hyun merged pull request #1057: ORC-1123: Add `estimationMemory` method for writer

2022-03-04 Thread GitBox
dongjoon-hyun merged pull request #1057: URL: https://github.com/apache/orc/pull/1057

[GitHub] [orc] dongjoon-hyun commented on pull request #1054: ORC-1120: fix readme about cpp orc version

2022-03-04 Thread GitBox
dongjoon-hyun commented on pull request #1054: URL: https://github.com/apache/orc/pull/1054#issuecomment-1059351988 cc @wgtmac and @stiga-huang

[GitHub] [orc] guiyanakuang opened a new pull request #1057: ORC-1123: Add `estimationMemory` method for writer

2022-03-04 Thread GitBox
guiyanakuang opened a new pull request #1057: URL: https://github.com/apache/orc/pull/1057 ### What changes were proposed in this pull request? Add `estimationMemory` method for writer. It exposes the internal treeWriter to estimate the memory used for the buffer. ### Why are

[jira] [Created] (ORC-1123) Add `estimationMemory` method for writer

2022-03-04 Thread Yiqun Zhang (Jira)
Yiqun Zhang created ORC-1123: Summary: Add `estimationMemory` method for writer Key: ORC-1123 URL: https://issues.apache.org/jira/browse/ORC-1123 Project: ORC Issue Type: Improvement

[GitHub] [orc] stiga-huang opened a new pull request #1056: ORC-1122: [C++] Add buffer to decode the whole run in RleDecoderV2

2022-03-04 Thread GitBox
stiga-huang opened a new pull request #1056: URL: https://github.com/apache/orc/pull/1056 ### What changes were proposed in this pull request? This PR adds a buffer to decode the whole run at once in RleDecoderV2, which leverages the improvement of ORC-1020 to deal with null
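
The idea described in this PR summary (decode a whole run into a buffer once, then serve reads from that buffer) can be illustrated with a small sketch. This is not the RleDecoderV2 code from the PR: the class `BufferedRunDecoder` and the `(start, delta, length)` run tuples are hypothetical stand-ins for real RLEv2 bytes, and treating `MAX_LITERAL_SIZE` (the name from the review thread) as the upper bound on a run's length is an assumption.

```python
from typing import Iterator, List, Tuple

# Name taken from the review thread; assumed here to bound the run length.
MAX_LITERAL_SIZE = 512


class BufferedRunDecoder:
    """Toy decoder: each run is (start, delta, length), not real RLEv2 bytes."""

    def __init__(self, runs: List[Tuple[int, int, int]]) -> None:
        self._runs: Iterator[Tuple[int, int, int]] = iter(runs)
        self._buffer: List[int] = []  # values of the run decoded most recently
        self._pos = 0                 # index of the next unread buffered value

    def _fill(self) -> bool:
        """Decode the next whole run into the buffer; return False at end of input."""
        for start, delta, length in self._runs:
            if not 0 < length <= MAX_LITERAL_SIZE:
                raise ValueError("run length out of range")
            self._buffer = [start + i * delta for i in range(length)]
            self._pos = 0
            return True
        return False

    def next(self, count: int) -> List[int]:
        """Return up to `count` values, refilling the buffer one run at a time."""
        out: List[int] = []
        while len(out) < count:
            if self._pos == len(self._buffer) and not self._fill():
                break  # input exhausted
            take = min(count - len(out), len(self._buffer) - self._pos)
            out.extend(self._buffer[self._pos:self._pos + take])
            self._pos += take
        return out
```

With this shape, a read that stops in the middle of a run leaves the remaining values in the buffer, so the next call continues without decoding the run header again.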

[jira] [Created] (ORC-1122) Add buffer to decode the whole run in RleDecoderV2

2022-03-04 Thread Quanlong Huang (Jira)
Quanlong Huang created ORC-1122: --- Summary: Add buffer to decode the whole run in RleDecoderV2 Key: ORC-1122 URL: https://issues.apache.org/jira/browse/ORC-1122 Project: ORC Issue Type:

[GitHub] [orc] guiyanakuang commented on pull request #1055: ORC-1121: Fix column coversion check bug which causes column filters don't work

2022-03-04 Thread GitBox
guiyanakuang commented on pull request #1055: URL: https://github.com/apache/orc/pull/1055#issuecomment-1059038963 cc @pgaref @dongjoon-hyun

[GitHub] [orc] guiyanakuang commented on pull request #1055: ORC-1121: Fix column coversion check bug which causes column filters don't work

2022-03-04 Thread GitBox
guiyanakuang commented on pull request #1055: URL: https://github.com/apache/orc/pull/1055#issuecomment-1059037655 LGTM (Pending CIs)

[GitHub] [orc] PengleiShi commented on pull request #1055: ORC-1121: Fix column coversion check bug which causes column filters don't work

2022-03-04 Thread GitBox
PengleiShi commented on pull request #1055: URL: https://github.com/apache/orc/pull/1055#issuecomment-1059028235 ping @guiyanakuang, could you help review this?

[GitHub] [orc] PengleiShi opened a new pull request #1055: Fix column coversion check bug which causes column filters don't work

2022-03-04 Thread GitBox
PengleiShi opened a new pull request #1055: URL: https://github.com/apache/orc/pull/1055 ### What changes were proposed in this pull request? Add a map in `SchemaEvolution` that contains the mapping from the file column id to the reader column id; the mapping will be used in
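
As a rough illustration of the file-column-id to reader-column-id mapping mentioned in this PR summary, the sketch below inverts a reader-to-file lookup. It is not the `SchemaEvolution` code from the PR: the function `build_file_to_reader_map` and its list-based input are hypothetical, and it assumes that a reader column with no counterpart in the file is represented as `None`.

```python
from typing import Dict, List, Optional


def build_file_to_reader_map(reader_to_file: List[Optional[int]]) -> Dict[int, int]:
    """Invert a reader-id -> file-id lookup into a file-id -> reader-id map.

    reader_to_file[i] holds the file column id backing reader column i,
    or None when the reader column does not exist in the file (assumption).
    """
    file_to_reader: Dict[int, int] = {}
    for reader_id, file_id in enumerate(reader_to_file):
        if file_id is not None:
            file_to_reader[file_id] = reader_id
    return file_to_reader
```

A lookup keyed by file column id lets code that walks the file schema find the matching reader column directly instead of scanning the reader schema each time.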