I have filed a JIRA: https://issues.apache.org/jira/browse/PARQUET-2253
Best,
Gang
On Thu, Mar 2, 2023 at 5:39 PM Patrick Hansert wrote:
> > This is by design. I guess it benefits sequential scan where the dictionary
> > page is read first and then followed by its encoded indices in the data
> This is by design. I guess it benefits sequential scan where the dictionary
> page is read first and then followed by its encoded indices in the data
> pages. Otherwise we need to seek anyway.
Good, then it shouldn't cause problems when putting the dictionary after
all-null pages.
I think that
> What are the reasons for forcing the dictionary to be the first page?
This is by design. I guess it benefits sequential scan where the dictionary
page is read first and then followed by its encoded indices in the data
pages. Otherwise we need to seek anyway.
> can this be changed to allow for
Hi Gang,
thanks for your reply.
On 01.03.23 03:09, Gang Wu wrote:
> If at least one record in the beginning 2 rows is not null, then the
> encoded size will be much better.
That is the workaround I have been using for the past weeks, although my
tests show that at least two values are
Hi Patrick,
Thanks for reporting the issue!
Let me try to answer your question in short.
1. In your case, the good data is dictionary-encoded [1] and the size of
the dictionary is 1.
2. The RLE encoding [2] you have observed from the good data applies to the
only indices after dictionary
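To illustrate the two points above, here is a minimal sketch in plain Python of dictionary encoding followed by run-length encoding of the indices. It is not the Parquet implementation: the helper names `dictionary_encode` and `run_length_encode` are hypothetical, the sample column is made up, and real Parquet uses an RLE/bit-packing hybrid for the indices while carrying nulls in definition levels.

```python
from itertools import groupby


def dictionary_encode(values):
    """Return (dictionary, indices) for the non-null values of a column.

    Hypothetical helper for illustration only. Nulls are skipped here
    because Parquet records them in definition levels, not in the
    dictionary indices.
    """
    dictionary = []   # distinct values in first-seen order
    positions = {}    # value -> index into `dictionary`
    indices = []      # one dictionary index per non-null value
    for value in values:
        if value is None:
            continue
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


def run_length_encode(indices):
    """Collapse consecutive equal indices into (index, run_length) pairs."""
    return [(index, len(list(group))) for index, group in groupby(indices)]


# Made-up column: leading nulls followed by a single repeated value.
column = [None, None, "a", "a", "a", "a"]
dictionary, indices = dictionary_encode(column)
runs = run_length_encode(indices)
```

With a dictionary of size 1, every index is identical, so the indices collapse into a single run, which is why the encoded size stays small.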
Hello everyone!
First of all, I hope I'm in the right place. The contribution guidelines
directed me here after I discovered the registration in the Jira tracker
is closed. I'm a Ph.D. student at RPTU Kaiserslautern-Landau, and my
current research revolves around sorting-based improvements to