[ https://issues.apache.org/jira/browse/IMPALA-6964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lars Volker reassigned IMPALA-6964:
-----------------------------------

    Assignee: Sahil Takiar

> Track stats about column and page sizes in Parquet reader
> ---------------------------------------------------------
>
>                 Key: IMPALA-6964
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6964
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Tim Armstrong
>            Assignee: Sahil Takiar
>            Priority: Major
>              Labels: observability, parquet, ramp-up
>
> It would be good to have stats for scanned parquet data about page sizes. We
> currently can't tell much about the "shape" of the parquet pages from the
> profile. Some questions that are interesting:
> * How big is each column? I.e. total compressed and decompressed size read.
> * How big are pages on average? Either compressed or decompressed size
> * What is the compression ratio for pages? Could be inferred from the above
>   two.
> I think storing all the stats in the profile per-column would be too much
> data, but we could probably infer most useful things from higher-level
> aggregates.
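
To illustrate the kind of higher-level aggregate the description has in mind, here is a minimal sketch in Python (not Impala's C++ backend code). It keeps only three running totals and derives the average page sizes and the compression ratio from them. The class name `PageSizeStats`, its fields, and its methods are hypothetical and chosen for illustration; they do not correspond to any existing Impala profile counter.

```python
from dataclasses import dataclass


@dataclass
class PageSizeStats:
    """Hypothetical aggregate over every page a scan reads (not an Impala API)."""
    num_pages: int = 0
    compressed_bytes: int = 0
    uncompressed_bytes: int = 0

    def add_page(self, compressed_size: int, uncompressed_size: int) -> None:
        # Fold one page into the running totals; nothing is stored per page.
        self.num_pages += 1
        self.compressed_bytes += compressed_size
        self.uncompressed_bytes += uncompressed_size

    def avg_compressed_page_size(self) -> float:
        # Average compressed page size in bytes, or 0.0 if no pages were read.
        return self.compressed_bytes / self.num_pages if self.num_pages else 0.0

    def avg_uncompressed_page_size(self) -> float:
        # Average uncompressed page size in bytes, or 0.0 if no pages were read.
        return self.uncompressed_bytes / self.num_pages if self.num_pages else 0.0

    def compression_ratio(self) -> float:
        # Uncompressed bytes per compressed byte, inferred from the two totals.
        if self.compressed_bytes == 0:
            return 0.0
        return self.uncompressed_bytes / self.compressed_bytes


# Example with made-up page sizes (compressed, uncompressed) in bytes.
stats = PageSizeStats()
for compressed, uncompressed in [(400, 1000), (600, 1000)]:
    stats.add_page(compressed, uncompressed)
```

Keeping one such aggregate per scan, rather than per column, bounds the profile size at three integers. The trade-off is that it cannot answer the per-column size question without further breakdown.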