Fernando Pereira created PARQUET-845:
----------------------------------------

             Summary: Efficient storage for several INT_8 and INT_16
                 Key: PARQUET-845
                 URL: https://issues.apache.org/jira/browse/PARQUET-845
             Project: Parquet
          Issue Type: Wish
            Reporter: Fernando Pereira
            Priority: Minor


In very large datasets, aggregating several INT8 values into INT32 fields (or a byte array) can make a big difference.
In Parquet, efficient algorithms exist for INT32, so if the LogicalType is INT_8 the encoded int might take up only one byte. However, further optimizations could be made by allowing the user to specify the types more precisely.
What about a BYTE_ARRAY logical type, backed by the FIXED_LEN_BYTE_ARRAY type (or eventually INT_32)?
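
To make the aggregation idea concrete, here is a minimal sketch in plain Python of packing four signed 8-bit values into a single 32-bit integer and recovering them. It uses only the standard `struct` module and no Parquet API. The function names `pack_int8x4` and `unpack_int8x4` and the little-endian byte order are assumptions chosen for illustration, not something the issue specifies.

```python
import struct


def pack_int8x4(values):
    """Pack four signed 8-bit integers into one signed 32-bit integer.

    Hypothetical helper for illustration only; not part of any Parquet library.
    """
    if len(values) != 4:
        raise ValueError("expected exactly four values")
    # "<4b" writes four signed bytes; struct.error is raised if a value
    # falls outside the INT8 range [-128, 127].
    raw = struct.pack("<4b", *values)
    # Reinterpret the same four bytes as one little-endian signed 32-bit int.
    return struct.unpack("<i", raw)[0]


def unpack_int8x4(packed):
    """Recover the four signed 8-bit integers from a packed 32-bit integer."""
    raw = struct.pack("<i", packed)
    return list(struct.unpack("<4b", raw))


packed = pack_int8x4([1, 2, 3, 4])
print(packed)                  # 67305985 (0x04030201)
print(unpack_int8x4(packed))   # [1, 2, 3, 4]
```

The intermediate `raw` value is a 4-byte string, which is the kind of value a FIXED_LEN_BYTE_ARRAY column of length 4 could hold, while the packed integer would fit an INT32 column. Which of the two would compress better in practice depends on the data and the encoding, and is not something this sketch measures.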