[ https://issues.apache.org/jira/browse/IMPALA-9201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Armstrong resolved IMPALA-9201.
-----------------------------------
    Resolution: Not A Bug

This isn't how compression in Parquet works - the pages within the file are compressed, not the whole file.

> Impala can't read parquet file compressed by zstd bash command
> --------------------------------------------------------------
>
>                 Key: IMPALA-9201
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9201
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 3.4.0
>            Reporter: Xiaomeng Zhang
>            Assignee: Abhishek Rawat
>            Priority: Major
>
> To reproduce:
> # get a parquet file written by impala
> # use "hadoop fs -get" to download locally
> # use command "zstd -i parquetfile -o zstdfile" to get a zstd compressed file parquet.zst.
> # use "hadoop fs -put" to put zstd file in directory "/test-warehouse/par_zstd"
> # in impala, create table with location on "/test-warehouse/par_zstd"
> # run select * from that table, get error :
> {code:java}
> [localhost:21000] default> select * from par_zstd;
> Query: select * from par_zstd
> Query submitted at: 2019-11-25 14:59:07 (Coordinator: http://xiaomeng-OptiPlex-9020:25000)
> Query progress can be monitored at: http://xiaomeng-OptiPlex-9020:25000/query_plan?query_id=b0411d5136965e30:549208ad00000000
> ERROR: File 'hdfs://localhost:20500/test-warehouse/par_zstd/parquet.zst' has an invalid Parquet version number: ����
> . Please check that it is a valid Parquet file. This error can also occur due to stale metadata. If you believe this is a valid Parquet file, try running "refresh default.par_zstd".
> {code}
> In hive run select * from table, get error:
> {code:java}
> Error: java.io.IOException: java.lang.RuntimeException: hdfs://localhost:20500/test-warehouse/par_zstd/parquet.zstd is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [-2, -72, -113, -90] (state=,code=0)
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
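
For readers who hit the same error, the resolution note can be illustrated with a rough sketch. A Parquet reader expects the 4-byte magic `PAR1` (the `[80, 65, 82, 49]` in the Hive message) at both the head and the tail of the file, so wrapping the whole file in another compression format hides it. The helper name `classify` and the fake byte strings below are hypothetical, and `zlib` is used only as a standard-library stand-in for the `zstd` command line tool; this is not how Impala or Hive implement the check.

```python
import zlib

# Parquet files begin and end with this magic; as byte values it is [80, 65, 82, 49].
PARQUET_MAGIC = b"PAR1"
# A Zstandard frame starts with magic number 0xFD2FB528, stored little-endian.
ZSTD_FRAME_MAGIC = bytes([0x28, 0xB5, 0x2F, 0xFD])


def classify(data: bytes) -> str:
    """Guess a container type from magic bytes only (hypothetical helper)."""
    # Smallest possible layout: head magic + 4-byte footer length + tail magic.
    if (
        len(data) >= 12
        and data[:4] == PARQUET_MAGIC
        and data[-4:] == PARQUET_MAGIC
    ):
        return "parquet"
    if data[:4] == ZSTD_FRAME_MAGIC:
        return "zstd-frame"
    return "unknown"


# Fake payload: NOT a valid Parquet file, it only carries the two magics.
fake_parquet = PARQUET_MAGIC + b"\x00" * 32 + PARQUET_MAGIC

# Whole-file compression (zlib here, standing in for zstd) replaces the
# head and tail bytes, so a magic-number check no longer sees "PAR1".
wrapped = zlib.compress(fake_parquet)

if __name__ == "__main__":
    print(classify(fake_parquet))
    print(classify(wrapped))
```

To get zstd compression that Parquet readers understand, the codec has to be chosen when the file is written, so that the pages inside the file are compressed and the `PAR1` framing stays intact.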