Lili Ma created HAWQ-1366:
-----------------------------

Summary: HAWQ should throw error if finding dictionary encoding type for Parquet
Key: HAWQ-1366
URL: https://issues.apache.org/jira/browse/HAWQ-1366
Project: Apache HAWQ
Issue Type: Bug
Components: Storage
Reporter: Lili Ma
Assignee: Ed Espino
Fix For: 2.2.0.0-incubating
HAWQ is based on Parquet format version 1.0 and does not support dictionary pages, but hawq register can register Parquet format version 2.0 data into HAWQ. HAWQ should therefore throw an error when it finds an unsupported page type for a column, instead of silently returning wrong results.

Steps to reproduce:

1. In Hive, create a table and insert 5 records:
{code}
hive> create table tt (i int,
    > fname varchar(100),
    > title varchar(100),
    > salary double
    > )
    > STORED AS PARQUET;
OK
Time taken: 0.029 seconds
hive> insert into tt values (5, 'OYLNUQSQIGWDWBKMDQNYUGYXOBDFGW', 'Sales', 80282.54),
    > (7, 'UKIPCBGKHDNEEXQHOFGKKFIZGLFNHE', 'Engineer', 10206.65),
    > (4, 'PTPIRDISZNTWNFRNBPCUKWXYFGSRBQ', 'Director', 63691.23),
    > (9, 'CTDCDYRURBZMBLNWHQNOQCYFFVULOP', 'Engineer', 63867.44),
    > (10, 'WZQGZJEEVDKOKTPRFKLVCBSBIYTEDK', 'Sales', 97720.08);
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = malili_20170228173956_f370414c-ddc8-4e6d-99e9-7c1fa1f678d1
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2017-02-28 17:39:58,713 Stage-1 map = 100%, reduce = 0%
Ended Job = job_local2046305831_0004
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://127.0.0.1:8020/user/hive/warehouse/tt/.hive-staging_hive_2017-02-28_17-39-56_806_3518057455919651199-1/-ext-10000
Loading data to table default.tt
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 3945 HDFS Write: 4226 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 1.975 seconds
hive> select * from tt;
OK
5	OYLNUQSQIGWDWBKMDQNYUGYXOBDFGW	Sales	80282.54
7	UKIPCBGKHDNEEXQHOFGKKFIZGLFNHE	Engineer	10206.65
4	PTPIRDISZNTWNFRNBPCUKWXYFGSRBQ	Director	63691.23
9	CTDCDYRURBZMBLNWHQNOQCYFFVULOP	Engineer	63867.44
10	WZQGZJEEVDKOKTPRFKLVCBSBIYTEDK	Sales	97720.08
Time taken: 0.056 seconds, Fetched: 5 row(s)
{code}

2. Create the table in HAWQ:
{code}
CREATE TABLE public.tt (i int, fname varchar(100), title varchar(100), salary float8) WITH (appendonly=true,orientation=parquet);
{code}

3. Run hawq register:
{code}
malilis-MacBook-Pro:Hawq_register malili$ hawq register -d postgres -f hdfs://localhost:8020/user/hive/warehouse/tt tt
20170228:17:40:25:090499 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-try to connect database localhost:5432 postgres
20170228:17:40:33:090499 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-New file(s) to be registered: ['hdfs://localhost:8020/user/hive/warehouse/tt/000000_0']
hdfscmd: "hadoop fs -mv hdfs://localhost:8020/user/hive/warehouse/tt/000000_0 hdfs://localhost:8020/hawq_default/16385/16387/49281/1"
20170228:17:40:41:090499 hawqregister:malilis-MacBook-Pro:malili-[INFO]:-Hawq Register Succeed.
{code}

4. Select from the table in HAWQ. The registration succeeded, but the title column is empty in every row:
{code}
postgres=# select * from tt;
 i  |             fname              | title |  salary
----+--------------------------------+-------+----------
  5 | OYLNUQSQIGWDWBKMDQNYUGYXOBDFGW |       | 80282.54
  7 | UKIPCBGKHDNEEXQHOFGKKFIZGLFNHE |       | 10206.65
  4 | PTPIRDISZNTWNFRNBPCUKWXYFGSRBQ |       | 63691.23
  9 | CTDCDYRURBZMBLNWHQNOQCYFFVULOP |       | 63867.44
 10 | WZQGZJEEVDKOKTPRFKLVCBSBIYTEDK |       | 97720.08
{code}

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
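The check requested in this issue could look roughly like the minimal sketch below. It is written in Python for readability and is not HAWQ code. It assumes the page headers of one column chunk have already been decoded into a small hypothetical PageInfo record; PageInfo, UnsupportedParquetPage and check_column_pages are illustrative names, not HAWQ APIs. The enum values follow parquet-format's parquet.thrift. Rejecting every page type other than a plain DATA_PAGE (for example DATA_PAGE_V2) is an assumption that goes beyond the dictionary case named in the summary.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Iterable


class PageType(IntEnum):
    # Values as defined in parquet-format's parquet.thrift
    DATA_PAGE = 0
    INDEX_PAGE = 1
    DICTIONARY_PAGE = 2
    DATA_PAGE_V2 = 3


class Encoding(IntEnum):
    # Subset of parquet.thrift encodings relevant to this check
    PLAIN = 0
    PLAIN_DICTIONARY = 2
    RLE = 3
    BIT_PACKED = 4
    RLE_DICTIONARY = 8


DICTIONARY_ENCODINGS = {Encoding.PLAIN_DICTIONARY, Encoding.RLE_DICTIONARY}


class UnsupportedParquetPage(Exception):
    """Hypothetical error type; a real reader would use its own error reporting."""


@dataclass(frozen=True)
class PageInfo:
    # Hypothetical, already-decoded summary of one page header
    page_type: PageType
    encoding: Encoding


def check_column_pages(column: str, pages: Iterable[PageInfo]) -> None:
    """Raise on the first page the reader cannot decode, instead of returning wrong values."""
    for index, page in enumerate(pages):
        # Dictionary pages, or data pages whose values reference a dictionary
        if page.page_type == PageType.DICTIONARY_PAGE or page.encoding in DICTIONARY_ENCODINGS:
            raise UnsupportedParquetPage(
                f"column '{column}', page {index}: dictionary encoding "
                f"({page.page_type.name}/{page.encoding.name}) is not supported"
            )
        # Assumption: only format 1.0 data pages are readable
        if page.page_type != PageType.DATA_PAGE:
            raise UnsupportedParquetPage(
                f"column '{column}', page {index}: page type "
                f"{page.page_type.name} is not supported"
            )
```

Applied to the reproduction above, a check of this kind would make the query in step 4 fail with an error naming the offending column, rather than returning rows with an empty title.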