Hi Wei,
Have you searched JIRA for issues that relate to the Phoenix-Hive
integration? There have been a few in the recent past around invalid
queries being generated, especially around column names.
On 4/14/18 7:39 PM, Lu Wei wrote:
## Version:
phoenix: 4.13.2-cdh5.11.2
hive: 1.1.0-cdh5.11.2
There is an ColumnNotFoundException when joining a hive internal table
with an Phoenix external table.
## Table1: phoenix external table "ext_tmp":
+-----------+------------+--------------------+--+
| col_name | data_type | comment |
+-----------+------------+--------------------+--+
| cola | string | from deserializer |
| colb | string | from deserializer |
+-----------+------------+--------------------+--+
### Backend Phoenix table "TMP":
select * from TMP;
*+-------+------------+*
*| **cola** |**colb** |*
*+-------+------------+*
*| *a * | *aaaa * |*
*| *b * | *bbaaaa * |*
*| *ccc * | *cccbbaaaa* |*
*+-------+------------+*
*
*
### Hive external table creation statement:
+----------------------------------------------------+--+
| createtab_stmt |
+----------------------------------------------------+--+
| CREATE EXTERNAL TABLE `ext_tmp`( |
| `cola` string COMMENT 'from deserializer', |
| `colb` string COMMENT 'from deserializer') |
| ROW FORMAT SERDE |
| 'org.apache.phoenix.hive.PhoenixSerDe' |
| STORED BY |
| 'org.apache.phoenix.hive.PhoenixStorageHandler' |
| WITH SERDEPROPERTIES ( |
| 'serialization.format'='1') |
| LOCATION |
| 'hdfs://st:8020/data/user/hive/warehouse/ext_tmp'|
| TBLPROPERTIES ( |
| 'phoenix.column.mapping'='cola:cola,colb:colb', |
| 'phoenix.rowkeys'='cola', |
| 'phoenix.table.name'='tmp', |
| 'phoenix.zookeeper.client.port'='2181', |
| 'phoenix.zookeeper.quorum'='st1,st2,st3', |
| 'phoenix.zookeeper.znode.parent'='/hbase', |
| 'transient_lastDdlTime'='1523607352') |
+----------------------------------------------------+--+
## Table2: hive internal table "native1":
+-----------+------------+----------+--+
| col_name | data_type | comment |
+-----------+------------+----------+--+
| cola | string | |
| colb | string | |
+-----------+------------+----------+--+
## When join the two tables:
select * from native1 join ext_tmp t on native1.cola= t.cola;
Exception:
org.apache.phoenix.schema.ColumnNotFoundException: ERROR 504 (42703):
Undefined column. columnName=TMP
Detailed exception as below. There is an empty column "" as readcolumn
name, which is not exsit in Pheonix table at all. so the phoenix query
will never be correct: select /*+ NO_CACHE */ "","cola","colb" from tmp
where "cola" is not null
Any thoughts?
---------
2018-04-14 21:13:40,923 INFO
org.apache.hadoop.hive.ql.io.HiveInputFormat:
[HiveServer2-Background-Pool: Thread-304]: hive.io.file.readcolumn.ids=
2018-04-14 21:13:40,923 INFO
org.apache.hadoop.hive.ql.io.HiveInputFormat:
[HiveServer2-Background-Pool: Thread-304]:
hive.io.file.readcolumn.names=,cola,colb
2018-04-14 21:13:40,923 INFO
org.apache.hadoop.hive.ql.io.HiveInputFormat:
[HiveServer2-Background-Pool: Thread-304]: Generating splits
2018-04-14 21:13:40,924 INFO
org.apache.phoenix.hive.query.PhoenixQueryBuilder:
[HiveServer2-Background-Pool: Thread-304]: Input query : select /*+
NO_CACHE */ "","cola","colb" from tmp where "cola" is not null
2018-04-14 21:13:40,932 ERROR
org.apache.phoenix.hive.mapreduce.PhoenixInputFormat:
[HiveServer2-Background-Pool: Thread-304]: Failed to get the query plan
with error [ERROR 504 (42703): Undefined column. columnName=TMP]
Thanks,
Wei