Hi Wei,

Have you searched JIRA for issues that relate to the Phoenix-Hive integration? There have been a few in the recent past around invalid queries being generated, especially around column names.

On 4/14/18 7:39 PM, Lu Wei wrote:
## Version:

phoenix: 4.13.2-cdh5.11.2

hive: 1.1.0-cdh5.11.2


There is an ColumnNotFoundException when joining a hive internal table with an Phoenix external table.



## Table1: phoenix external table "ext_tmp":

+-----------+------------+--------------------+--+

| col_name  | data_type  |      comment       |

+-----------+------------+--------------------+--+

| cola      | string     | from deserializer  |

| colb      | string     | from deserializer  |

+-----------+------------+--------------------+--+


### Backend Phoenix table "TMP":

select * from TMP;

*+-------+------------+*

*| **cola** |**colb** |*

*+-------+------------+*

*| *a * | *aaaa * |*

*| *b * | *bbaaaa * |*

*| *ccc * | *cccbbaaaa* |*

*+-------+------------+*

*
*

### Hive external table creation statement:


+----------------------------------------------------+--+

|                   createtab_stmt                   |

+----------------------------------------------------+--+

| CREATE EXTERNAL TABLE `ext_tmp`(                   |

|   `cola` string COMMENT 'from deserializer',       |

|   `colb` string COMMENT 'from deserializer')       |

| ROW FORMAT SERDE                                   |

|   'org.apache.phoenix.hive.PhoenixSerDe'           |

| STORED BY                                          |

|   'org.apache.phoenix.hive.PhoenixStorageHandler'  |

| WITH SERDEPROPERTIES (                             |

|   'serialization.format'='1')                      |

| LOCATION                                           |

|   'hdfs://st:8020/data/user/hive/warehouse/ext_tmp'|

| TBLPROPERTIES (                                    |

|   'phoenix.column.mapping'='cola:cola,colb:colb',  |

|   'phoenix.rowkeys'='cola',                        |

|   'phoenix.table.name'='tmp',                      |

|   'phoenix.zookeeper.client.port'='2181',          |

|   'phoenix.zookeeper.quorum'='st1,st2,st3',        |

|   'phoenix.zookeeper.znode.parent'='/hbase',       |

|   'transient_lastDdlTime'='1523607352')            |

+----------------------------------------------------+--+



## Table2: hive internal table "native1":

+-----------+------------+----------+--+

| col_name  | data_type  | comment  |

+-----------+------------+----------+--+

| cola      | string     |          |

| colb      | string     |          |

+-----------+------------+----------+--+



## When join the two tables:


select *  from native1  join  ext_tmp t on native1.cola= t.cola;



Exception:

org.apache.phoenix.schema.ColumnNotFoundException: ERROR 504 (42703): Undefined column. columnName=TMP


Detailed exception as below. There is an empty column "" as readcolumn name, which is not exsit in Pheonix table at all. so the phoenix query will never be correct: select /*+ NO_CACHE  */ "","cola","colb" from tmp where "cola" is not null


Any thoughts?



---------

2018-04-14 21:13:40,923 INFO  org.apache.hadoop.hive.ql.io.HiveInputFormat: [HiveServer2-Background-Pool: Thread-304]: hive.io.file.readcolumn.ids=

2018-04-14 21:13:40,923 INFO  org.apache.hadoop.hive.ql.io.HiveInputFormat: [HiveServer2-Background-Pool: Thread-304]: hive.io.file.readcolumn.names=,cola,colb

2018-04-14 21:13:40,923 INFO  org.apache.hadoop.hive.ql.io.HiveInputFormat: [HiveServer2-Background-Pool: Thread-304]: Generating splits

2018-04-14 21:13:40,924 INFO  org.apache.phoenix.hive.query.PhoenixQueryBuilder: [HiveServer2-Background-Pool: Thread-304]: Input query : select /*+ NO_CACHE  */ "","cola","colb" from tmp where "cola" is not null

2018-04-14 21:13:40,932 ERROR org.apache.phoenix.hive.mapreduce.PhoenixInputFormat: [HiveServer2-Background-Pool: Thread-304]: Failed to get the query plan with error [ERROR 504 (42703): Undefined column. columnName=TMP]



Thanks,

Wei

Reply via email to