[ https://issues.apache.org/jira/browse/IMPALA-886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong updated IMPALA-886:
---------------------------------
    Labels: catalog-server hbase usability  (was: catalog-server usability)

> Always display HBase cols in same order as CREATE TABLE statement
> -----------------------------------------------------------------
>
>                 Key: IMPALA-886
>                 URL: https://issues.apache.org/jira/browse/IMPALA-886
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>    Affects Versions: Impala 1.3
>            Reporter: John Russell
>            Priority: Minor
>              Labels: catalog-server, hbase, usability
>
> I noticed a discrepancy with Hive in how Impala handles column order for
> HBase tables. I think it would be preferable to match Hive's behavior;
> otherwise life becomes more complicated for anyone doing INSERT or
> SELECT * with an HBase table through Impala. (And I have to add caveats
> and usage notes in the docs.)
> Repro:
> In the HBase shell, create a table with a single column family. I think
> most Impala tests use one column family per column, which hides this
> behavior.
> hbase(main):008:0> create 'sample_data_fast','cols'
> 0 row(s) in 71.8750 seconds
> In the Hive shell, create a mapping table. Notice that DESCRIBE lists the
> columns in the same order as in CREATE TABLE.
> hive> create external table sample_data_fast (id string, val int, zfill string, name string, assertion boolean)
>     > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>     > WITH SERDEPROPERTIES (
>     > "hbase.columns.mapping" =
>     > ":key,cols:val,cols:zfill,cols:name,cols:assertion")
>     > TBLPROPERTIES("hbase.table.name" = "sample_data_fast")
>     > ;
> OK
> Time taken: 1.7 seconds
> hive> desc sample_data_fast;
> OK
> id         string   from deserializer
> val        int      from deserializer
> zfill      string   from deserializer
> name       string   from deserializer
> assertion  boolean  from deserializer
> Time taken: 0.302 seconds
> Now try the same DESCRIBE in impala-shell. The key column (id) is listed
> first. The other columns, which all belong to the same column family, are
> then listed in alphabetical order rather than in CREATE TABLE order:
> [localhost:21000] > desc sample_data_fast;
> Query: describe sample_data_fast
> +-----------+---------+---------+
> | name      | type    | comment |
> +-----------+---------+---------+
> | id        | string  |         |
> | assertion | boolean |         |
> | name      | string  |         |
> | val       | int     |         |
> | zfill     | string  |         |
> +-----------+---------+---------+
> Returned 5 row(s) in 0.02s
> Thus, if you already had Hive code doing SELECT * from an HBase table like
> this, you would get a different result set (a different column order) in
> Impala.
> If you tried to copy from an HDFS table via 'INSERT INTO hbase_table SELECT *
> FROM hdfs_table', you would get an error because the columns don't match.
> If you made a separate column family for each column, the discrepancy is
> masked: you need more than one column in a column family to see the
> alphabetical ordering.
> Since Hive preserves the column order, the relevant information must be
> stored in the metastore.
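The ordering difference described in the report can be modelled in a few lines. The sketch below is a hypothetical Python model, not code from Impala or Hive: it pairs each column with its entry in hbase.columns.mapping and assumes that Impala puts the row key first and then sorts the remaining columns by (column family, qualifier). In this example the qualifiers happen to equal the column names, so that assumed sort key is indistinguishable from the alphabetical order seen in the DESCRIBE output. The helper names parse_mapping, hive_order and observed_impala_order are made up for illustration.

```python
# Hypothetical model of the column orders reported in IMPALA-886.
# It is NOT taken from Impala or Hive source code; it only reproduces
# the DESCRIBE output from the table definition. The Impala sort key
# (family, qualifier) is an assumption.


def parse_mapping(columns, mapping):
    """Pair each column name with its HBase (family, qualifier) entry."""
    entries = [e.strip() for e in mapping.split(",")]
    if len(entries) != len(columns):
        raise ValueError("column count does not match hbase.columns.mapping")
    pairs = []
    for name, entry in zip(columns, entries):
        if entry == ":key":
            pairs.append((name, None, None))  # row key: no family/qualifier
        else:
            family, qualifier = entry.split(":", 1)
            pairs.append((name, family, qualifier))
    return pairs


def hive_order(pairs):
    """Hive (as reported): columns come back in CREATE TABLE order."""
    return [name for name, _, _ in pairs]


def observed_impala_order(pairs):
    """Impala 1.3 (as observed): row key first, then the other columns
    sorted by the assumed key (family, qualifier)."""
    key = [name for name, family, _ in pairs if family is None]
    others = sorted(
        (p for p in pairs if p[1] is not None), key=lambda p: (p[1], p[2])
    )
    return key + [name for name, _, _ in others]


columns = ["id", "val", "zfill", "name", "assertion"]

# One column family ('cols') shared by all non-key columns: orders differ.
shared = parse_mapping(
    columns, ":key,cols:val,cols:zfill,cols:name,cols:assertion"
)
print(hive_order(shared))             # ['id', 'val', 'zfill', 'name', 'assertion']
print(observed_impala_order(shared))  # ['id', 'assertion', 'name', 'val', 'zfill']

# One family per column, with family names that already sort in
# declaration order (f1 < f2 < f3 < f4): the two orders coincide.
separate = parse_mapping(columns, ":key,f1:val,f2:zfill,f3:name,f4:assertion")
print(hive_order(separate) == observed_impala_order(separate))  # True
```

The second case illustrates why one family per column masks the problem, although under this assumed sort key it would only do so when the family names themselves sort in declaration order.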



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
