Bhushan Mandhani created HIVE-2989:
--------------------------------------

             Summary: Adding Table Links to Hive
                 Key: HIVE-2989
                 URL: https://issues.apache.org/jira/browse/HIVE-2989
             Project: Hive
          Issue Type: Improvement
          Components: Metastore, Query Processor, Security
            Reporter: Bhushan Mandhani
            Assignee: Bhushan Mandhani
             Fix For: 0.10.0


This will add Table Links to Hive. This will be an alternate mechanism for a 
user to access tables and data in a database that is different from the one he 
is associated with. This feature can be used to provide access control (if 
access to databasename.tablename in queries and "use database X" is turned off 
in conjunction).

If db X wants to access one or more partitions from table T in db Y, the user 
will issue:
CREATE [STATIC] LINK TO T@Y LINKPROPERTIES ('RETENTION'='N')
New partitions added to T will automatically be added to the link as well and 
become available to X. However, if the link is specified to be static, that 
will not be the case. The X user will then have to explicitly import each 
partition of T that he needs. The command above will not actually make any 
existing partitions of T available to X. Instead, we provide the following 
command to add an existing partition to a link:
ALTER LINK T@Y ADD PARTITION (ds='2012-04-27')
The user will need to execute the above for each existing partition that needs 
to be imported. For future partitions, Hive will take care of this. An imported 
partition can be dropped from a link using a similar command. We just specify 
"DROP" instead of "ADD". For querying the linked table, the X user will refer 
to it as T@Y. Link Tables will only have read access and not be writable. The 
entire Table Link alongwith all its imported partitions can be dropped as 
follows:
DROP LINK TO T@Y

The above commands are purely MetaStore operations. The implementation will 
rely on replicating the entire partition metadata when a partition is added to 
a link.  For every link that is created, we will add a new row to table TBLS. 
The TBL_TYPE column will have a new kind of value "LINK_TABLE" (or 
"STATIC_LINK_TABLE" if the link has been specified as static). A new column 
LINK_TBL_ID will be added which will contain the id of the imported table. It 
will be NULL for all other table types including the regular managed tables. 
When a partition is added to a link, the new row in the table PARTITIONS will 
point to the LINK_TABLE in the same database  and not the master table in the 
other database. We will replicate all the metadata for this partition from the 
master database. The advantage of this approach is that fewer changes will be 
needed in query processing and DDL for LINK_TABLEs. Also, commands like "SHOW 
TABLES" and "SHOW PARTITIONS" will work as expected for LINK_TABLEs too. Of 
course, even though the metadata is not shared, the underlying data on disk is 
still shared. Hive still needs to know that when dropping a partition which 
belongs to a LINK_TABLE, it should not drop the underlying data from HDFS. 
Views and external tables cannot be imported from one database to another.
 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira


Reply via email to