[jira] [Updated] (HIVE-20313) consider making ROW__ID a 1st class object

Eugene Koifman (JIRA) Fri, 03 Aug 2018 16:12:37 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-20313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Eugene Koifman updated HIVE-20313:
----------------------------------
    Description: 
ROW__ID, a struct that represents a unique row ID within a partition of a full 
CRUD transactional table, is currently modeled as a {{VirtualColumn}}, even 
though the acid metadata columns from which ROW__ID is built are actually 
stored in the data file.  
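
As a rough illustration of the relationship described above, the following 
Python sketch models one acid record as stored in the data file and derives a 
row ID struct from its metadata columns. The class names, the field names 
(`original_transaction`, `bucket`, `row_id`, and so on) and the `build_row_id` 
helper are assumptions made for this example; they are not taken from this 
issue or from Hive's code.

```python
from dataclasses import dataclass
from typing import Any, NamedTuple


class RowId(NamedTuple):
    """Hypothetical stand-in for the ROW__ID struct."""
    write_id: int
    bucket: int
    row_id: int


@dataclass(frozen=True)
class AcidEvent:
    """Hypothetical layout of one record in an acid data file."""
    operation: int             # insert / update / delete marker
    original_transaction: int  # write that first created the row
    bucket: int
    row_id: int
    current_transaction: int   # write that produced this event
    row: Any                   # the user-visible columns


def build_row_id(event: AcidEvent) -> RowId:
    # The row ID is derived entirely from metadata stored next to the row,
    # which is why modeling it as a purely virtual column is awkward.
    return RowId(event.original_transaction, event.bucket, event.row_id)


event = AcidEvent(operation=0, original_transaction=7, bucket=1,
                  row_id=42, current_transaction=7, row={"name": "a"})
print(build_row_id(event))
```

The point of the sketch is only that every component of the row ID is 
physically present in the file, so nothing has to be computed at read time.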

There is no end to special handling of acid metadata columns in the code to 
make this work.

Perhaps a better approach is to add a struct column to an acid table at 
creation time and make it a 1st class citizen visible in the metastore.  'select 
count(*) ....' would need special handling to remove it.  There may also need 
to be a way to make these columns read-only.

For data added via Load Data, Add Partition, etc. (i.e. original files in a CRUD 
table), the acid reader would have to fill in the values as it does today.
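
A minimal sketch of what "filling in the values" could look like for original 
files, which carry no acid metadata columns. It assumes, purely for 
illustration, that each row gets a fixed write ID of 0, the bucket of the file 
it came from, and its ordinal position as the row ID. The function name and 
these rules are hypothetical and do not describe Hive's actual reader logic.

```python
from typing import Any, Iterable, Iterator, Tuple

# (write_id, bucket, row_id); a hypothetical layout for this example only.
RowId = Tuple[int, int, int]


def with_synthetic_row_ids(
    rows: Iterable[Any], bucket: int, first_row_id: int = 0
) -> Iterator[Tuple[RowId, Any]]:
    """Pair each row of an original file with a made-up row ID."""
    for offset, row in enumerate(rows):
        # Assumption: original files predate any write, so write_id is 0,
        # and the row's position in the file serves as its row_id.
        yield (0, bucket, first_row_id + offset), row


for rid, row in with_synthetic_row_ids(["a", "b"], bucket=3):
    print(rid, row)
```

With a first class struct column, values like these would presumably still be 
synthesized at read time for such files, since nothing was stored at write 
time.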

This would make schema evolution, PPD (predicate pushdown), and projection 
pruning work seamlessly.
This should also make it easy to add formats other than ORC to full CRUD tables.

This will likely be painful but should be investigated.




> consider making ROW__ID a 1st class object
> ------------------------------------------
>
>                 Key: HIVE-20313
>                 URL: https://issues.apache.org/jira/browse/HIVE-20313
>             Project: Hive
>          Issue Type: Improvement
>          Components: Transactions
>    Affects Versions: 0.11.0
>            Reporter: Eugene Koifman
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
