[ 
https://issues.apache.org/jira/browse/IMPALA-12487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-12487 started by Sai Hemanth Gantasala.
------------------------------------------------------
> Skip reloading file metadata for ALTER_TABLE events with trivial changes in 
> StorageDescriptor
> ---------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-12487
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12487
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Quanlong Huang
>            Assignee: Sai Hemanth Gantasala
>            Priority: Critical
>         Attachments: ALTER_TABLE_event_with_SD_changes.png
>
>
> IMPALA-11534 skips reloading file metadata for some trivial ALTER_TABLE 
> events. However, it does not handle ALTER_TABLE events whose only changes 
> are trivial ones in StorageDescriptor. Some of these can also skip 
> reloading file metadata. The thrift definition of StorageDescriptor (not all 
> of the fields are related to file metadata):
> {code:java}
> // this object holds all the information about physical storage of the data belonging to a table
> struct StorageDescriptor {
>   1: list<FieldSchema> cols,  // required (refer to types defined above)
>   2: string location,         // defaults to <warehouse loc>/<db loc>/tablename
>   3: string inputFormat,      // SequenceFileInputFormat (binary) or TextInputFormat`  or custom format
>   4: string outputFormat,     // SequenceFileOutputFormat (binary) or IgnoreKeyTextOutputFormat or custom format
>   5: bool   compressed,       // compressed or not
>   6: i32    numBuckets,       // this must be specified if there are any dimension columns
>   7: SerDeInfo    serdeInfo,  // serialization and deserialization information
>   8: list<string> bucketCols, // reducer grouping columns and clustering columns and bucketing columns`
>   9: list<Order>  sortCols,   // sort order of the data in each bucket
>   10: map<string, string> parameters, // any user supplied key value hash
>   11: optional SkewedInfo skewedInfo, // skewed information
>   12: optional bool   storedAsSubDirectories       // stored as subdirectories or not
> } {code}
> The attached screenshot compares the before and after Table objects of an 
> ALTER_TABLE event that has a trivial change in StorageDescriptor. The event 
> only clears the field 'storedAsSubDirectories:false', and that field 
> defaults to false, so the StorageDescriptor is effectively unchanged.
> I think we can compare the before and after StorageDescriptor and reload 
> file metadata only if either of these fields changes:
>  * 'location'
>  * 'storedAsSubDirectories'
> Note that 'storedAsSubDirectories' defaults to false, so removing 
> 'storedAsSubDirectories:false' is treated as unchanged.
> CC [~hemanth619], [~csringhofer] 
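
The comparison proposed in the description could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Impala patch: `SdChangeSketch`, `Sd`, `effectiveSubDirs` and `needsFileMetadataReload` are hypothetical names, and `Sd` is a minimal stand-in that models only the two fields named above rather than the real Hive Metastore StorageDescriptor class. A null `storedAsSubDirectories` models the optional thrift field being unset.

```java
import java.util.Objects;

// Hypothetical sketch of the proposed check; names are illustrative only.
final class SdChangeSketch {

  // Minimal stand-in for the two StorageDescriptor fields of interest.
  // This is NOT the Hive Metastore class.
  static final class Sd {
    final String location;
    // null models the optional thrift field being unset.
    final Boolean storedAsSubDirectories;

    Sd(String location, Boolean storedAsSubDirectories) {
      this.location = location;
      this.storedAsSubDirectories = storedAsSubDirectories;
    }
  }

  // An unset 'storedAsSubDirectories' is treated as its default, false.
  static boolean effectiveSubDirs(Sd sd) {
    return Boolean.TRUE.equals(sd.storedAsSubDirectories);
  }

  // Returns true only if 'location' or the effective value of
  // 'storedAsSubDirectories' differs between the two descriptors.
  // All other StorageDescriptor fields are ignored in this sketch.
  static boolean needsFileMetadataReload(Sd before, Sd after) {
    if (!Objects.equals(before.location, after.location)) {
      return true;
    }
    return effectiveSubDirs(before) != effectiveSubDirs(after);
  }
}
```

With this sketch, an event that only clears 'storedAsSubDirectories:false' does not ask for a reload, while a changed 'location' or a switch to 'storedAsSubDirectories:true' does. Whether other fields should also trigger a reload is left open here.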



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

