[ https://issues.apache.org/jira/browse/IMPALA-12487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Work on IMPALA-12487 started by Sai Hemanth Gantasala.
------------------------------------------------------
> Skip reloading file metadata for ALTER_TABLE events with trivial changes in StorageDescriptor
> ---------------------------------------------------------------------------------------------
>
> Key: IMPALA-12487
> URL: https://issues.apache.org/jira/browse/IMPALA-12487
> Project: IMPALA
> Issue Type: Improvement
> Reporter: Quanlong Huang
> Assignee: Sai Hemanth Gantasala
> Priority: Critical
> Attachments: ALTER_TABLE_event_with_SD_changes.png
>
>
> IMPALA-11534 skips reloading file metadata for some trivial ALTER_TABLE events. However, it does not handle ALTER_TABLE events that have trivial changes in StorageDescriptor. Some of these events can also skip reloading file metadata. The thrift definition of StorageDescriptor (not all of the fields are related to file metadata):
> {code:java}
> // this object holds all the information about physical storage of the data belonging to a table
> struct StorageDescriptor {
>   1: list<FieldSchema> cols, // required (refer to types defined above)
>   2: string location, // defaults to <warehouse loc>/<db loc>/tablename
>   3: string inputFormat, // SequenceFileInputFormat (binary) or TextInputFormat` or custom format
>   4: string outputFormat, // SequenceFileOutputFormat (binary) or IgnoreKeyTextOutputFormat or custom format
>   5: bool compressed, // compressed or not
>   6: i32 numBuckets, // this must be specified if there are any dimension columns
>   7: SerDeInfo serdeInfo, // serialization and deserialization information
>   8: list<string> bucketCols, // reducer grouping columns and clustering columns and bucketing columns`
>   9: list<Order> sortCols, // sort order of the data in each bucket
>   10: map<string, string> parameters, // any user supplied key value hash
>   11: optional SkewedInfo skewedInfo, // skewed information
>   12: optional bool storedAsSubDirectories // stored as subdirectories or not
> }
> {code}
> The attached screenshot is an example comparing the before and after Table object of an ALTER_TABLE event that has trivial changes in StorageDescriptor. The event just clears the field 'storedAsSubDirectories:false', and that field defaults to false, so the change makes no actual difference to the StorageDescriptor.
> I think we can compare changes in the StorageDescriptor and only reload file metadata if either of these fields changes:
> * 'location'
> * 'storedAsSubDirectories'
> Note that the default of 'storedAsSubDirectories' is false, so removing 'storedAsSubDirectories:false' is considered unchanged.
> CC [~hemanth619], [~csringhofer]
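
The comparison proposed in the issue could look roughly like the sketch below. It is only an illustration, not Impala's actual implementation: the `Sd` class is a hypothetical stand-in for the two relevant StorageDescriptor fields, and `needsFileMetadataReload` is an assumed helper name. An unset 'storedAsSubDirectories' is modeled as null and treated as false.

```java
import java.util.Objects;

public class SdChangeCheck {
  /** Hypothetical stand-in holding only the two StorageDescriptor fields of interest. */
  static final class Sd {
    final String location;
    final Boolean storedAsSubDirectories; // null models an unset optional field

    Sd(String location, Boolean storedAsSubDirectories) {
      this.location = location;
      this.storedAsSubDirectories = storedAsSubDirectories;
    }
  }

  /** Treats an unset 'storedAsSubDirectories' the same as an explicit false. */
  static boolean isStoredAsSubDirs(Sd sd) {
    return Boolean.TRUE.equals(sd.storedAsSubDirectories);
  }

  /** Returns true only if 'location' or the effective 'storedAsSubDirectories' differs. */
  static boolean needsFileMetadataReload(Sd before, Sd after) {
    return !Objects.equals(before.location, after.location)
        || isStoredAsSubDirs(before) != isStoredAsSubDirs(after);
  }

  public static void main(String[] args) {
    Sd before = new Sd("/warehouse/db/tbl", false);
    Sd cleared = new Sd("/warehouse/db/tbl", null);    // field cleared, as in the screenshot
    Sd moved = new Sd("/warehouse/db/tbl_new", null);  // location changed
    System.out.println(needsFileMetadataReload(before, cleared)); // false
    System.out.println(needsFileMetadataReload(before, moved));   // true
  }
}
```

Normalizing the optional field before comparing is what makes clearing 'storedAsSubDirectories:false' count as unchanged, while a real change of location still triggers a reload.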