[ https://issues.apache.org/jira/browse/IMPALA-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Attila Jeges resolved IMPALA-3307.
----------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 3.1.0

IMPALA-3307: Add support for IANA time-zone db

Impala currently uses two different libraries for timestamp
manipulations: boost and glibc.

Issues with boost:
- Time-zone database is currently hard coded in timezone_db.cc. Impala
  admins cannot update it without upgrading Impala.
- Time-zone database is flat, therefore it can’t track year-to-year
  changes.
- Time-zone database is not updated on a regular basis.

Issues with glibc:
- Uses /usr/share/zoneinfo/ database which could be out of sync on some
  of the nodes in the Impala cluster.
- Uses the host system’s local time-zone. Different nodes in the Impala
  cluster might use a different local time-zone.
- Conversion functions take a global lock, which causes severe
  performance degradation.

In addition to the issues above, the fact that /usr/share/zoneinfo/ and
the hard-coded boost time-zone database are both in use is a source of
inconsistency in itself.

This patch makes the following changes:
- Instead of boost and glibc, impalad uses Google's CCTZ to implement
  time-zone conversions.
- Introduces a new startup flag (--hdfs_zone_info_zip) to impalad to
  specify an HDFS/S3/ADLS path to a zip archive that contains the
  shared compiled IANA time-zone database. If the startup flag is set,
  impalad will use the specified time-zone database. Otherwise, impalad
  will use the default /usr/share/zoneinfo time-zone database.
- Introduces a new startup flag (--hdfs_zone_alias_conf) to impalad to
  specify an HDFS/S3/ADLS path to a shared config file that contains
  definitions for non-standard time-zone aliases.
- impalad reads the entire time-zone database into an in-memory map on
  startup for fast lookups.
- The name of the coordinator node’s local time-zone is saved to the
  query context when preparing query execution. This time-zone is used
  whenever the current time-zone is referred to afterwards in an
  execution node.
- Adds a new ZipUtil class to extract files from a zip archive. The
  implementation is not vulnerable to Zip Slip.

Cherry-picks: not for 2.x.

Change-Id: I93c1fbffe81f067919706e30db0a34d0e58e7e77
Reviewed-on: http://gerrit.cloudera.org:8080/9986
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Reviewed-by: Attila Jeges <atti...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>

> add support for IANA time zone database
> ---------------------------------------
>
>                 Key: IMPALA-3307
>                 URL: https://issues.apache.org/jira/browse/IMPALA-3307
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 2.3
>            Reporter: Marcell Szabo
>            Assignee: Attila Jeges
>            Priority: Major
>              Labels: supportability
>             Fix For: Impala 3.1.0
>
>
> Currently the time zones are hard coded in timezone_db.cc and they do not
> take into account that timezone definitions change from year to year
> (except for Moscow, CDH-19918).
> I suggest moving timezone info into a separate config file, so that admins
> can update it if necessary, plus provide tools for updating it from
> well-known sources.
> 1) Define an impala-friendly file format for timezone data (preferably
> human-editable as well, even more preferably a format that other similar
> systems already use)
> 2) Create a tool to extract timezone data from the IANA tzdata database or
> /usr/share/zoneinfo into the format specified.
> 3) File (path, hdfs path) should be part of configuration
> 4) backends should load the tzinfo into a quick memory structure (quick
> lookup by id + date) (maybe load/cache each time zone on demand, most of them
> will never be used)
> 5) all date functions should use this generic tzinfo from memory
> regarding 2), similar tools:
> http://www.oracle.com/technetwork/java/javase/tzupdater-readme-136440.html
> http://dev.mysql.com/doc/refman/5.7/en/mysql-tzinfo-to-sql.html
> regarding 3), some reasons to make this configurable, and making 2) a manual
> step:
> * tzinfo is not perfectly standardised, automatic solutions might fail on
> some OSes
> * tzinfo on different hosts might be out of sync. Good luck with debugging
> such cases...
> * we wouldn't want query results automagically/unexpectedly change on OS
> upgrade
> * we should give the admins the possibility to override / fine-tune tz data
> if the applications require doing so.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
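
For readers unfamiliar with why a flat time-zone table cannot track
year-to-year changes, here is a minimal sketch of the "quick lookup by
id + date" idea from point 4) of the issue. It is not Impala's
implementation (the patch uses CCTZ for conversions). The names TZ_DB,
TZ_ALIASES, utc_offset and to_local are hypothetical, and the table is a
simplified, hand-written approximation of the Europe/Moscow history that
ignores seasonal DST before 2011.

```python
from bisect import bisect_right
from datetime import datetime, timedelta, timezone

UTC = timezone.utc

# Hypothetical in-memory map: zone name -> sorted list of
# (UTC instant at which an offset takes effect, UTC offset).
# Simplified data for illustration only, not a real tzdata extract.
TZ_DB = {
    "Europe/Moscow": [
        (datetime(1, 1, 1, tzinfo=UTC), timedelta(hours=3)),
        (datetime(2011, 3, 26, 23, 0, tzinfo=UTC), timedelta(hours=4)),
        (datetime(2014, 10, 25, 22, 0, tzinfo=UTC), timedelta(hours=3)),
    ],
}

# Hypothetical alias table, standing in for a shared alias config file.
TZ_ALIASES = {"MSK": "Europe/Moscow"}


def utc_offset(zone_name, utc_instant):
    """Return the UTC offset in effect for zone_name at utc_instant."""
    canonical = TZ_ALIASES.get(zone_name, zone_name)
    transitions = TZ_DB[canonical]
    starts = [start for start, _ in transitions]
    # Index of the last transition that starts at or before utc_instant.
    index = bisect_right(starts, utc_instant) - 1
    return transitions[index][1]


def to_local(zone_name, utc_instant):
    """Convert an aware UTC datetime to a naive local wall-clock datetime."""
    shifted = utc_instant + utc_offset(zone_name, utc_instant)
    return shifted.replace(tzinfo=None)
```

With this table, the same zone name yields a different offset depending
on the date: noon UTC on 2012-07-01 maps to 16:00 local time, while noon
UTC on 2015-07-01 maps to 15:00. A flat table with a single offset per
zone would get one of the two wrong.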