Peikai Zheng created IMPALA-7627:
------------------------------------

             Summary: Parallel the fetching permission process
                 Key: IMPALA-7627
                 URL: https://issues.apache.org/jira/browse/IMPALA-7627
             Project: IMPALA
          Issue Type: Improvement
            Reporter: Peikai Zheng
Catalogd loads the metadata of a table in three phases. First, it fetches the metadata from the Hive Metastore. Then, it fetches the permissions of each partition from the HDFS NameNode. Finally, it loads the file descriptors from the HDFS NameNode.

My test results (average time per phase, GetFileInfoThread=10):

||Table||Phase 1||Phase 2||Phase 3||
|idm.sauron_message|9.9917115|459.2106944|95.0179163|
|default.revenue_enriched|12.3377474|111.2969046|40.827472|
|default.upp_raw_prod|1.5143162|50.0251426|12.6805323|
|default.hit_to_beacon_playback_prod|1.4294509|49.7670539|18.3557858|
|default.sitetracking_enriched|13.0003804|112.8746656|42.1824032|
|default.player_custom_event|9.2618705|493.4865302|116.4986184|
|default.revenue_day_est|57.9116561|106.5028664|24.005822|

For every table tested, the second phase takes the majority of the load time. I therefore suggest parallelizing the second phase.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
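A quick way to check the claim that phase 2 dominates is to compute its share of the total time for each table in the test results. The Java sketch below does only that arithmetic on the reported numbers; the class and method names are illustrative, and no time unit is assumed because only ratios are used.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PhaseShare {
    /** Fraction of the total load time spent in phase 2 (permission fetching). */
    public static double phase2Share(double p1, double p2, double p3) {
        return p2 / (p1 + p2 + p3);
    }

    /** Average times per phase, copied from the test results (table -> {phase1, phase2, phase3}). */
    public static Map<String, double[]> measurements() {
        Map<String, double[]> m = new LinkedHashMap<>();
        m.put("idm.sauron_message", new double[] {9.9917115, 459.2106944, 95.0179163});
        m.put("default.revenue_enriched", new double[] {12.3377474, 111.2969046, 40.827472});
        m.put("default.upp_raw_prod", new double[] {1.5143162, 50.0251426, 12.6805323});
        m.put("default.hit_to_beacon_playback_prod", new double[] {1.4294509, 49.7670539, 18.3557858});
        m.put("default.sitetracking_enriched", new double[] {13.0003804, 112.8746656, 42.1824032});
        m.put("default.player_custom_event", new double[] {9.2618705, 493.4865302, 116.4986184});
        m.put("default.revenue_day_est", new double[] {57.9116561, 106.5028664, 24.005822});
        return m;
    }

    public static void main(String[] args) {
        // Print phase 2's share of the total time for each table.
        for (Map.Entry<String, double[]> e : measurements().entrySet()) {
            double[] t = e.getValue();
            System.out.printf("%-40s %.1f%%%n", e.getKey(), 100 * phase2Share(t[0], t[1], t[2]));
        }
    }
}
```

By this calculation phase 2 ranges from roughly 56.5% of the total (default.revenue_day_est) to roughly 81.4% (idm.sauron_message), so it is the largest phase for every table listed.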
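The proposal could look roughly like the following minimal sketch, which submits one permission lookup per partition to a fixed-size thread pool instead of issuing the NameNode calls one after another. `ParallelPermissionLoader`, `PermissionFetcher`, `fetchAll`, and the thread count are hypothetical names for illustration, not Impala's actual API. In a real Catalogd change, `fetchPermission` might be backed by Hadoop's `FileSystem.getFileStatus(path).getPermission()`, assuming one NameNode call per partition.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelPermissionLoader {
    /** Hypothetical hook: returns the permission of one partition directory. */
    public interface PermissionFetcher {
        String fetchPermission(String partitionPath) throws Exception;
    }

    /**
     * Fetches the permissions of all partitions on a fixed-size thread pool.
     * The returned map preserves the input order of the partition paths.
     */
    public static Map<String, String> fetchAll(List<String> partitionPaths,
                                               PermissionFetcher fetcher,
                                               int numThreads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            // Submit every lookup first so they run concurrently.
            List<Future<String>> futures = new ArrayList<>();
            for (String path : partitionPaths) {
                Callable<String> task = () -> fetcher.fetchPermission(path);
                futures.add(pool.submit(task));
            }
            // Collect results; get() blocks until each task is done and rethrows failures.
            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < partitionPaths.size(); i++) {
                result.put(partitionPaths.get(i), futures.get(i).get());
            }
            return result;
        } finally {
            // Cancel any tasks still running (e.g. after a failure) and release the threads.
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            paths.add("/warehouse/t/p=" + i);
        }
        // Fake fetcher that sleeps to simulate a NameNode round trip.
        PermissionFetcher fake = path -> {
            Thread.sleep(100);
            return "rwxr-xr-x";
        };
        long start = System.nanoTime();
        Map<String, String> perms = fetchAll(paths, fake, 4);
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println(perms.size() + " partitions in " + ms + " ms");
    }
}
```

With the fake 100 ms fetcher, 8 partitions on 4 threads should finish in roughly 200 ms rather than about 800 ms sequentially. The real speedup would depend on how many concurrent requests the NameNode can serve, so the pool size would likely need to be configurable.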