Vihang Karajgaonkar created IMPALA-9139:
-------------------------------------------

             Summary: Invalidate metadata adds all the tables to background loading pool unnecessarily
                 Key: IMPALA-9139
                 URL: https://issues.apache.org/jira/browse/IMPALA-9139
             Project: IMPALA
          Issue Type: Bug
            Reporter: Vihang Karajgaonkar


I see the following code in the reset() method of CatalogServiceCatalog
{code:java}
      // Build a new DB cache, populate it, and replace the existing cache in one
      // step.
      Map<String, Db> newDbCache = new ConcurrentHashMap<String, Db>();
      List<TTableName> tblsToBackgroundLoad = new ArrayList<>();
      try (MetaStoreClient msClient = getMetaStoreClient()) {
        List<String> allDbs = msClient.getHiveClient().getAllDatabases();
        int numComplete = 0;
        for (String dbName: allDbs) {
          if (isBlacklistedDb(dbName)) {
            LOG.info("skip blacklisted db: " + dbName);
            continue;
          }
          String annotation = String.format("invalidating metadata - %s/%s dbs complete",
              numComplete++, allDbs.size());
          try (ThreadNameAnnotator tna = new ThreadNameAnnotator(annotation)) {
            dbName = dbName.toLowerCase();
            Db oldDb = oldDbCache.get(dbName);
            Pair<Db, List<TTableName>> invalidatedDb = invalidateDb(msClient,
                dbName, oldDb);
            if (invalidatedDb == null) continue;
            newDbCache.put(dbName, invalidatedDb.first);
            tblsToBackgroundLoad.addAll(invalidatedDb.second);
          }
        }
      }
      dbCache_.set(newDbCache);

      // Identify any deleted databases and add them to the delta log.
      Set<String> oldDbNames = oldDbCache.keySet();
      Set<String> newDbNames = newDbCache.keySet();
      oldDbNames.removeAll(newDbNames);
      for (String dbName: oldDbNames) {
        Db removedDb = oldDbCache.get(dbName);
        updateDeleteLog(removedDb);
      }

      // Submit tables for background loading.
      for (TTableName tblName: tblsToBackgroundLoad) {
        tableLoadingMgr_.backgroundLoad(tblName);
      }
{code}

As the code above shows, the tables are submitted to backgroundLoad() without 
checking the flag {{loadInBackground_}}. This means that even if the flag is 
unset, issuing an invalidate metadata command causes all the tables in the 
system to be loaded in the background. Note that this code only loads the 
tables; it does not add the loaded tables to the catalog. That part is good, 
because otherwise the memory footprint of the catalog would grow after every 
invalidate metadata command.

This bug has two implications:
1. We waste a lot of CPU cycles loading tables without getting anything out of 
it.
2. The more subtle side effect is that this fills up the 
{{tableLoadingDeque_}}, so any other background load task takes longer to 
complete.
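
One possible shape for a fix is to guard the submission loop with the flag. The snippet below is only a minimal, self-contained sketch of that idea and is not the actual CatalogServiceCatalog code: the class BackgroundLoadGuardSketch, the LoadingQueue stand-in for the table loading manager, and the submitForBackgroundLoad() helper are all hypothetical names, and plain strings stand in for TTableName.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class BackgroundLoadGuardSketch {

  // Hypothetical stand-in for the table loading manager and its deque.
  static final class LoadingQueue {
    private final Deque<String> deque = new ArrayDeque<>();

    void backgroundLoad(String tblName) {
      deque.addLast(tblName);
    }

    int size() {
      return deque.size();
    }
  }

  // Submits tables for background loading only when the flag is set.
  // Returns the number of tables that were actually queued.
  static int submitForBackgroundLoad(
      boolean loadInBackground, List<String> tblsToBackgroundLoad, LoadingQueue queue) {
    if (!loadInBackground) {
      // Flag unset: leave the queue untouched so other load tasks are not delayed.
      return 0;
    }
    for (String tblName : tblsToBackgroundLoad) {
      queue.backgroundLoad(tblName);
    }
    return tblsToBackgroundLoad.size();
  }

  public static void main(String[] args) {
    List<String> tbls = Arrays.asList("db1.t1", "db1.t2", "db2.t1");

    LoadingQueue flagUnset = new LoadingQueue();
    submitForBackgroundLoad(false, tbls, flagUnset);
    System.out.println("flag unset, queued tables: " + flagUnset.size());

    LoadingQueue flagSet = new LoadingQueue();
    submitForBackgroundLoad(true, tbls, flagSet);
    System.out.println("flag set, queued tables: " + flagSet.size());
  }
}
```

With the guard in place, an invalidate metadata issued while the flag is unset would leave the queue empty, which addresses both implications listed above. Whether the check belongs in reset() or inside backgroundLoad() itself is a design choice this sketch does not settle.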



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
