bharath v created IMPALA-7961:
---------------------------------

             Summary: Concurrent catalog heavy workloads can cause queries with 
SYNC_DDL to fail fast
                 Key: IMPALA-7961
                 URL: https://issues.apache.org/jira/browse/IMPALA-7961
             Project: IMPALA
          Issue Type: Bug
          Components: Catalog
            Reporter: bharath v


When the catalog server is under heavy load from concurrent updates to 
objects, queries that set SYNC_DDL can fail with the following message.

*User facing error message:*

{noformat}
ERROR: CatalogException: Couldn't retrieve the catalog topic version for the 
SYNC_DDL operation after 3 attempts.The operation has been successfully 
executed but its effects may have not been broadcast to all the coordinators.
{noformat}

*Exception from the catalog server log:*

{noformat}
I1031 00:00:49.168761 1127039 CatalogServiceCatalog.java:1903] Operation using 
SYNC_DDL is waiting for catalog topic version: 236535. Time to identify topic 
version (msec): 1088
I1031 00:00:49.168824 1125528 CatalogServiceCatalog.java:1903] Operation using 
SYNC_DDL is waiting for catalog topic version: 236535. Time to identify topic 
version (msec): 12625
I1031 00:00:49.168851 1131986 jni-util.cc:230] 
org.apache.impala.catalog.CatalogException: Couldn't retrieve the catalog topic 
version for the SYNC_DDL operation after 3 attempts.The operation has been 
successfully executed but its effects may have not been broadcast to all the 
coordinators.
        at 
org.apache.impala.catalog.CatalogServiceCatalog.waitForSyncDdlVersion(CatalogServiceCatalog.java:1891)
        at 
org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:336)
        at org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:146)
::::
{noformat}

*What this means*

This means that the catalog operation actually succeeded (the change has been 
committed to HMS and to the catalog server cache), but the catalog server 
noticed that broadcasting the change is taking longer than expected (for 
whatever reason) and, instead of continuing to wait, it fails fast. The 
coordinators are expected to eventually sync up in the background.
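
To make the difference between the two behaviors concrete, here is a minimal 
sketch contrasting a bounded wait that fails fast with an unbounded wait. It is 
not Impala's implementation: the class, the method names, the exception type 
and the "0 means not known yet" sentinel are all hypothetical, and what counts 
as an "attempt" inside the catalog server is not described in this report.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch: none of these names come from Impala's code base.
final class SyncDdlWaitSketch {

    /** Stand-in for the exception raised when the bounded wait gives up. */
    static final class FailFastException extends Exception {
        FailFastException(String message) { super(message); }
    }

    /** Assumed sentinel meaning "the topic version is not known yet". */
    static final long UNKNOWN = 0L;

    /**
     * Bounded wait: polls the lookup at most maxAttempts times and then
     * gives up, even though the underlying change is already committed.
     */
    static long waitBounded(LongSupplier lookup, int maxAttempts, long pauseMs)
            throws FailFastException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            long version = lookup.getAsLong();
            if (version != UNKNOWN) return version;
            pause(pauseMs);
        }
        throw new FailFastException(
            "Couldn't retrieve the topic version after " + maxAttempts + " attempts");
    }

    /** Unbounded wait: keeps polling until the lookup yields a version. */
    static long waitUnbounded(LongSupplier lookup, long pauseMs) {
        while (true) {
            long version = lookup.getAsLong();
            if (version != UNKNOWN) return version;
            pause(pauseMs);
        }
    }

    /** Sleeps between polls; restores the interrupt flag if interrupted. */
    private static void pause(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }
}
```

With a lookup that only yields a version on its fifth call, 
waitBounded(lookup, 3, 10) throws FailFastException, while 
waitUnbounded(lookup, 10) blocks until the version is available and returns it.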

*Problem*

- This violates the contract of the SYNC_DDL query option, since the query 
returns before the change is known to have been broadcast to all coordinators.
- This is a behavioral regression from the pre-IMPALA-5058 behavior, where 
queries would wait indefinitely for SYNC_DDL changes to propagate.

*Notes*

- The usual suspect is a heavily concurrent catalog workload that includes 
long-running DDLs.
- Introduced by IMPALA-5058.

Please refer to the JIRA comment for a technical explanation of why this 
happens.



