[ https://issues.apache.org/jira/browse/HIVE-12285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Carl Steinbach reassigned HIVE-12285:
-------------------------------------

    Assignee: Carl Steinbach  (was: Elliot West)

> Add locking to HCatClient
> -------------------------
>
>                 Key: HIVE-12285
>                 URL: https://issues.apache.org/jira/browse/HIVE-12285
>             Project: Hive
>          Issue Type: Improvement
>          Components: HCatalog
>    Affects Versions: 2.0.0
>            Reporter: Elliot West
>            Assignee: Carl Steinbach
>              Labels: concurrency, hcatalog, lock, locking, locks
>
> With the introduction of a concurrency model (HIVE-1293), Hive uses locks to
> coordinate access to, and updates of, both table data and metadata. Within
> the Hive CLI, such lock management is seamless. However, Hive provides
> additional APIs that permit interaction with data repositories, namely the
> HCatalog APIs. Currently, operations implemented by these APIs do not
> participate in Hive's locking scheme. Furthermore, the locking mechanisms are
> not exposed by the HCatalog APIs (unlike the Metastore Thrift API, which does
> expose them), so users are not able to interact with locks explicitly either.
> This has created a less-than-ideal situation in which users of the APIs have
> no choice but to manipulate these data repositories outside the control of
> Hive's lock management, potentially leading to data inconsistencies both for
> external processes using the API and for queries executing within Hive.
>
> h3. Scope of work
> This ticket is concerned with the sections of the HCatalog API that perform
> DDL-type operations using the metastore, not with those whose purpose is to
> read/write table data. A separate issue already exists for adding locking to
> HCat readers and writers (HIVE-6207).
>
> h3. Proposed work
> The following work items would serve as a minimum deliverable that would
> allow API users to work effectively with locks:
> * Comprehensively document on the wiki the locks required for various Hive
> operations.
> At a minimum, this should cover all operations exposed by {{HCatClient}}. The
> [Locking design document|https://cwiki.apache.org/confluence/display/Hive/Locking]
> can be used as a starting point or perhaps updated.
> * Implement methods and types in the {{HCatClient}} API that allow users to
> manipulate Hive locks. For the most part I'd expect these to delegate to the
> metastore API implementations:
> ** {{org.apache.hadoop.hive.metastore.IMetaStoreClient.lock(LockRequest)}}
> ** {{org.apache.hadoop.hive.metastore.IMetaStoreClient.checkLock(long)}}
> ** {{org.apache.hadoop.hive.metastore.IMetaStoreClient.unlock(long)}}
> ** -{{org.apache.hadoop.hive.metastore.IMetaStoreClient.showLocks()}}-
> ** {{org.apache.hadoop.hive.metastore.IMetaStoreClient.heartbeat(long, long)}}
> ** {{org.apache.hadoop.hive.metastore.api.LockComponent}}
> ** {{org.apache.hadoop.hive.metastore.api.LockRequest}}
> ** {{org.apache.hadoop.hive.metastore.api.LockResponse}}
> ** {{org.apache.hadoop.hive.metastore.api.LockLevel}}
> ** {{org.apache.hadoop.hive.metastore.api.LockType}}
> ** {{org.apache.hadoop.hive.metastore.api.LockState}}
> ** -{{org.apache.hadoop.hive.metastore.api.ShowLocksResponse}}-
>
> h3. Additional proposals
> Explicit lock management should be fairly simple to add to {{HCatClient}};
> however, it puts the onus on API users to understand the locking model and to
> write code that uses locks appropriately. Failure to do so may have
> undesirable consequences. With a simpler user model, the operations exposed
> by the API would automatically acquire and release the locks that they need.
> This might work well for small numbers of operations, but perhaps not for
> large sequences of invocations. (Do we need to worry about this, though, given
> that the API methods usually accept batches?) Additionally, tasks such as
> heartbeat management could be handled implicitly for long-running sets of
> operations.
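As a rough illustration of the delegation proposed above, and of the burden explicit locking places on callers, here is a minimal, dependency-free Java sketch. It is not Hive code: `MetastoreLocks`, `HCatLockFacade`, `InMemoryMetastoreLocks` and the simplified `LockType`/`LockState`/`LockResponse` types are hypothetical stand-ins for `IMetaStoreClient` and the `org.apache.hadoop.hive.metastore.api` types, and a single table-level `(db, table, type)` triple replaces `LockRequest`/`LockComponent` for brevity. The toy metastore's compatibility rule (only two shared reads can coexist) is also an assumption made to keep the example short.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical, simplified stand-ins for the metastore API enums (not the Hive classes).
enum LockType { SHARED_READ, SHARED_WRITE, EXCLUSIVE }
enum LockState { ACQUIRED, WAITING, ABORT, NOT_ACQUIRED }

// Hypothetical stand-in for LockResponse: a lock id plus its current state.
final class LockResponse {
    final long lockId;
    final LockState state;
    LockResponse(long lockId, LockState state) { this.lockId = lockId; this.state = state; }
}

// Hypothetical subset of the IMetaStoreClient lock operations.
interface MetastoreLocks {
    LockResponse lock(String db, String table, LockType type);
    LockResponse checkLock(long lockId);
    void unlock(long lockId);
    void heartbeat(long txnId, long lockId);
}

// Hypothetical HCatClient-style facade: each lock method simply delegates.
final class HCatLockFacade {
    private final MetastoreLocks metastore;

    HCatLockFacade(MetastoreLocks metastore) { this.metastore = metastore; }

    LockResponse lock(String db, String table, LockType type) { return metastore.lock(db, table, type); }
    LockResponse checkLock(long lockId) { return metastore.checkLock(lockId); }
    void unlock(long lockId) { metastore.unlock(lockId); }
    void heartbeat(long txnId, long lockId) { metastore.heartbeat(txnId, lockId); }

    // The discipline explicit locking demands of every caller: poll while WAITING,
    // fail on any other non-ACQUIRED state, and always release in finally.
    <T> T withTableLock(String db, String table, LockType type, Supplier<T> action) {
        LockResponse resp = lock(db, table, type);
        try {
            while (resp.state == LockState.WAITING) {
                Thread.sleep(50);
                resp = checkLock(resp.lockId);
            }
            if (resp.state != LockState.ACQUIRED) {
                throw new IllegalStateException("Lock " + resp.lockId + " ended in state " + resp.state);
            }
            return action.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for lock " + resp.lockId, e);
        } finally {
            unlock(resp.lockId);
        }
    }
}

// Toy in-memory metastore for trying the facade out: table-level locks granted
// in request order; only SHARED_READ + SHARED_READ is treated as compatible.
final class InMemoryMetastoreLocks implements MetastoreLocks {
    private static final class Held {
        final String table;
        final LockType type;
        Held(String table, LockType type) { this.table = table; this.type = type; }
    }

    private final Map<Long, Held> locks = new LinkedHashMap<>(); // insertion order == id order
    private long nextId = 1;

    @Override public synchronized LockResponse lock(String db, String table, LockType type) {
        long id = nextId++;
        locks.put(id, new Held(db + "." + table, type));
        return checkLock(id);
    }

    @Override public synchronized LockResponse checkLock(long lockId) {
        Held me = locks.get(lockId);
        if (me == null) return new LockResponse(lockId, LockState.NOT_ACQUIRED);
        for (Map.Entry<Long, Held> e : locks.entrySet()) {
            if (e.getKey() >= lockId) break; // only earlier requests can block this one
            Held other = e.getValue();
            boolean compatible = me.type == LockType.SHARED_READ && other.type == LockType.SHARED_READ;
            if (other.table.equals(me.table) && !compatible) return new LockResponse(lockId, LockState.WAITING);
        }
        return new LockResponse(lockId, LockState.ACQUIRED);
    }

    @Override public synchronized void unlock(long lockId) { locks.remove(lockId); }

    @Override public void heartbeat(long txnId, long lockId) { /* this toy never expires locks */ }

    synchronized int size() { return locks.size(); }
}
```

Most of `withTableLock` is bookkeeping that every caller of an explicit API would otherwise have to reproduce correctly, which is the burden the automatic-locking proposal aims to remove.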
>
> Given the concerns above about explicit lock management, it may also be
> beneficial to deliver some of the following:
> * A means to automatically acquire/release appropriate locks for
> {{HCatClient}} operations.
> * A component that maintains a lock heartbeat from the client.
> * A strategy for switching between manual/automatic lock management,
> analogous to SQL's {{autocommit}} for transactions.
>
> An API for lock and heartbeat management already exists in the HCatalog
> Mutation API (see {{org.apache.hive.hcatalog.streaming.mutate.client.lock}}).
> It will likely make sense to refactor this code and/or the code that uses it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
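One possible shape for the client-side heartbeat component proposed in the ticket is sketched below in Java. `LockHeartbeater` and `HeartbeatSender` are hypothetical names, not existing Hive or HCatalog classes; in a real client the sender would presumably wrap `IMetaStoreClient.heartbeat(long, long)`, and the interval would need to be comfortably shorter than the timeout after which the metastore abandons an unheartbeated lock (assumed here to be governed by `hive.txn.timeout`). Making the component `AutoCloseable` ties the heartbeat to the lifetime of the work it protects.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical callback; a real client would likely wrap a heartbeat(txnId, lockId) call here.
@FunctionalInterface
interface HeartbeatSender {
    void heartbeat(long txnId, long lockId) throws Exception;
}

// Hypothetical component that keeps one lock alive by heartbeating it at a fixed
// rate until close() is called (e.g. at the end of a try-with-resources block).
final class LockHeartbeater implements AutoCloseable {
    private final ScheduledExecutorService scheduler;
    private final AtomicInteger successes = new AtomicInteger();
    private final AtomicReference<Exception> lastFailure = new AtomicReference<>();

    LockHeartbeater(HeartbeatSender sender, long txnId, long lockId, long intervalMillis) {
        scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "lock-heartbeat-" + lockId);
            t.setDaemon(true); // heartbeats alone should not keep the JVM running
            return t;
        });
        scheduler.scheduleAtFixedRate(() -> {
            try {
                sender.heartbeat(txnId, lockId);
                successes.incrementAndGet();
            } catch (Exception e) {
                // Record instead of rethrowing: an exception escaping the task
                // would cancel every later run scheduled by scheduleAtFixedRate.
                lastFailure.set(e);
            }
        }, 0, intervalMillis, TimeUnit.MILLISECONDS);
    }

    int successfulHeartbeats() { return successes.get(); }

    Exception lastFailure() { return lastFailure.get(); }

    // Stops heartbeating and waits briefly for an in-flight heartbeat to finish.
    @Override
    public void close() {
        scheduler.shutdownNow();
        try {
            scheduler.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A caller could wrap a long-running batch in `try (LockHeartbeater hb = new LockHeartbeater(...)) { ... }` so that heartbeats stop as soon as the work ends, whether it completes or throws. Exposing `lastFailure()` lets the caller decide whether a missed heartbeat should abort the batch, a policy choice the sketch deliberately leaves open.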