Hi,

I am trying to run thousands of partition-location updates on Parquet-backed Hive tables in parallel from my client application. The application uses Spark SQL with Hive support enabled to submit the Hive queries, for example:

    spark.sql("ALTER TABLE mytable PARTITION (a=3, b=3) SET LOCATION '/newdata/mytable/a=3/b=3/part.parquet'")

I can see that the queries are submitted from different threads of my fork-join pool, but I cannot get the operation to scale no matter how I tune the thread pool. I then started watching the Hive metastore logs, and I see that only one thread is making all the writes:

    2020-01-29T16:27:15,638 INFO [pool-6-thread-163] metastore.HiveMetaStore: 163: source:10.250.70.14 get_table : db=mydb tbl=mytable1
    2020-01-29T16:27:15,638 INFO [pool-6-thread-163] HiveMetaStore.audit: ugi=mycomp ip=10.250.70.14 cmd=source:10.250.70.14 get_table : db=mydb tbl=mytable1
    2020-01-29T16:27:15,653 INFO [pool-6-thread-163] metastore.HiveMetaStore: 163: source:10.250.70.14 get_database: mydb
    2020-01-29T16:27:15,653 INFO [pool-6-thread-163] HiveMetaStore.audit: ugi=mycomp ip=10.250.70.14 cmd=source:10.250.70.14 get_database: mydb
    2020-01-29T16:27:15,655 INFO [pool-6-thread-163] metastore.HiveMetaStore: 163: source:10.250.70.14 get_table : db=mydb tbl=mytable2
    2020-01-29T16:27:15,656 INFO [pool-6-thread-163] HiveMetaStore.audit: ugi=mycomp ip=10.250.70.14 cmd=source:10.250.70.14 get_table : db=mydb tbl=mytable2
    2020-01-29T16:27:15,670 INFO [pool-6-thread-163] metastore.HiveMetaStore: 163: source:10.250.70.14 get_database: mydb
    2020-01-29T16:27:15,670 INFO [pool-6-thread-163] HiveMetaStore.audit: ugi=mycomp ip=10.250.70.14 cmd=source:10.250.70.14 get_database: mydb
    2020-01-29T16:27:15,672 INFO [pool-6-thread-163] metastore.HiveMetaStore: 163: source:10.250.70.14 get_table : db=mydb tbl=mytable3
    2020-01-29T16:27:15,672 INFO [pool-6-thread-163] HiveMetaStore.audit: ugi=mycomp ip=10.250.70.14 cmd=source:10.250.70.14 get_table : db=mydb tbl=mytable3

All actions are performed by a single thread, pool-6-thread-163. I have scanned hundreds of lines and it is always the same thread. I don't see much in the hiveserver.log file.

The Hive documentation lists the following default values:

    hive.metastore.server.min.threads    Default Value: 200
    hive.metastore.server.max.threads    Default Value: 100000

These should be more than enough, so why is just one thread doing all the work? Is the thread bound to the client IP? That would make sense, as I am submitting all the jobs from a single machine.

Am I missing some configuration, or is there a problem with this approach on the application side?

Thanks,
Nirav
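
For anyone who wants to check the single-thread observation across a whole log file rather than by eye, here is a minimal sketch. It assumes the metastore log lines carry the thread name in square brackets, in the same format as the lines quoted above. The function name `count_threads` and the `SAMPLE` list are only for illustration and are not part of Hive or Spark.

```python
import re
from collections import Counter

# Matches the bracketed thread name, e.g. "[pool-6-thread-163]",
# assuming the log format shown in the quoted metastore lines.
THREAD_RE = re.compile(r"\[(pool-\d+-thread-\d+)\]")


def count_threads(lines):
    """Return a Counter mapping metastore thread name -> number of log lines."""
    counts = Counter()
    for line in lines:
        match = THREAD_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Three of the log lines from the post, used as sample input.
SAMPLE = [
    "2020-01-29T16:27:15,638 INFO [pool-6-thread-163] metastore.HiveMetaStore: "
    "163: source:10.250.70.14 get_table : db=mydb tbl=mytable1",
    "2020-01-29T16:27:15,653 INFO [pool-6-thread-163] metastore.HiveMetaStore: "
    "163: source:10.250.70.14 get_database: mydb",
    "2020-01-29T16:27:15,655 INFO [pool-6-thread-163] metastore.HiveMetaStore: "
    "163: source:10.250.70.14 get_table : db=mydb tbl=mytable2",
]

if __name__ == "__main__":
    # Print each thread name with the number of lines it logged.
    for thread, n in count_threads(SAMPLE).most_common():
        print(thread, n)
```

Because the function accepts any iterable of lines, an open file handle for the metastore log can be passed in place of `SAMPLE`. If only one thread name comes back, every request in that log was served by the same metastore worker thread.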