Hi Dan,

Here's my environment:
CDH Version: 5.5.0-1.cdh5.5.0.p0.8
Kudu Version: 0.7.1-1.kudu0.7.1.p0.36

Steps to reproduce:

1. Create the Kudu table:

  CREATE TABLE t1 (
    id bigint
  )
  TBLPROPERTIES(
    'storage_handler' = 'com.cloudera.kudu.hive.KuduStorageHandler',
    'kudu.table_name' = 't1',
    'kudu.master_addresses' = 'master1:7051,master2:7051',
    'kudu.key_columns' = 'id',
    'kudu.num_tablet_replicas' = '5'
  );

2. Insert some values:

  insert into t1 values (1),(2),(3),(4),(5);

3. Start the spark-shell and run a query:

  $ spark-shell --jars lib/interface-annotations-0.7.1.jar,lib/kudu-client-0.7.1.jar,lib/kudu-mapreduce-0.7.1.jar,lib/kudu-spark-0.7.1.jar

  scala> sqlContext.read
           .format("org.kududb.spark")
           .options(Map("kudu.table" -> "t1", "kudu.master" -> "master1:7051,master2:7051"))
           .load()
           .registerTempTable("t1")

  scala> sqlContext.sql("select id from t1").count

4. Exit the spark-shell with Ctrl-D.

While the spark-shell is shutting down, the last thing it prints is:

  16/03/15 11:48:17 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.

and then the process hangs forever until Ctrl-C is pressed. I shouldn't have to clean up the sqlContext manually, right? The stack dump is attached.

On Tue, Mar 15, 2016 at 9:56 AM, Dan Burkert <d...@cloudera.com> wrote:

> Hi Darren,
>
> I think the thread dump would be helpful. We have a very similar test in
> the repository, and we haven't had any problems with it. What environment
> are you running the job in?
>
> - Dan
>
> On Mon, Mar 14, 2016 at 8:20 AM, Darren Hoo <darren....@gmail.com> wrote:
>
>> I use sqlContext to register the kudu table:
>>
>>   sqlContext.read
>>     .format("org.kududb.spark")
>>     .options(Map("kudu.table" -> table, "kudu.master" -> kuduMaster))
>>     .load()
>>     .registerTempTable(table)
>>
>> then do some querying and processing:
>>
>>   sqlContext.sql("...")
>>
>> but after sc.stop() is called, the spark driver never exits:
>>
>> 16/03/14 22:54:51 INFO DAGScheduler: Stopping DAGScheduler
>> 16/03/14 22:54:51 INFO YarnClientSchedulerBackend: Shutting down all executors
>> 16/03/14 22:54:51 INFO YarnClientSchedulerBackend: Interrupting monitor thread
>> 16/03/14 22:54:51 INFO YarnClientSchedulerBackend: Asking each executor to shut down
>> 16/03/14 22:54:51 INFO YarnClientSchedulerBackend: Stopped
>> 16/03/14 22:54:51 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
>> 16/03/14 22:54:51 INFO MemoryStore: MemoryStore cleared
>> 16/03/14 22:54:51 INFO BlockManager: BlockManager stopped
>> 16/03/14 22:54:51 INFO BlockManagerMaster: BlockManagerMaster stopped
>> 16/03/14 22:54:51 INFO SparkContext: Successfully stopped SparkContext
>> 16/03/14 22:54:51 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
>> 16/03/14 22:54:51 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
>> 16/03/14 22:54:51 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
>> 16/03/14 22:54:51 INFO Remoting: Remoting shut down
>> 16/03/14 22:54:51 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
>>
>> and then it's stuck there forever.
>>
>> PS: I use Spark on YARN in client mode; the problem only occurs when I
>> use kudu-spark.
>>
>> I have the thread dump, about 7 KB after gzipping; I can post it here if
>> asked.
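P.S. While waiting on the analysis, here's what I plan to try next in the shell to narrow this down. This is only a sketch of my own, not anything from the Kudu docs: it lists whatever non-daemon threads are still alive after sc.stop() (those are what keep the driver JVM from exiting), and then forces the JVM down with sys.exit(0) as a blunt workaround.

  // After sc.stop(), list the non-daemon threads that are still alive;
  // these are what prevent the driver JVM from exiting on its own.
  import scala.collection.JavaConverters._
  Thread.getAllStackTraces.keySet.asScala
    .filter(t => t.isAlive && !t.isDaemon)
    .foreach(t => println(s"non-daemon thread still alive: ${t.getName}"))

  // Blunt workaround, not a fix: force the JVM to exit. Shutdown hooks
  // still run, but the JVM no longer waits on any lingering threads.
  sys.exit(0)

If the leftover threads turn out to belong to the Kudu client, I suspect the real fix is for kudu-spark to shut its client down when the SparkContext stops.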
[Attachment: stacks.txt.gz]