Hi, I'm trying to write a sample Phoenix-Spark job in Java that reads a few columns from HBase and writes them back to HBase after some manipulation. When I run the job it fails with "org.apache.hadoop.mapred.InvalidJobConfException: Output directory not set", even though I set the output format to PhoenixOutputFormat. Please find the code and exception attached; the command used to submit the job is below. Any leads would be appreciated.
Spark job submit command:

spark-submit --class bulk_test.PhoenixSparkJob --driver-class-path /home/cloudera/Desktop/phoenix-client-4.5.2-1.clabs_phoenix1.2.0.p0.774.jar --master local myjar.jar

Regards,
Ravi Kumar B
[cloudera@quickstart Desktop]$ spark-submit --class bulk_test.PhoenixSparkJob --driver-class-path /home/cloudera/Desktop/phoenix-client-4.5.2-1.clabs_phoenix1.2.0.p0.774.jar --master local myjar.jar
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/cloudera/Desktop/phoenix-client-4.5.2-1.clabs_phoenix1.2.0.p0.774.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
17/01/31 22:09:23 INFO spark.SparkContext: Running Spark version 1.5.0-cdh5.5.0
17/01/31 22:09:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/01/31 22:09:24 INFO spark.SecurityManager: Changing view acls to: cloudera
17/01/31 22:09:24 INFO spark.SecurityManager: Changing modify acls to: cloudera
17/01/31 22:09:24 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(cloudera); users with modify permissions: Set(cloudera)
17/01/31 22:09:25 INFO slf4j.Slf4jLogger: Slf4jLogger started
17/01/31 22:09:25 INFO Remoting: Starting remoting
17/01/31 22:09:26 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:33971]
17/01/31 22:09:26 INFO Remoting: Remoting now listens on addresses: [akka.tcp://[email protected]:33971]
17/01/31 22:09:26 INFO util.Utils: Successfully started service 'sparkDriver' on port 33971.
17/01/31 22:09:26 INFO spark.SparkEnv: Registering MapOutputTracker
17/01/31 22:09:26 INFO spark.SparkEnv: Registering BlockManagerMaster
17/01/31 22:09:26 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-ccdf798f-4726-4247-b9f7-1ef155c8d636
17/01/31 22:09:26 INFO storage.MemoryStore: MemoryStore started with capacity 534.5 MB
17/01/31 22:09:26 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-bbbbb4be-0b55-42d7-ac1d-2243a7cf94cf/httpd-e8be7e18-854d-4663-b76e-44a35cf1798f
17/01/31 22:09:26 INFO spark.HttpServer: Starting HTTP Server
17/01/31 22:09:26 INFO server.Server: jetty-8.y.z-SNAPSHOT
17/01/31 22:09:26 INFO server.AbstractConnector: Started [email protected]:57868
17/01/31 22:09:26 INFO util.Utils: Successfully started service 'HTTP file server' on port 57868.
17/01/31 22:09:26 INFO spark.SparkEnv: Registering OutputCommitCoordinator
17/01/31 22:09:37 INFO server.Server: jetty-8.y.z-SNAPSHOT
17/01/31 22:09:37 INFO server.AbstractConnector: Started [email protected]:4040
17/01/31 22:09:37 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
17/01/31 22:09:37 INFO ui.SparkUI: Started SparkUI at http://localhost:4040
17/01/31 22:09:37 INFO spark.SparkContext: Added JAR file:/home/cloudera/Desktop/myjar.jar at http://localhost:57868/jars/myjar.jar with timestamp 1485929377419
17/01/31 22:09:37 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
17/01/31 22:09:37 INFO executor.Executor: Starting executor ID driver on host localhost
17/01/31 22:09:37 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 51627.
17/01/31 22:09:37 INFO netty.NettyBlockTransferService: Server created on 51627
17/01/31 22:09:37 INFO storage.BlockManagerMaster: Trying to register BlockManager
17/01/31 22:09:37 INFO storage.BlockManagerMasterEndpoint: Registering block manager localhost:51627 with 534.5 MB RAM, BlockManagerId(driver, localhost, 51627)
17/01/31 22:09:37 INFO storage.BlockManagerMaster: Registered BlockManager
Query::SELECT STOCK_NAME,RECORDING_YEAR,RECORDINGS_QUARTER FROM STOCKS
17/01/31 22:09:39 INFO storage.MemoryStore: ensureFreeSpace(141696) called with curMem=0, maxMem=560497950
17/01/31 22:09:39 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 138.4 KB, free 534.4 MB)
17/01/31 22:09:39 INFO storage.MemoryStore: ensureFreeSpace(13489) called with curMem=141696, maxMem=560497950
17/01/31 22:09:39 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 13.2 KB, free 534.4 MB)
17/01/31 22:09:39 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:51627 (size: 13.2 KB, free: 534.5 MB)
17/01/31 22:09:39 INFO spark.SparkContext: Created broadcast 0 from newAPIHadoopRDD at PhoenixSparkJob.java:62
Exception in thread "main" org.apache.hadoop.mapred.InvalidJobConfException: Output directory not set.
    at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:123)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1011)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:998)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:998)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:998)
    at org.apache.spark.api.java.JavaPairRDD.saveAsNewAPIHadoopDataset(JavaPairRDD.scala:817)
    at bulk_test.PhoenixSparkJob.main(PhoenixSparkJob.java:86)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/01/31 22:09:39 INFO spark.SparkContext: Invoking stop() from shutdown hook
17/01/31 22:09:39 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null}
17/01/31 22:09:39 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
17/01/31 22:09:39 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null}
17/01/31 22:09:39 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null}
17/01/31 22:09:39 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null}
17/01/31 22:09:39 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
17/01/31 22:09:39 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
17/01/31 22:09:39 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null}
17/01/31 22:09:39 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null}
17/01/31 22:09:39 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null}
17/01/31 22:09:39 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null}
17/01/31 22:09:39 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
17/01/31 22:09:39 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null}
17/01/31 22:09:39 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null}
17/01/31 22:09:39 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null}
17/01/31 22:09:39 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null}
17/01/31 22:09:39 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null}
17/01/31 22:09:39 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null}
17/01/31 22:09:39 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
17/01/31 22:09:39 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null}
17/01/31 22:09:39 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null}
17/01/31 22:09:39 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
17/01/31 22:09:39 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
17/01/31 22:09:39 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
17/01/31 22:09:39 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null}
17/01/31 22:09:39 INFO ui.SparkUI: Stopped Spark web UI at http://localhost:4040
17/01/31 22:09:39 INFO scheduler.DAGScheduler: Stopping DAGScheduler
17/01/31 22:09:39 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/01/31 22:09:39 INFO storage.MemoryStore: MemoryStore cleared
17/01/31 22:09:39 INFO storage.BlockManager: BlockManager stopped
17/01/31 22:09:39 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
17/01/31 22:09:39 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/01/31 22:09:39 INFO spark.SparkContext: Successfully stopped SparkContext
17/01/31 22:09:39 INFO util.ShutdownHookManager: Shutdown hook called
17/01/31 22:09:39 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-bbbbb4be-0b55-42d7-ac1d-2243a7cf94cf
Attachment: PhoenixSparkJob.java
