This fix is reasonable, since the constructor that actually gets called is
Driver() rather than Driver(HiveConf). The former initializes the
conf field by:

    conf = SessionState.get().getConf()

And SessionState.get() reads a thread-local (TSS) value. Thus executing SQL
queries in another thread causes an NPE, since the Driver is created in a
thread different from the one in which the HiveContext (and the contained
SessionState) was constructed.
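To make the thread-local behaviour concrete, here is a minimal sketch that uses only the JDK. It is not Hive code: FakeConf, FakeSession and ThreadLocalDemo are hypothetical stand-ins, and the only assumption is that the session lookup is backed by a per-thread slot, as described above.

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical stand-ins; none of these names come from Hive or Spark.
final case class FakeConf(name: String)

object FakeSession {
  // One slot per thread; a thread that never called start() sees null.
  private val current = new ThreadLocal[FakeConf]
  def start(conf: FakeConf): Unit = current.set(conf)
  def get(): FakeConf = current.get()
}

object ThreadLocalDemo {
  // Reads the slot from a freshly started thread and reports what it saw.
  def readInNewThread(): Option[FakeConf] = {
    val seen = new AtomicReference[Option[FakeConf]](None)
    val t = new Thread(new Runnable {
      def run(): Unit = seen.set(Option(FakeSession.get()))
    })
    t.start()
    t.join()
    seen.get()
  }

  def main(args: Array[String]): Unit = {
    FakeSession.start(FakeConf("main-conf"))
    // prints: constructing thread sees: Some(FakeConf(main-conf))
    println(s"constructing thread sees: ${Option(FakeSession.get())}")
    // prints: other thread sees: None
    println(s"other thread sees: ${readInNewThread()}")
  }
}
```

Dereferencing the null seen by the second thread is what would surface as the NPE.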
On 9/19/14 3:31 AM, Du Li wrote:
I have figured it out.
As shown in the code below, if the HiveContext hc were created in the
MyActor object and used to create the db in response to a message, it would
throw a null pointer exception. This is fixed by creating the
HiveContext inside the MyActor class instead. I also tested the code
by replacing Actor with Thread. The problem and fix are similar.
Du
——
import akka.actor.{Actor, ActorRef, ActorSystem, Props}
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

abstract class MyMessage
case object CreateDB extends MyMessage

object MyActor {
  def init(_sc: SparkContext) = {
    if (actorSystem == null || actorRef == null) {
      actorSystem = ActorSystem("root")
      // The HiveContext is built inside MyActor, not in this object.
      actorRef = actorSystem.actorOf(Props(new MyActor(_sc)), "myactor")
    }
    //hc = new MyHiveContext(_sc)
  }
  def !(m: MyMessage) {
    actorRef ! m
  }
  //var hc: MyHiveContext = _
  private var actorSystem: ActorSystem = null
  private var actorRef: ActorRef = null
}

class MyActor(sc: SparkContext) extends Actor {
  val hc = new MyHiveContext(sc)
  def receive: Receive = {
    case CreateDB => hc.createDB()
  }
}

class MyHiveContext(sc: SparkContext) extends HiveContext(sc) {
  def createDB() {...}
}
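The Thread variant mentioned above can be sketched without Spark or Akka. This is only an illustration under stated assumptions: DemoSession, DemoContext and ThreadBoundDemo are hypothetical names, and DemoContext merely imitates a context whose constructor binds its configuration to the creating thread.

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.util.Try

// Hypothetical stand-ins; not Hive, Spark or Akka classes.
object DemoSession {
  val conf = new ThreadLocal[String]
}

final class DemoContext {
  // The constructor binds the configuration to the creating thread only.
  DemoSession.conf.set("demo-conf")

  def createDB(): String = {
    val conf = DemoSession.conf.get() // null on a thread that never set it
    s"created with conf of length ${conf.length}" // NPE when conf is null
  }
}

object ThreadBoundDemo {
  // Runs body on a brand-new thread and captures its result or exception.
  def runInNewThread[A](body: => A): Try[A] = {
    val result = new AtomicReference[Try[A]]()
    val t = new Thread(new Runnable {
      def run(): Unit = result.set(Try(body))
    })
    t.start()
    t.join()
    result.get()
  }

  def main(args: Array[String]): Unit = {
    // Problem: context built here, used on another thread.
    val shared = new DemoContext
    val broken = runInNewThread(shared.createDB())
    // prints: shared context on new thread failed: true
    println(s"shared context on new thread failed: ${broken.isFailure}")

    // Fix: build the context on the thread that uses it.
    val fixed = runInNewThread {
      val own = new DemoContext
      own.createDB()
    }
    // prints: own context on new thread: Success(created with conf of length 9)
    println(s"own context on new thread: $fixed")
  }
}
```

A plain Thread is used here so that creation and use are guaranteed to share one thread. A pooled dispatcher gives no such guarantee in general.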
From: "Chester @work" <ches...@alpinenow.com>
Date: Thursday, September 18, 2014 at 7:17 AM
To: Du Li <l...@yahoo-inc.com.INVALID>
Cc: Michael Armbrust <mich...@databricks.com>, "Cheng, Hao" <hao.ch...@intel.com>, "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: problem with HiveContext inside Actor
Akka actors are managed under a thread pool, so the same actor can run
on different threads.
If you create the HiveContext in the actor, is it possible that you are
essentially creating different instances of HiveContext?
On Sep 17, 2014, at 10:14 PM, Du Li <l...@yahoo-inc.com.INVALID> wrote:
Thanks for your reply.
Michael: No. I only create one HiveContext in the code.
Hao: Yes. I subclass HiveContext and define my own function to create a
database, and then subclass akka Actor to call that function in
response to an abstract message. Following your suggestion, I called
println(sessionState.getConf.getAllProperties), which printed
tons of properties; however, the same NullPointerException was still
thrown.
As mentioned, the weird thing is that everything worked fine if I
simply called actor.hiveContext.createDB() directly. But it throws the
null pointer exception from Driver.java if I do "actor !
CreateSomeDB", which seems to me to be just the same thing because
the actor does nothing but call createDB().
Du
From: Michael Armbrust <mich...@databricks.com>
Date: Wednesday, September 17, 2014 at 7:40 PM
To: "Cheng, Hao" <hao.ch...@intel.com>
Cc: Du Li <l...@yahoo-inc.com.invalid>, "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: problem with HiveContext inside Actor
- dev
Is it possible that you are constructing more than one HiveContext in
a single JVM? Due to global state in Hive code this is not allowed.
Michael
On Wed, Sep 17, 2014 at 7:21 PM, Cheng, Hao <hao.ch...@intel.com> wrote:
Hi, Du
I am not sure what you mean by "triggers the HiveContext to create a
database". Did you create a subclass
of HiveContext? Just be sure you call "HiveContext.sessionState"
eagerly, since it will set the proper "hiveconf" into the
SessionState; otherwise the HiveDriver will always get a null value
when retrieving the HiveConf.
Cheng Hao
From: Du Li [mailto:l...@yahoo-inc.com.INVALID]
Sent: Thursday, September 18, 2014 7:51 AM
To: user@spark.apache.org; d...@spark.apache.org
Subject: problem with HiveContext inside Actor
Hi,
I wonder whether anybody has had a similar experience or has any suggestions.
I have an akka Actor that processes database requests in high-level
messages. Inside this Actor, it creates a HiveContext object that does the
actual db work. The main thread creates the needed SparkContext and
passes it to the Actor to create the HiveContext.
When a message is sent to the Actor, it is processed properly except
that, when the message triggers the HiveContext to create a database, it
throws a NullPointerException in hive.ql.Driver.java which suggests
that its conf variable is not initialized.
Ironically, it works fine if my main thread directly calls
actor.hiveContext to create the database. The spark version is 1.1.0.
Thanks,
Du