Yeah I just went to reproduce this on a fresh environment, I blew away all the 
Zookeeper data and the error went away.  I'm running HA JobManager (1 active, 2 
standby) and 3 TMs.  I'm not sure how to fully account for this behavior yet, 
it looks like I can make this run from a totally fresh environment, so until I 
start practicing how to save point + restore a 1.4.2 -> 1.5.0 job, I look to be 
good for the moment.  Sorry to have bothered you all.


Thanks.

________________________________
From: Dawid Wysakowicz <dwysakow...@apache.org>
Sent: Friday, July 20, 2018 3:09:46 AM
To: Philip Doctor
Cc: user; Kostas Kloudas
Subject: Re: Flink 1.4.2 -> 1.5.0 upgrade Queryable State debugging

Hi Philip,
Could you attach the full stack trace? Are you querying the same job/cluster in 
both tests?
I am also looping in Kostas, who might know more about changes in Queryable 
state between 1.4.2 and 1.5.0.
Best,
Dawid

On Thu, 19 Jul 2018 at 22:33, Philip Doctor 
<philip.doc...@physiq.com<mailto:philip.doc...@physiq.com>> wrote:
Dear Flink Users,
I'm trying to upgrade to flink 1.5.0, so far everything works except for the 
Queryable state client.  Now here's where it gets weird.  I have the client 
sitting behind a web API so the rest of our non-java ecosystem can consume it.  
I've got 2 tests, one calls my route directly as a java method call, the other 
calls the deployed server via HTTP (the difference in the test intending to 
flex if the server is properly started, etc).

The local call succeeds, the remote call fails, but the failure isn't what I'd 
expect (around the server layer) it's compalining about contacting the oracle 
for state location:

<very very long stack trace>
 Caused by: 
org.apache.flink.queryablestate.exceptions.UnknownLocationException: Could not 
contact the state location oracle to retrieve the state location.\n\tat 
org.apache.flink.queryablestate.client.proxy.KvStateClientProxyHandler.getKvStateLookupInfo(KvStateClientProxyHandler.java:228)\n
<etc>

The client call is pretty straightforward:

        return client.getKvState(
            flinkJobID,
            stateDescriptor.queryableStateName,
            key,
            keyTypeInformation,
            stateDescriptor
        )

I've confirmed via logs that I have the exact same key, flink job ID, and 
queryable state name.

So I'm going bonkers on what difference might exist, I'm wondering if I'm 
packing my jar wrong and there's some resources I need to look out for? (back 
on flink 1.3.x I had to handle the reference.conf file for AKKA when you were 
depending on that for Queryable State, is there something like that? etc).  Is 
there /more logging/ somewhere on the server side that might give me a hint?  
like "Tried to query state for X but couldn't find the BananaLever" ?  I'm 
pretty stuck right now and ready to try any random ideas to move forward.


The only changes I've made aside from the jar version bump from 1.4.2 to 1.5.0 
is handling queryableStateName (which is now nullable), no other code changes 
were required, and to confirm, this all works just fine with 1.4.2.


Thank you.

Reply via email to