Hi David- Can you test the code? It is working for me. Make sure your jar is in HDFS and you are using the FQDN for referencing it.
import pyhs2

with pyhs2.connect(host='127.0.0.1',
                   port=10000,
                   authMechanism="PLAIN",
                   user='root',
                   password='test',
                   database='default') as conn:
    with conn.cursor() as cur:
        cur.execute("ADD JAR hdfs://sandbox.hortonworks.com:8020/nexr-hive-udf-0.2-SNAPSHOT.jar")
        cur.execute("CREATE TEMPORARY FUNCTION substr AS 'com.nexr.platform.hive.udf.UDFSubstrForOracle'")

        # Execute query
        cur.execute("select substr(description,2,4) from sample_07")

        # Return column info from query
        print cur.getSchema()

        # Fetch table results
        for i in cur.fetch():
            print i

Thanks,
Brad

On Mon, Apr 28, 2014 at 7:39 AM, David Engel <da...@istwok.net> wrote:
> Thanks for your response.
>
> We've essentially done your first suggestion in the past by copying or
> symlinking our jar into Hive's lib directory. It works, but we'd like
> a better way for different users to use different versions of our
> jar during development. Perhaps that's not possible, though, without
> running completely different instances of Hive.
>
> I don't think your second suggestion will work. The original problem
> is that when "add jar file.jar" is run through pyhs2, the full
> command gets passed to AddResourceProcessor.run(), yet
> AddResourceProcessor.run() is written such that it only expects "jar
> file.jar" to get passed to it. That's how it appears to work when
> "add jar file.jar" is run from a stand-alone Hive CLI and from beeline.
>
> David
>
> On Sat, Apr 26, 2014 at 12:14:53AM -0700, Brad Ruderman wrote:
> > An easy solution would be to add the jar to the classpath or auxlibs,
> > so every instance of Hive already has the jar and you just need to
> > create the temporary function.
> >
> > Else you can put the JAR in HDFS and reference it in the add jar
> > command using the hdfs scheme.
> > Example:
> >
> > import pyhs2
> >
> > with pyhs2.connect(host='127.0.0.1',
> >                    port=10000,
> >                    authMechanism="PLAIN",
> >                    user='root',
> >                    password='test',
> >                    database='default') as conn:
> >     with conn.cursor() as cur:
> >         cur.execute("ADD JAR hdfs://sandbox.hortonworks.com:8020/nexr-hive-udf-0.2-SNAPSHOT.jar")
> >         cur.execute("CREATE TEMPORARY FUNCTION substr AS 'com.nexr.platform.hive.udf.UDFSubstrForOracle'")
> >
> >         # Execute query
> >         cur.execute("select substr(description,2,4) from sample_07")
> >
> >         # Return column info from query
> >         print cur.getSchema()
> >
> >         # Fetch table results
> >         for i in cur.fetch():
> >             print i
> >
> >
> > On Fri, Apr 25, 2014 at 7:54 AM, David Engel <da...@istwok.net> wrote:
> >
> > > Hi,
> > >
> > > I'm trying to convert some of our Hive queries to use the pyhs2 Python
> > > package (https://github.com/BradRuderman/pyhs2). Because we have our
> > > own jar with some custom SerDes and UDFs, we need to use the "add jar
> > > /path/to/my.jar" command to make them available to Hive. This works
> > > fine using the Hive CLI directly and also with the Beeline client. It
> > > doesn't work, however, with pyhs2.
> > >
> > > I naively tracked the problem down to a bug in
> > > AddResourceProcessor.run(). See HIVE-6971 in Jira. My attempted fix
> > > turned out to not be correct because it breaks the "add" command when
> > > used from the CLI and Beeline. It seems the "add" part of any "add
> > > file|jar|archive ..." command needs to get stripped off somewhere
> > > before it gets passed to AddResourceProcessor.run(). Unfortunately, I
> > > can't find that location when the command is received from pyhs2. Can
> > > someone help?
> > >
> > > David
> > > --
> > > David Engel
> > > da...@istwok.net
> > >
> --
> David Engel
> da...@istwok.net
>
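[Editor's illustration] The parsing mismatch David describes (HIVE-6971) can be sketched in a few lines of standalone Python. This is not Hive's actual code; the function name is hypothetical and the sketch only models the described behavior: the CLI/Beeline path strips the leading "add" keyword before handing the remainder ("jar file.jar") to the resource processor, while the HiveServer2/pyhs2 path passes the full command through unchanged, which is what AddResourceProcessor.run() is not written to expect.

```python
def strip_add_prefix(command):
    """Hypothetical sketch of what the CLI/Beeline path effectively does:
    drop the leading 'add' keyword so the resource processor receives
    'jar file.jar' rather than 'add jar file.jar'."""
    tokens = command.split()
    if tokens and tokens[0].lower() == "add":
        return " ".join(tokens[1:])
    return command

# The form AddResourceProcessor.run() expects to receive:
print(strip_add_prefix("add jar /path/to/my.jar"))  # jar /path/to/my.jar

# The form it reportedly receives via pyhs2/HiveServer2 (no stripping):
print("add jar /path/to/my.jar")
```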