Hi David-
Can you test the code? It is working for me. Make sure your jar is in HDFS
and you are using the FQDN for referencing it.

import pyhs2

with pyhs2.connect(host='127.0.0.1',
                   port=10000,
                   authMechanism="PLAIN",
                   user='root',
                   password='test',
                   database='default') as conn:
    with conn.cursor() as cur:
cur.execute("ADD JAR hdfs://
sandbox.hortonworks.com:8020/nexr-hive-udf-0.2-SNAPSHOT.jar")
 cur.execute("CREATE TEMPORARY FUNCTION substr AS
'com.nexr.platform.hive.udf.UDFSubstrForOracle'")
     #Execute query
        cur.execute("select substr(description,2,4) from sample_07")

        #Return column info from query
        print cur.getSchema()

        #Fetch table results
        for i in cur.fetch():
            print i
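
If the query still can't resolve the function, a quick sanity check is to
list the registered functions from the same cursor (this snippet is my
sketch, not part of the run above; SHOW FUNCTIONS is plain HiveQL):

        cur.execute("SHOW FUNCTIONS")
        for row in cur.fetch():
            if 'substr' in row[0]:
                print row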

Thanks,
Brad


On Mon, Apr 28, 2014 at 7:39 AM, David Engel <da...@istwok.net> wrote:

> Thanks for your response.
>
> We've essentially done your first suggestion in the past by copying or
> symlinking our jar into Hive's lib directory.  It works, but we'd like
> a better way for different users to use different versions of our
> jar during development.  Perhaps that's not possible, though, without
> running completely different instances of Hive.
>
> I don't think your second suggestion will work.  The original problem
> is that when "add jar file.jar" is run through pyhs2, the full
> command gets passed to AddResourceProcessor.run(), yet
> AddResourceProcessor.run() is written such that it only expects "jar
> file.jar" to get passed to it.  That's how it appears to work when
> "add jar file.jar" is run from a stand-alone Hive CLI and from beeline.
>
> David
>
> On Sat, Apr 26, 2014 at 12:14:53AM -0700, Brad Ruderman wrote:
> > An easy solution would be to add the jar to the classpath or auxlibs
> > therefore every instance of hive already has the jar and you just need to
> > create the temporary function.
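> >
> > For example, with the auxlib route you would point hive.aux.jars.path at
> > the jar in hive-site.xml (the path below is only illustrative):
> >
> > <property>
> >   <name>hive.aux.jars.path</name>
> >   <value>file:///usr/lib/hive/auxlib/nexr-hive-udf-0.2-SNAPSHOT.jar</value>
> > </property>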
> >
> > Else you can put the JAR in HDFS and reference the add jar using the hdfs
> > scheme. Example:
> >
> > import pyhs2
> >
> > with pyhs2.connect(host='127.0.0.1',
> >                    port=10000,
> >                    authMechanism="PLAIN",
> >                    user='root',
> >                    password='test',
> >                    database='default') as conn:
> >     with conn.cursor() as cur:
> > cur.execute("ADD JAR hdfs://
> > sandbox.hortonworks.com:8020/nexr-hive-udf-0.2-SNAPSHOT.jar")
> >  cur.execute("CREATE TEMPORARY FUNCTION substr AS
> > 'com.nexr.platform.hive.udf.UDFSubstrForOracle'")
> >     #Execute query
> >         cur.execute("select substr(description,2,4) from sample_07")
> >
> >         #Return column info from query
> >         print cur.getSchema()
> >
> >         #Fetch table results
> >         for i in cur.fetch():
> >             print i
> >
> >
> > On Fri, Apr 25, 2014 at 7:54 AM, David Engel <da...@istwok.net> wrote:
> >
> > > Hi,
> > >
> > > I'm trying to convert some of our Hive queries to use the pyhs2 Python
> > > package (https://github.com/BradRuderman/pyhs2).  Because we have our
> > > own jar with some custom SerDes and UDFs, we need to use the "add jar
> > > /path/to/my.jar" command to make them available to Hive.  This works
> > > fine using the Hive CLI directly and also with the Beeline client.  It
> > > doesn't work, however, with pyhs2.
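> > >
> > > Concretely, something like this fails for us through pyhs2 while the
> > > same statement works from the CLI and Beeline (the connection details
> > > below are made up for illustration):
> > >
> > >     import pyhs2
> > >
> > >     with pyhs2.connect(host='ourserver', port=10000,
> > >                        authMechanism="PLAIN", user='hive',
> > >                        password='', database='default') as conn:
> > >         with conn.cursor() as cur:
> > >             cur.execute("add jar /path/to/my.jar")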
> > >
> > > I naively tracked the problem down to a bug in
> > > AddResourceProcessor.run().  See HIVE-6971 in Jira.  My attempted fix
> > > turned out to not be correct because it breaks the "add" command when
> > > used from the CLI and Beeline.  It seems the "add" part of any "add
> > > file|jar|archive ..." command needs to get stripped off somewhere
> > > before it gets passed to AddResourceProcessor.run().  Unfortunately, I
> > > can't find that location when the command is received from pyhs2.  Can
> > > someone help?
> > >
> > > David
> > > --
> > > David Engel
> > > da...@istwok.net
> > >
>
> --
> David Engel
> da...@istwok.net
>
