You really don't want to do that.
It becomes a nightmare in that you now ship a derivative of Hive and then have 
to maintain it and keep it in lock step w Hive from Apache. 


There are other options and designs but since this is for a commercial product. 
I'm not going to talk about them. 

Keep in mind that Hive isn't a relational database per se and works on 
immutable flat files. So that's going to hurt you as well. 

On Oct 17, 2012, at 9:13 PM, lohit <lohit.vijayar...@gmail.com> wrote:

> One idea is to write your own translation layer  which sits in between query 
> and actual job submission. 
> You would most likely end up having your own version of hive jar which has 
> your translation changes on top of HIVE sources.
> This has the added advantage that users need not change their queries, they 
> would do it as normal HIVE query, like
>    select * from cc_details where first_name = 'Ann'
> Disadvantage is you have to maintain a fork. 
> 
> Even otherwise, my initial guess is you might have to modify command line 
> parser which does encrypt once instead of for every record
> 
> 2012/10/17 Sam Mohamed <sam.moha...@voltage.com>
> Thanks for the quick response.
> 
> The idea is that we are selling the encryption product for customers who use 
> HDFS.  Hence, encryption is a requirement.
> 
> Any other suggestions.
> 
> Sam
> ________________________________________
> From: Michael Segel [michael_se...@hotmail.com]
> Sent: Wednesday, October 17, 2012 6:10 PM
> To: user@hadoop.apache.org
> Subject: Re: Hive Query with UDF
> 
> You don't need an UDF...
> 
> You encrypt the string 'Ann' first then use that encrypted value in the 
> Select statement.
> 
> That should make things a bit simpler.
> 
> 
> 
> On Oct 17, 2012, at 8:04 PM, Sam Mohamed <sam.moha...@voltage.com> wrote:
> 
> > I have some encrypted data in an HDFS csv, that I've created a Hive table 
> > for, and I want to run a Hive query that first encrypts the query param, 
> > then does the lookup.  I have a UDF that does encryption as follows:
> >
> > public class ParamEncrypt extends UDF {
> >
> >  public Text evaluate(String name) throws Exception {
> >
> >      String result = new String();
> >
> >      if (name == null) { return null; }
> >
> >      result = ParamData.encrypt(name);
> >
> >      return new Text(result);
> >  }
> > }
> >
> > Then I run the Hive query as:
> >
> >  select * from cc_details where first_name = encrypt('Ann');
> >
> > The problem is, it's running encrypt('Ann') across every single record in 
> > the table.  I want it do the encryption once, then do the matchup.  I've 
> > tried:
> >
> >  select * from cc_details where first_name in (select encrypt('Ann') from 
> > cc_details limit 1);
> >
> > But Hive doesn't support **IN** or select queries in the where clause.
> >
> > What can I do?
> >
> > Can I do something like:
> >
> >  select encrypt('Ann') as ann from cc_details where first_name = ann;
> >
> > That also doesn't work because the query parser throws an error saying 
> > **ann** is not a known column
> >
> > Thanks,
> >
> > Sam
> 
> 
> 
> 
> -- 
> Have a Nice Day!
> Lohit

Reply via email to