Yes it supports -e option, but in your query what is date? hive -e "CREATE TEMPORARY FUNCTION yesterdaydate AS 'com.example.hive.udf.YesterdayDate'; SELECT * FROM REALTIME where dt=$(*date* -d -1day +%Y%m%d) LIMIT 10;"
*Raihan Jamal* On Tue, Aug 7, 2012 at 11:18 AM, Jan Dolinár <dolik....@gmail.com> wrote: > By the way, even without hiveconf, you can run hive from shell like this > to achieve what you want using shell capabilities: > > hive -e "CREATE TEMPORARY FUNCTION yesterdaydate > AS 'com.example.hive.udf.YesterdayDate'; > SELECT * FROM REALTIME where dt=$(date -d -1day +%Y%m%d) LIMIT 10;" > > At least if hive 6.0 supports -e option, I don't have where to check that. > > Jan > > > On Tue, Aug 7, 2012 at 8:11 PM, Vijay <tec...@gmail.com> wrote: > >> Given the implementation of the UDF, I don't think hive would be able >> to use partition pruning. Especially the version you're using. I'd >> really recommend upgrading to a later version that has the hiveconf >> support. That can save a lot of trouble rather than trying to get >> things working on 0.6 >> >> On Tue, Aug 7, 2012 at 11:07 AM, Raihan Jamal <jamalrai...@gmail.com> >> wrote: >> > Hi Jan, >> > >> > I have date in different format also, so that is the reason I was >> thinking >> > to do by this approach. How can I make sure this will work on the >> selected >> > partition only and it will not scan the entire table. I will add your >> > suggestion in my UDF as deterministic thing. >> > >> > My simple question here is- How to get the Yesterdays date which I can >> use >> > on the Date Partition I cannot use hiveconf here as I am working with >> Hive >> > 0.6 >> > >> > >> > >> > >> > Raihan Jamal >> > >> > >> > >> > On Tue, Aug 7, 2012 at 10:37 AM, Jan Dolinár <dolik....@gmail.com> >> wrote: >> >> >> >> I'm afraid that he query >> >> >> >> SELECT * FROM REALTIME where dt= yesterdaydate('yyyyMMdd') LIMIT 10; >> >> >> >> will scan entire table, because the functions is evaluated at runtime, >> so >> >> Hive doesn't know what the value is when it decides which files to >> scan. I >> >> am not 100% sure though, you should try it. >> >> >> >> Also, you might want to try to add annotation to your UDF saying that >> the >> >> function is deterministic: >> >> @UDFType(deterministic=false) >> >> >> >> I think Hive might be clever enough to evaluate it early enough to use >> the >> >> partition pruning correctly, since it operates on constant expression. >> But >> >> again, I'm not really sure, maybe someone with deeper knowledge of Hive >> >> optimizations will tell us more. It is actually quite interesting >> question. >> >> >> >> Another way to help Hive with the optimizations might be to skip >> passing >> >> the format string argument, if you have all dates in same format, you >> can >> >> call the function just like 'yesterdaydate()' and hardcode the format >> in the >> >> function. >> >> >> >> Jan >> >> >> >> >> >> On Tue, Aug 7, 2012 at 7:20 PM, Raihan Jamal <jamalrai...@gmail.com> >> >> wrote: >> >>> >> >>> Hi Jan, >> >>> >> >>> >> >>> >> >>> I figured that out, it is working fine for me now. The only question I >> >>> have is, if I am doing like this- >> >>> >> >>> >> >>> >> >>> SELECT * FROM REALTIME where dt= yesterdaydate('yyyyMMdd') LIMIT 10; >> >>> >> >>> >> >>> >> >>> Then the above query will be evaluated as below right? >> >>> >> >>> >> >>> >> >>> SELECT * FROM REALTIME where dt= ‘20120806’ LIMIT 10; >> >>> >> >>> >> >>> >> >>> So that means it will look for data in the corresponding dt partition >> >>> (20120806) only right as above table is partitioned on dt column ? >> And it >> >>> will not scan the whole table right? >> >>> >> >>> >> >>> >> >>> >> >>> Raihan Jamal >> >>> >> >>> >> >>> >> >>> On Mon, Aug 6, 2012 at 10:56 PM, Jan Dolinár <dolik....@gmail.com> >> wrote: >> >>>> >> >>>> Hi Jamal, >> >>>> >> >>>> Check if the function really returns what it should and that your >> data >> >>>> are really in yyyyMMdd format. You can do this by simple query like >> this: >> >>>> >> >>>> SELECT dt, yesterdaydate('yyyyMMdd') FROM REALTIME LIMIT 1; >> >>>> >> >>>> I don't see anything wrong with the function itself, it works well >> for >> >>>> me (although I tested it in hive 0.7.1). The only thing I would >> change about >> >>>> it would be to optimize it by calling 'new' only at the time of >> construction >> >>>> and reusing the object when the function is called, but that should >> not >> >>>> affect the functionality at all. >> >>>> >> >>>> Best regards, >> >>>> Jan >> >>>> >> >>>> >> >>>> >> >>>> >> >>>> On Tue, Aug 7, 2012 at 3:39 AM, Raihan Jamal <jamalrai...@gmail.com> >> >>>> wrote: >> >>>>> >> >>>>> Problem >> >>>>> >> >>>>> I created the below UserDefinedFunction to get the yesterday's day >> in >> >>>>> the format I wanted as I will be passing the format into this below >> method >> >>>>> from the query. >> >>>>> >> >>>>> >> >>>>> >> >>>>> public final class YesterdayDate extends UDF { >> >>>>> >> >>>>> >> >>>>> >> >>>>> public String evaluate(final String format) { >> >>>>> >> >>>>> DateFormat dateFormat = new >> >>>>> SimpleDateFormat(format); >> >>>>> >> >>>>> Calendar cal = >> Calendar.getInstance(); >> >>>>> >> >>>>> cal.add(Calendar.DATE, -1); >> >>>>> >> >>>>> return >> >>>>> dateFormat.format(cal.getTime()).toString(); >> >>>>> >> >>>>> } >> >>>>> >> >>>>> } >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> So whenever I try to run the query like below by adding the jar to >> >>>>> classpath and creating the temporary function yesterdaydate, I >> always get >> >>>>> zero result back- >> >>>>> >> >>>>> >> >>>>> >> >>>>> hive> create temporary function yesterdaydate as >> >>>>> 'com.example.hive.udf.YesterdayDate'; >> >>>>> >> >>>>> OK >> >>>>> >> >>>>> Time taken: 0.512 seconds >> >>>>> >> >>>>> >> >>>>> >> >>>>> Below is the query I am running- >> >>>>> >> >>>>> >> >>>>> >> >>>>> hive> SELECT * FROM REALTIME where dt= yesterdaydate('yyyyMMdd') >> LIMIT >> >>>>> 10; >> >>>>> >> >>>>> OK >> >>>>> >> >>>>> >> >>>>> >> >>>>> And I always get zero result back but the data is there in that >> table >> >>>>> for Aug 5th. >> >>>>> >> >>>>> >> >>>>> >> >>>>> What wrong I am doing? Any suggestions will be appreciated. >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> NOTE:- As I am working with Hive 0.6 so it doesn’t support variable >> >>>>> substitution thing, so I cannot use hiveconf here and the above >> table has >> >>>>> been partitioned on dt(date) column. >> >>>> >> >>>> >> >>> >> >> >> > >> > >