Hi Jamal, date is standard linux/unix tool, see the manual page: http://linux.die.net/man/1/date.
The $(...) tells the shell to execute the command and insert it's output into the string. So in this case it will execute command date -d -1day +%Y%m%d which returns yesterday date in the format you need. Jan On Tue, Aug 7, 2012 at 8:24 PM, Raihan Jamal <[email protected]> wrote: > Yes it supports -e option, but in your query what is date? > > hive -e "CREATE TEMPORARY FUNCTION yesterdaydate > AS 'com.example.hive.udf.YesterdayDate'; > SELECT * FROM REALTIME where dt=$(*date* -d -1day +%Y%m%d) LIMIT 10;" > > > > *Raihan Jamal* > > > > On Tue, Aug 7, 2012 at 11:18 AM, Jan Dolinár <[email protected]> wrote: > >> By the way, even without hiveconf, you can run hive from shell like this >> to achieve what you want using shell capabilities: >> >> hive -e "CREATE TEMPORARY FUNCTION yesterdaydate >> AS 'com.example.hive.udf.YesterdayDate'; >> SELECT * FROM REALTIME where dt=$(date -d -1day +%Y%m%d) LIMIT 10;" >> >> At least if hive 6.0 supports -e option, I don't have where to check that. >> >> Jan >> >> >> On Tue, Aug 7, 2012 at 8:11 PM, Vijay <[email protected]> wrote: >> >>> Given the implementation of the UDF, I don't think hive would be able >>> to use partition pruning. Especially the version you're using. I'd >>> really recommend upgrading to a later version that has the hiveconf >>> support. That can save a lot of trouble rather than trying to get >>> things working on 0.6 >>> >>> On Tue, Aug 7, 2012 at 11:07 AM, Raihan Jamal <[email protected]> >>> wrote: >>> > Hi Jan, >>> > >>> > I have date in different format also, so that is the reason I was >>> thinking >>> > to do by this approach. How can I make sure this will work on the >>> selected >>> > partition only and it will not scan the entire table. I will add your >>> > suggestion in my UDF as deterministic thing. >>> > >>> > My simple question here is- How to get the Yesterdays date which I can >>> use >>> > on the Date Partition I cannot use hiveconf here as I am working with >>> Hive >>> > 0.6 >>> > >>> > >>> > >>> > >>> > Raihan Jamal >>> > >>> > >>> > >>> > On Tue, Aug 7, 2012 at 10:37 AM, Jan Dolinár <[email protected]> >>> wrote: >>> >> >>> >> I'm afraid that he query >>> >> >>> >> SELECT * FROM REALTIME where dt= yesterdaydate('yyyyMMdd') LIMIT 10; >>> >> >>> >> will scan entire table, because the functions is evaluated at >>> runtime, so >>> >> Hive doesn't know what the value is when it decides which files to >>> scan. I >>> >> am not 100% sure though, you should try it. >>> >> >>> >> Also, you might want to try to add annotation to your UDF saying that >>> the >>> >> function is deterministic: >>> >> @UDFType(deterministic=false) >>> >> >>> >> I think Hive might be clever enough to evaluate it early enough to >>> use the >>> >> partition pruning correctly, since it operates on constant >>> expression. But >>> >> again, I'm not really sure, maybe someone with deeper knowledge of >>> Hive >>> >> optimizations will tell us more. It is actually quite interesting >>> question. >>> >> >>> >> Another way to help Hive with the optimizations might be to skip >>> passing >>> >> the format string argument, if you have all dates in same format, you >>> can >>> >> call the function just like 'yesterdaydate()' and hardcode the format >>> in the >>> >> function. >>> >> >>> >> Jan >>> >> >>> >> >>> >> On Tue, Aug 7, 2012 at 7:20 PM, Raihan Jamal <[email protected]> >>> >> wrote: >>> >>> >>> >>> Hi Jan, >>> >>> >>> >>> >>> >>> >>> >>> I figured that out, it is working fine for me now. The only question >>> I >>> >>> have is, if I am doing like this- >>> >>> >>> >>> >>> >>> >>> >>> SELECT * FROM REALTIME where dt= yesterdaydate('yyyyMMdd') LIMIT 10; >>> >>> >>> >>> >>> >>> >>> >>> Then the above query will be evaluated as below right? >>> >>> >>> >>> >>> >>> >>> >>> SELECT * FROM REALTIME where dt= ‘20120806’ LIMIT 10; >>> >>> >>> >>> >>> >>> >>> >>> So that means it will look for data in the corresponding dt partition >>> >>> (20120806) only right as above table is partitioned on dt column ? >>> And it >>> >>> will not scan the whole table right? >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> Raihan Jamal >>> >>> >>> >>> >>> >>> >>> >>> On Mon, Aug 6, 2012 at 10:56 PM, Jan Dolinár <[email protected]> >>> wrote: >>> >>>> >>> >>>> Hi Jamal, >>> >>>> >>> >>>> Check if the function really returns what it should and that your >>> data >>> >>>> are really in yyyyMMdd format. You can do this by simple query like >>> this: >>> >>>> >>> >>>> SELECT dt, yesterdaydate('yyyyMMdd') FROM REALTIME LIMIT 1; >>> >>>> >>> >>>> I don't see anything wrong with the function itself, it works well >>> for >>> >>>> me (although I tested it in hive 0.7.1). The only thing I would >>> change about >>> >>>> it would be to optimize it by calling 'new' only at the time of >>> construction >>> >>>> and reusing the object when the function is called, but that should >>> not >>> >>>> affect the functionality at all. >>> >>>> >>> >>>> Best regards, >>> >>>> Jan >>> >>>> >>> >>>> >>> >>>> >>> >>>> >>> >>>> On Tue, Aug 7, 2012 at 3:39 AM, Raihan Jamal <[email protected] >>> > >>> >>>> wrote: >>> >>>>> >>> >>>>> Problem >>> >>>>> >>> >>>>> I created the below UserDefinedFunction to get the yesterday's day >>> in >>> >>>>> the format I wanted as I will be passing the format into this >>> below method >>> >>>>> from the query. >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> public final class YesterdayDate extends UDF { >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> public String evaluate(final String format) { >>> >>>>> >>> >>>>> DateFormat dateFormat = new >>> >>>>> SimpleDateFormat(format); >>> >>>>> >>> >>>>> Calendar cal = >>> Calendar.getInstance(); >>> >>>>> >>> >>>>> cal.add(Calendar.DATE, -1); >>> >>>>> >>> >>>>> return >>> >>>>> dateFormat.format(cal.getTime()).toString(); >>> >>>>> >>> >>>>> } >>> >>>>> >>> >>>>> } >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> So whenever I try to run the query like below by adding the jar to >>> >>>>> classpath and creating the temporary function yesterdaydate, I >>> always get >>> >>>>> zero result back- >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> hive> create temporary function yesterdaydate as >>> >>>>> 'com.example.hive.udf.YesterdayDate'; >>> >>>>> >>> >>>>> OK >>> >>>>> >>> >>>>> Time taken: 0.512 seconds >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> Below is the query I am running- >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> hive> SELECT * FROM REALTIME where dt= yesterdaydate('yyyyMMdd') >>> LIMIT >>> >>>>> 10; >>> >>>>> >>> >>>>> OK >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> And I always get zero result back but the data is there in that >>> table >>> >>>>> for Aug 5th. >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> What wrong I am doing? Any suggestions will be appreciated. >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> NOTE:- As I am working with Hive 0.6 so it doesn’t support variable >>> >>>>> substitution thing, so I cannot use hiveconf here and the above >>> table has >>> >>>>> been partitioned on dt(date) column. >>> >>>> >>> >>>> >>> >>> >>> >> >>> > >>> >> >> >
