f the query is executed around
midnight.
Thanks, Venkatesh
Date: Wed, 8 Aug 2012 03:49:56 -0700
From: bejoy...@yahoo.com
Subject: Re: Custom UserDefinedFunction in Hive
To: user@hive.apache.org
Hi Raihan
UDFs are evaluated at run time when the query is executed. But it is hive
parser during query p
.
Regards,
Bejoy KS
From: Raihan Jamal
To: user@hive.apache.org
Cc: d...@hive.apache.org
Sent: Tuesday, August 7, 2012 10:50 PM
Subject: Re: Custom UserDefinedFunction in Hive
Hi Jan,
I figured that out, it is working fine for me now. The only question I have
Right, no need for the function at all. Sorry it is getting late here and
my brain refuses to work any more :)
On Tue, Aug 7, 2012 at 8:39 PM, Techy Teck wrote:
> Then that means I don't need to create that userdefinedfunction right?
>
>
>
> On Tue, Aug 7, 2012 at 11:32 AM, Jan Dolinár wrote:
Let me try that and I will update on this thread.
*Raihan Jamal*
On Tue, Aug 7, 2012 at 11:39 AM, Techy Teck wrote:
> Then that means I don't need to create that userdefinedfunction right?
>
>
>
> On Tue, Aug 7, 2012 at 11:32 AM, Jan Dolinár wrote:
>
>> Hi Jamal,
>>
>> date is standard lin
Then that means I don't need to create that userdefinedfunction right?
On Tue, Aug 7, 2012 at 11:32 AM, Jan Dolinár wrote:
> Hi Jamal,
>
> date is standard linux/unix tool, see the manual page:
> http://linux.die.net/man/1/date.
>
> The $(...) tells the shell to execute the command and insert
Hi Jamal,
date is standard linux/unix tool, see the manual page:
http://linux.die.net/man/1/date.
The $(...) tells the shell to execute the command and insert its output
into the string. So in this case it will execute the command
date -d -1day +%Y%m%d
which returns yesterday's date in the format you
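A minimal sketch of the command substitution described above, assuming GNU date (the -d option is not available in BSD/macOS date) and assuming dt is a string partition column. The variable names are illustrative, and the script only prints the query instead of invoking hive.

```shell
# Compute yesterday's date once, in yyyyMMdd form (GNU date assumed).
yesterday=$(date -d '-1 day' +%Y%m%d)

# The shell expands ${yesterday} inside double quotes before hive sees the text,
# so the query that reaches Hive contains a plain literal.
query="SELECT * FROM REALTIME WHERE dt='${yesterday}' LIMIT 10;"

# Print the query; to run it, replace this line with: hive -e "$query"
echo "$query"
```

Because the date is resolved by the shell, Hive receives a constant in the WHERE clause rather than a function call.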
Hi Vijay,
Thanks for the suggestion. If upgrading Hive were under my control, I
would have done it for sure, but I am working in a company and they are
running Hive 0.6 on all the clusters. I told them to upgrade the Hive
version, but they said it will take a few months for them to do this. An
You actually don't need hive on the whole cluster. That's the beauty
of it. You only need it on the client machine where you're submitting
hive jobs. Of course the metadata store does need to be upgraded for
newer versions so that might still be a problem.
On Tue, Aug 7, 2012 at 11:26 AM, Raihan J
Yes, it supports the -e option, but what is date in your query?
hive -e "CREATE TEMPORARY FUNCTION yesterdaydate
AS 'com.example.hive.udf.YesterdayDate';
SELECT * FROM REALTIME where dt=$(*date* -d -1day +%Y%m%d) LIMIT 10;"
*Raihan Jamal*
On Tue, Aug 7, 2012 at 11:18 AM, Jan Dolinár wrote:
> By
By the way, even without hiveconf, you can run hive from shell like this to
achieve what you want using shell capabilities:
hive -e "CREATE TEMPORARY FUNCTION yesterdaydate
AS 'com.example.hive.udf.YesterdayDate';
SELECT * FROM REALTIME where dt=$(date -d -1day +%Y%m%d) LIMIT 10;"
At least if hiv
Given the implementation of the UDF, I don't think hive would be able
to use partition pruning. Especially the version you're using. I'd
really recommend upgrading to a later version that has the hiveconf
support. That can save a lot of trouble rather than trying to get
things working on 0.6
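For reference, a rough sketch of what the hiveconf approach could look like on a Hive release that supports variable substitution. The variable name dt_yesterday is hypothetical, GNU date is assumed, and the script only prints the command rather than running hive.

```shell
# Resolve yesterday's date in the shell (GNU date assumed).
yesterday=$(date -d '-1 day' +%Y%m%d)

# Single quotes stop the shell from interpreting ${hiveconf:...};
# Hive itself substitutes the value when it reads the query.
query='SELECT * FROM REALTIME WHERE dt=${hiveconf:dt_yesterday} LIMIT 10;'

# Print the command line that would be run; dt_yesterday is an illustrative name.
printf 'hive --hiveconf dt_yesterday=%s -e "%s"\n' "$yesterday" "$query"
```

The substitution happens before the query is planned, so the planner sees a literal value rather than a UDF call.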
On Tu
Hi Jan,
I also have dates in a different format, which is why I was thinking of
this approach. How can I make sure this will work on the selected
partition only and will not scan the entire table? I will add your
suggestion to my UDF as the deterministic annotation.
My simple question here
@kulkarni,
When I ran EXPLAIN on my query, I got the output below, and I am not sure
how to interpret it. Any help on whether my approach is right or not will
be appreciated.
hive> EXPLAIN SELECT * FROM PDS_ATTRIBUTE_DATA_REALTIME where
dt=yesterdaydate('MMdd', 2) LIMIT 5;
OK
ABSTRACT S
Oops, sorry I made a copy&paste mistake :) The annotation should read
@UDFType(deterministic=true)
Jan
On Tue, Aug 7, 2012 at 7:37 PM, Jan Dolinár wrote:
> I'm afraid that the query
>
> SELECT * FROM REALTIME where dt= yesterdaydate('MMdd') LIMIT 10;
>
> will scan entire table, because th
I'm afraid that the query
SELECT * FROM REALTIME where dt= yesterdaydate('MMdd') LIMIT 10;
will scan the entire table, because the function is evaluated at runtime, so
Hive doesn't know what the value is when it decides which files to scan. I
am not 100% sure though, you should try it.
Also, yo
Have you tried using EXPLAIN[1] on your query? I usually like to use that
to get a better understanding of what my query is actually doing, and for
debugging at other times.
[1] https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain
On Tue, Aug 7, 2012 at 12:20 PM, Raihan Jamal wrote
Hi Jan,
I figured that out, it is working fine for me now. The only question I have
is, if I am doing like this-
SELECT * FROM REALTIME where dt= yesterdaydate('MMdd') LIMIT 10;
Then the above query will be evaluated as below right?
SELECT * FROM REALTIME where dt= '20120806' LIMIT 1
I tested that function from a main method by printing its result, and it
works fine. I am trying to get yesterday's date.
Today's date is Aug 6th, so my query should be for Aug 5th, like this.
And this works fine for me.
SELECT * FROM REALTIME where dt= '20120805' LIMIT 10;
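A quick way to check the expected value outside Hive, sketched with GNU date and a fixed reference date of 2012-08-06 (the year is assumed from the dates of this thread). TZ=UTC is used so that daylight-saving changes cannot shift the result.

```shell
# Reference "today" from the message above; year 2012 is an assumption.
today='2012-08-06'

# One day before the reference date, in the two formats discussed in the thread.
yyyymmdd=$(TZ=UTC date -d "${today} 1 day ago" +%Y%m%d)
mmdd=$(TZ=UTC date -d "${today} 1 day ago" +%m%d)

echo "$yyyymmdd"   # 20120805
echo "$mmdd"       # 0805
```

Comparing these strings with the dt values actually stored in the table shows which format the partition column uses.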
Hi Jamal,
Check whether the function really returns what it should and that your data
are really in MMdd format. You can do this with a simple query like this:
SELECT dt, yesterdaydate('MMdd') FROM REALTIME LIMIT 1;
I don't see anything wrong with the function itself, it works well for me
(althou