Date Comparisons in Hive

2012-10-03 Thread Raihan Jamal
;2012/09/18 00:00:00'* * AND SojTimestampToDate(event.event_timestamp) <= '2012/09/18 02:00:00'* Can anyone shed some light on this whether I am doing right or not? *Raihan Jamal*

Re: org.apache.hadoop.ipc.RemoteException(java.io.IOException: java.io.IOException

2012-10-03 Thread Raihan Jamal
Just to add here *SojTimestampToDate* will return data in this format only *2012/02/29 17:01:43* *Raihan Jamal* On Wed, Oct 3, 2012 at 4:46 PM, Raihan Jamal wrote: > This is still not working as in the XML file the *final* property has > been set as true so that means I cannot overr

Re: org.apache.hadoop.ipc.RemoteException(java.io.IOException: java.io.IOException

2012-10-03 Thread Raihan Jamal
s for this job 2070929 exceeds the configured limit 20* * * Any other suggestion what should I do to overcome this problem? May be any changes in the query can overcome this problem? *Raihan Jamal* On Wed, Oct 3, 2012 at 2:59 PM, Chalcy Raja wrote: > Hi Raihan, > > ** **

Re: org.apache.hadoop.ipc.RemoteException(java.io.IOException: java.io.IOException

2012-10-03 Thread Raihan Jamal
What about if I do like below? Will this work? * set mapred.jobtracker.maxtasks.per.job=-1* *Raihan Jamal* On Wed, Oct 3, 2012 at 2:59 PM, Chalcy Raja wrote: > Hi Raihan, > > ** ** > > You can set it in hive prompt like below, > > set mapred.jobtracker.max

Re: org.apache.hadoop.ipc.RemoteException(java.io.IOException: java.io.IOException

2012-10-03 Thread Raihan Jamal
changes manually from the Hive prompt? Any suggestions? *Raihan Jamal* On Wed, Oct 3, 2012 at 2:19 PM, Raihan Jamal wrote: > Can anyone help me out here? What does the below error means? And this is > the query I am using- > > *SELECT cguid,* > * event_item,* >

Re: org.apache.hadoop.ipc.RemoteException(java.io.IOException: java.io.IOException

2012-10-03 Thread Raihan Jamal
;) >= unix_timestamp('2012/09/18 00:00:00', 'yyyy/MM/dd HH:mm:ss')* * AND unix_timestamp(SojTimestampToDate(event.event_timestamp), 'yyyy/MM/dd HH:mm:ss') <= unix_timestamp('2012/09/18 02:00:00', 'yyyy/MM/dd HH:mm:ss')* * ) n ON m.cguid = n.ch

org.apache.hadoop.ipc.RemoteException(java.io.IOException: java.io.IOException

2012-10-03 Thread Raihan Jamal
error means? Can anyone help me out here? *Raihan Jamal*

Re: Unexpected end of input stream

2012-08-28 Thread Raihan Jamal
That basically means your data was not in the correct format when you moved or copied it to HDFS. One of the files is corrupted; you can find its name in your error logs. *Raihan Jamal* On Tue, Aug 28, 2012 at 7:23 AM, Kiwon Lee wrote: > Hi > > I have

Re: Exit Status for Success and Failure in HiveQL queries

2012-08-15 Thread Raihan Jamal
Let me try that.. Thanks for the help. *Raihan Jamal* On Wed, Aug 15, 2012 at 5:10 PM, hadoop hive wrote: > Hey Jamal, >You can use bash shell script combined with hive query, in shell script > you can check for exit status. > E.g : > #!/bin/bash > hive -e "
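A minimal sketch of the approach suggested in the reply above, where a shell script checks the exit status of the command it ran. The helper name run_query and the table name are hypothetical, and the sketch assumes the hive CLI exits with a non-zero status when a query fails; the Hive call itself is left as a comment.

```shell
#!/bin/bash
# run_query (hypothetical helper): run any command and report success or
# failure based on its exit status.
run_query() {
  local status=0
  "$@" || status=$?          # capture the exit status without aborting
  if [ "$status" -eq 0 ]; then
    echo "SUCCESS"
  else
    echo "FAILURE (exit status $status)"
  fi
  return "$status"
}

# Assumed usage with Hive (table name is a placeholder):
#   run_query hive -e "SELECT count(*) FROM some_table"
```

Because run_query returns the original status, a calling script can still branch on it, for example with `run_query ... || exit 1`.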

Exit Status for Success and Failure in HiveQL queries

2012-08-15 Thread Raihan Jamal
other HiveQL queries. *Raihan Jamal*

Re: how to do random sampling in hive?

2012-08-14 Thread Raihan Jamal
I think you can use here LIMIT- Limit indicates the number of rows to be returned. The rows returned are chosen at random. The following query returns 5 rows from t1 at random. SELECT * FROM t1 LIMIT 5 http://karmasphere.com/hive-queries-on-table-data *Raihan Jamal* On Tue, Aug 14, 2012

count(*) vs count(1) in hive

2012-08-14 Thread Raihan Jamal
Is there any difference between count(*) and count(1) in Hive. And which one should we use in general and why? Given that I am on Hive 0.6 version. *Raihan Jamal*

Re: Running the HiveQL from the shell prompt.

2012-08-07 Thread Raihan Jamal
Thanks Jan for the suggestion. *Raihan Jamal* On Tue, Aug 7, 2012 at 10:01 PM, Jan Dolinár wrote: > The shell will interpret the query in your command as SELECT > ... explode(split(timestamps, *#*)) ... if you run it the way you wrote > it, i.e. without the quotation. The way ar
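The quoting effect described in the reply above can be reproduced without Hive. This is only a sketch: the original command is not shown in the thread, so the query text, the table name t, and the helper show are assumptions chosen to illustrate how the shell strips nested double quotes.

```shell
#!/bin/bash
# show (hypothetical helper): print its single argument exactly as the
# shell passes it on.
show() { printf '%s\n' "$1"; }

# Double quotes nested inside double quotes: the inner pair closes and
# reopens the string, so the quotes around # never reach the program.
broken=$(show "SELECT explode(split(timestamps, "#")) FROM t")

# Single quotes inside double quotes are passed through literally.
fixed=$(show "SELECT explode(split(timestamps, '#')) FROM t")

echo "$broken"   # SELECT explode(split(timestamps, #)) FROM t
echo "$fixed"    # SELECT explode(split(timestamps, '#')) FROM t
```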

Re: Custom UserDefinedFunction in Hive

2012-08-07 Thread Raihan Jamal
Let me try that and I will update on this thread. *Raihan Jamal* On Tue, Aug 7, 2012 at 11:39 AM, Techy Teck wrote: > Then that means I don't need to create that userdefinedfunction right? > > > > On Tue, Aug 7, 2012 at 11:32 AM, Jan Dolinár wrote: > >> Hi

Re: Custom UserDefinedFunction in Hive

2012-08-07 Thread Raihan Jamal
. And I don't know why they are saying like this, so that is the reason I was doing like this. Any suggestions will be appreciated to make this thing work *Raihan Jamal* On Tue, Aug 7, 2012 at 11:11 AM, Vijay wrote: > Given the implementation of the UDF, I don't think hive would

Re: Custom UserDefinedFunction in Hive

2012-08-07 Thread Raihan Jamal
Yes it supports -e option, but in your query what is date? hive -e "CREATE TEMPORARY FUNCTION yesterdaydate AS 'com.example.hive.udf.YesterdayDate'; SELECT * FROM REALTIME where dt=$(*date* -d -1day +%Y%m%d) LIMIT 10;" *Raihan Jamal* On Tue, Aug 7, 2012 at 11:18 AM
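In the command above, `date` is the shell's own date utility, not a Hive function: the shell runs `$(date ...)` first and substitutes the result into the query string before Hive sees it. A minimal sketch, assuming GNU date and assuming dt is a string partition in yyyyMMdd form (hence the quotes around the value):

```shell
#!/bin/bash
# Assumes GNU date (the -d option); BSD/macOS date uses different flags.
yesterday=$(date -d "-1 day" +%Y%m%d)

# The shell expands ${yesterday} before Hive ever sees the query text.
query="SELECT * FROM REALTIME WHERE dt='${yesterday}' LIMIT 10;"
echo "$query"

# To run it (requires the hive CLI on PATH):
#   hive -e "$query"
```

Since the date is computed outside Hive, this would not depend on hiveconf variable substitution.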

Re: Custom UserDefinedFunction in Hive

2012-08-07 Thread Raihan Jamal
is- How to get yesterday's date, which I can use on the date partition? I cannot use hiveconf here as I am working with Hive 0.6 *Raihan Jamal* On Tue, Aug 7, 2012 at 10:37 AM, Jan Dolinár wrote: > I'm afraid that the query > > SELECT * FROM REALTIME where dt= yesterday

Re: Custom UserDefinedFunction in Hive

2012-08-07 Thread Raihan Jamal
DEPENDENCIES: Stage-0 is a root stage STAGE PLANS: Stage: Stage-0 Fetch Operator limit: 5 Time taken: 12.126 seconds *Raihan Jamal* On Tue, Aug 7, 2012 at 10:56 AM, Jan Dolinár wrote: > Oops, sorry I made a copy&paste mistake :) The annotation should read

Re: Custom UserDefinedFunction in Hive

2012-08-07 Thread Raihan Jamal
806’ LIMIT 10; So that means it will look for data in the corresponding dt partition *(20120806) *only right as above table is partitioned on dt column ? And it will not scan the whole table right?** *Raihan Jamal* On Mon, Aug 6, 2012 at 10:56 PM, Jan Dolinár wrote: > Hi Jamal, >

Re: Custom UserDefinedFunction in Hive

2012-08-06 Thread Raihan Jamal
ng is wrong the way I am doing it for sure? *Raihan Jamal* On Mon, Aug 6, 2012 at 10:56 PM, Jan Dolinár wrote: > Hi Jamal, > > Check if the function really returns what it should and that your data are > really in yyyyMMdd format. You can do this by simple query like this: >

Re: Error while reading from task log url

2012-07-20 Thread Raihan Jamal
Yup, Thanks it worked. *Raihan Jamal* On Fri, Jul 20, 2012 at 1:40 PM, Bejoy KS wrote: > ** > Raihan > > To see the failed task logs in hadoop, the easiest approach is > drilling down the jobtracker web UI. > > Go to the job url (which you'll get in the beginning

Re: Error while reading from task log url

2012-07-20 Thread Raihan Jamal
I tried opening the below URL, and nothing got opened, I got page cannot be displayed. Why is that so? *Raihan Jamal* On Fri, Jul 20, 2012 at 12:39 PM, Sriram Krishnan wrote: > What version of Hadoop and Hive are you using? We have seen errors like > this in the past – and you can ac

Re: Error while reading from task log url

2012-07-20 Thread Raihan Jamal
After setting this in Hive- hive> SET hive.exec.show.job.failure.debug.info=false; I can see the logs on my console itself? Or I need to go somewhere to see the actual logs and what is causing the problem? *Raihan Jamal* On Fri, Jul 20, 2012 at 12:28 PM, kulkarni.swar...@gmail.

Error while reading from task log url

2012-07-20 Thread Raihan Jamal
at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:120) * *... 15 more* *Ended Job = job_201207172005_14407 with exception 'java.lang.RuntimeException(Error while reading from task log url)'* *Raihan Jamal*

Re: Difference between timestamp is 15 minutes

2012-07-18 Thread Raihan Jamal
Something like this will work in Hive? *ON ((UNIX_TIMESTAMP(testingtable1.created_time) - (prod_and_ts.timestamps / 1000)) / 60* 1000 <= 15 minutes)* *Raihan Jamal* On Wed, Jul 18, 2012 at 4:48 PM, Raihan Jamal wrote: > This is the CREATED_TIME *`2012-07-17 00:00:22`* and this

Difference between timestamp is 15 minutes

2012-07-18 Thread Raihan Jamal
(UNIX_TIMESTAMP(testingtable1.created_time) - (prod_and_ts.timestamps / 1000) = 15 minutes) How I can do the above case if difference between timestamps is within 15 minutes then data will get matched by the above `ON clause` *Raihan Jamal*
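The arithmetic behind a "within 15 minutes" test can be sketched outside Hive. The helper name within_15_min is hypothetical, and treating the difference as an absolute value (either timestamp may come first) is an assumption about the intent. Note also that Hive releases of that era accepted only equality conditions in a JOIN ... ON clause, so a range test like this would likely belong in a WHERE clause.

```shell
#!/bin/bash
# within_15_min (hypothetical helper)
#   $1: created time as epoch seconds (what UNIX_TIMESTAMP returns)
#   $2: event timestamp as epoch milliseconds
within_15_min() {
  local diff=$(( $1 - $2 / 1000 ))   # difference in seconds
  # ${diff#-} drops a leading minus sign, i.e. takes the absolute value;
  # 15 minutes = 900 seconds.
  [ "${diff#-}" -le 900 ]
}

# 1342483222 is 2012-07-17 00:00:22 UTC; the second value is 578 seconds later.
within_15_min 1342483222 1342483800000 && echo "match"

# A possible Hive counterpart (an assumption, not taken from the thread):
#   abs(unix_timestamp(created_time) - timestamps / 1000) <= 900
```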

Re: Get only the date from DateType data

2012-07-18 Thread Raihan Jamal
And CREATED_TIME is string data type. *Raihan Jamal* On Wed, Jul 18, 2012 at 2:48 PM, Raihan Jamal wrote: > This is the CREATED_TIME *2009-12-14 10:15:54* > * > * > How I can get only the date part from the above created_time, just like > below. > > *2009-12-14* >

Get only the date from DateType data

2012-07-18 Thread Raihan Jamal
This is the CREATED_TIME *2009-12-14 10:15:54* * * How I can get only the date part from the above created_time, just like below. *2009-12-14* Any suggestions will be appreciated. *Raihan Jamal*

Re: Converting Timestamp to Date only

2012-07-18 Thread Raihan Jamal
*to_date(from_unixtime(cast(prod_and_ts.timestamps /1000 as BIGINT)))* * * So this should work? I am currently running to see the output. *Raihan Jamal* On Wed, Jul 18, 2012 at 1:16 PM, Raihan Jamal wrote: > Can you show me exact syntax, how to do this? It will be of great help to

Re: Converting Timestamp to Date only

2012-07-18 Thread Raihan Jamal
Can you show me exact syntax, how to do this? It will be of great help to me. Thanks. *Raihan Jamal* On Wed, Jul 18, 2012 at 1:14 PM, Paul Mackles wrote: > That timestamp is in milliseconds but the hive date functions expect > seconds. Try dividing by 1000 first. > > From:
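The milliseconds-versus-seconds point in the reply above can be checked outside Hive with plain shell arithmetic. This is only an illustration: the sample value is an assumption rather than data from the thread, and the sketch assumes GNU date.

```shell
#!/bin/bash
# Assumes GNU date; "@N" means N seconds since the Unix epoch.
ms=1342569600000                 # sample timestamp in milliseconds (assumed value)
secs=$(( ms / 1000 ))            # integer division: milliseconds -> seconds
day=$(date -u -d "@${secs}" +%Y-%m-%d)
echo "$day"                      # 2012-07-18
```

Passing the millisecond value straight to a seconds-based conversion would land tens of thousands of years in the future, which is why the division has to come first.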

Re: Run simple HiveQL query using shell script?

2012-07-17 Thread Raihan Jamal
in an email. *Raihan Jamal* On Tue, Jul 17, 2012 at 11:30 PM, Vinod Singh wrote: > hive -e "SELECT count(*) from pds_table" > a.txt > > Thanks, > Vinod > > > On Wed, Jul 18, 2012 at 10:58 AM, Raihan Jamal wrote: > >> I am new to Unix Shel

Run simple HiveQL query using shell script?

2012-07-17 Thread Raihan Jamal
> a.txt; How can I do this from a shell script and send the output to a txt file and then send that txt file as an attachment in an email. *Raihan Jamal*

Anything wrong with this query?

2012-07-13 Thread Raihan Jamal
unix_timestamp(tt1.created_time) = tt2.timestamps) Any suggestions will be appreciated. *Raihan Jamal*

Re: Custom Mapper and Reducer vs HiveQL in terms of Performance

2012-07-12 Thread Raihan Jamal
Sending it again. As I haven't got any reply on this. Any personal experience will be appreciated. *Raihan Jamal* On Mon, Jul 9, 2012 at 3:37 PM, Raihan Jamal wrote: > *Problem Statement:-* > > I need to compare two tables Table1 and Table2 and they both store same > th

Re: Output from HiveQL query

2012-07-12 Thread Raihan Jamal
That makes sense to me. So that means whenever I run a HiveQL query, only the outputs are displayed on the console; they are not stored anywhere. And if we want to store them, then we need to do as Kulkarni suggested. Right? *Raihan Jamal* On Thu, Jul 12, 2012 at 12:07 PM, kulkarni

Re: Output from HiveQL query

2012-07-12 Thread Raihan Jamal
? *Raihan Jamal* On Thu, Jul 12, 2012 at 11:56 AM, Roberto Sanabria wrote: > Or you can output to a table and store it there. > > > On Thu, Jul 12, 2012 at 2:53 PM, VanHuy Pham wrote: > >> The output can be printed out on terminal when you run it, or can be >> stored if yo

Output from HiveQL query

2012-07-12 Thread Raihan Jamal
any time limit on that meaning after this much particular time it will be deleted?** *Raihan Jamal*

Re: Invalid Function rank in HiveQL

2012-07-10 Thread Raihan Jamal
Yup this works. Thanks for the help. *Raihan Jamal* On Tue, Jul 10, 2012 at 4:37 PM, Vijay wrote: > In that case, wouldn't this work: > > SELECT buyer_id, item_id, rank(buyer_id), created_time > FROM ( > SELECT buyer_id, item_id, created_time > FROM testing

Anything wrong with this query?

2012-07-10 Thread Raihan Jamal
stamps); I always get error as- *FAILED: Error in semantic analysis: line 13:6 Invalid Table Alias or Column Reference prod_and_ts* *Raihan Jamal*

Re: What's wrong with this query?

2012-07-10 Thread Raihan Jamal
Thanks Vijay, Yes it worked. Can you also take a look into one of my other post subject title *TOP 10.* *Raihan Jamal* On Tue, Jul 10, 2012 at 1:41 PM, Vijay wrote: > to_date(from_unixtime(cast(timestamps as int))) > > On Tue, Jul 10, 2012 at 1:33 PM, Raihan Jamal > wrote: >

Re: What's wrong with this query?

2012-07-10 Thread Raihan Jamal
I need only the date, not the hours and seconds, which is why I was using to_date. from_unixtime() takes an int as its parameter, and timestamps is a string in this case. *Raihan Jamal* On Tue, Jul 10, 2012 at 1:28 PM, Vijay wrote: > You need to use from_unixtime() > > On Tu

What's wrong with this query?

2012-07-10 Thread Raihan Jamal
testingtable2 lateral view explode(purchased_item) exploded_table as prod_and_ts) A; This is the Output I am getting always. *1004941621 NULL* *1005268799 NULL* *1061569397 NULL* *1005542471 NULL* *Raihan Jamal*

Re: Invalid Function rank in HiveQL

2012-07-10 Thread Raihan Jamal
tingtable1 DISTRIBUTE BY buyer_id, item_id SORT BY buyer_id, item_id, created_time desc ) a WHERE rk < 10 ORDER BY buyer_id, created_time, rk; *Raihan Jamal* On Tue, Jul 10, 2012 at 12:16 AM, Jasper Knulst wrote: > Hi Raihan, > > You should use 'rank(buyer_id)' in

Re: Find TOP 10 using HiveQL

2012-07-10 Thread Raihan Jamal
I am trying that solution. Currently I am running my query to see what result I am getting back with UDF. *Raihan Jamal* On Tue, Jul 10, 2012 at 12:13 AM, Nitin Pawar wrote: > i thought you managed to solve this with rank?? > > > On Tue, Jul 10, 2012 at 12:38 PM, Raihan

Re: Find TOP 10 using HiveQL

2012-07-10 Thread Raihan Jamal
Problem with that approach is, with LIMIT 10, If I am putting after desc, then it will get only 10 rows irrespective of BUYER_ID. But I need specifically for each BUYER_ID 10 latest rows. *Raihan Jamal* On Tue, Jul 10, 2012 at 12:03 AM, Abhishek Tiwari < abhishektiwari.bt...@gmail.

Re: Invalid Function rank in HiveQL

2012-07-10 Thread Raihan Jamal
desc ) a WHERE rank < 10 ORDER BY buyer_id, created_time, rank; What changes I need to make? *Raihan Jamal* On Mon, Jul 9, 2012 at 11:52 PM, Nitin Pawar wrote: > try rk in upper select statement as well > > > On Tue, Jul 10, 2012 at 12:12 PM, Raihan Jamal wrote: > >

Re: Invalid Function rank in HiveQL

2012-07-09 Thread Raihan Jamal
ons? *Raihan Jamal* On Mon, Jul 9, 2012 at 10:51 PM, Vijay wrote: > hive has no built-in rank function. you'd need to use a user-defined > function (UDF) to simulate it. there are a few custom implementations > on the net that you can leverage. > > On Mon, Jul 9, 2012 at 10:40 PM

Re: Find TOP 10 using HiveQL

2012-07-09 Thread Raihan Jamal
2012-07-09 06:54:37 *Raihan Jamal* On Mon, Jul 9, 2012 at 7:56 PM, Andes wrote: > ** > hello, you can use "desc" and "limit 10" to filter the top 10. > > 2012-07-10 > -- > ** > Best Regards > Andes > > ** > -

Re: What's wrong with this query?

2012-07-09 Thread Raihan Jamal
Yup that worked for me. I figured that out after reading the docs, INNER JOIN means JOIN in HiveQL. *Raihan Jamal* On Mon, Jul 9, 2012 at 2:48 PM, Roberto Sanabria wrote: > Did you try just using "join" instead of "inner join"? > > > On Mon, Jul 9, 2012

What's wrong with this query?

2012-07-09 Thread Raihan Jamal
g ) in subquery source`* *Raihan Jamal*