date:20140320

Re: Improving self join time

2014-03-20 Thread Stephen Sprague

so that's your final assessment, eh? :) What is your comment about the outer query _joining on value_ to get the key? On Thu, Mar 20, 2014 at 12:26 PM, Jeff Storey wrote: > I don't think so since the inner result doesn't have the key field in it. > It ends up being > > select key from (query

Re: HiveThrift Service Issue

2014-03-20 Thread Szehon Ho

The last line seems to indicate a PrivilegedActionException, maybe you can look more at the rest of the stack if any to see why. On Thu, Mar 20, 2014 at 1:59 PM, Raj Hadoop wrote: > Hi Szehon, > > It is not showing on the http://xyzserver:50030/jobtracker.jsp. > > > *I checked this log. and thi

Indexing in Hive 0.12 on a partitioned and bucketed table

2014-03-20 Thread Sagar Mehta

Hi Guys, We have a Hive 0.12 ORC table that is partitioned on year, month, day, hour and is bucketed by one column. So far so good - We are seeing good speed up improvements as compared to non-ORC format. - Now we want to add an index on another commonly used column. My question was - Give

Re: HiveThrift Service Issue

2014-03-20 Thread Raj Hadoop

Hi Szehon, It is not showing on the http://xyzserver:50030/jobtracker.jsp. I checked this log. and this shows as - /tmp/root/hive.log exec.ExecDriver (ExecDriver.java:addInputPaths(853)) - Processing alias table_emp exec.ExecDriver (ExecDriver.java:addInputPaths(871)) - Adding input file

Re: HiveThrift Service Issue

2014-03-20 Thread Szehon Ho

Hi Raj, There are map-reduce job logs generated if the MapRedTask fails, those might give some clue. Thanks, Szehon On Thu, Mar 20, 2014 at 12:29 PM, Raj Hadoop wrote: > I am struggling on this one. Can anyone offer some pointers on how to > troubleshoot this issue, please? > > > On Thursda

Re: HiveThrift Service Issue

2014-03-20 Thread Raj Hadoop

I am struggling on this one. Can anyone offer some pointers on how to troubleshoot this issue, please? On Thursday, March 20, 2014 3:09 PM, Raj Hadoop wrote: Hello everyone, The HiveThrift Service was started successfully. netstat -nl | grep 1 tcp 0 0 0.0.0.0:1

Re: Improving self join time

2014-03-20 Thread Jeff Storey

I don't think so since the inner result doesn't have the key field in it. It ends up being select key from (query result that doesn't contain the key field) ... On Thu, Mar 20, 2014 at 1:28 PM, Stephen Sprague wrote: > I agree with your assessment of the inner query. why stop there though? > D

HiveThrift Service Issue

2014-03-20 Thread Raj Hadoop

Hello everyone, The HiveThrift Service was started successfully. netstat -nl | grep 1 tcp 0 0 0.0.0.0:1 0.0.0.0:* LISTEN I am able to read tables from Hive through Tableau. When executing queries through Tableau I am getting the followi

Re: Improving self join time

2014-03-20 Thread Stephen Sprague

I agree with your assessment of the inner query. why stop there though? Doesn't the outer query fetch the ids of the tags that the inner query identified? On Thu, Mar 20, 2014 at 9:54 AM, Jeff Storey wrote: > I don't think this quite fits here..I think the inner query will give me a > list of

Re: Improving self join time

2014-03-20 Thread Jeff Storey

I don't think this quite fits here..I think the inner query will give me a list of duplicate elements and their counts, but it loses the information as to what id had these elements. I'm trying to find which pairs of ids have any duplicate tags. On Thu, Mar 20, 2014 at 11:57 AM, Stephen Sprague

Re: Improving self join time

2014-03-20 Thread Stephen Sprague

hmm. would this not fall under the general problem of identifying duplicates? Would something like this meet your needs? (untested) select -- outer query finds the ids for the duplicates key from ( -- inner query lists duplicate values select count(*) as cnt, value

Re: computing median and percentiles

2014-03-20 Thread Stephen Sprague

the short answer is there is no native hive UDF that solves your unique case. That means you have to solve it. i searched for something like you were looking for myself and found this general recipe: http://www.onlinemathlearning.com/median-frequency-table.html off the top of my head i'm not s
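The recipe linked in this message is about taking a median from a frequency table. A minimal sketch of that idea follows, assuming the data arrives as (value, count) pairs with integer counts; the function name and input shape are hypothetical and are not taken from the thread or from any Hive API.

```python
def median_from_frequency_table(table):
    """Median of the data described by (value, count) pairs.

    Assumption: counts are non-negative integers; rows with count 0 are ignored.
    """
    rows = sorted((value, count) for value, count in table if count > 0)
    total = sum(count for _, count in rows)
    if total == 0:
        raise ValueError("empty frequency table")

    # 0-based positions of the middle observation(s) in the expanded, sorted data.
    # For an odd total both positions coincide.
    lo_pos = (total - 1) // 2
    hi_pos = total // 2

    lo_val = hi_val = None
    seen = 0  # running cumulative count
    for value, count in rows:
        seen += count
        if lo_val is None and lo_pos < seen:
            lo_val = value
        if hi_pos < seen:
            hi_val = value
            break

    return (lo_val + hi_val) / 2
```

The same cumulative-count walk could in principle be expressed in a query language, but how to do that in Hive is not covered by this sketch.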

Improving self join time

2014-03-20 Thread Jeff Storey

I have a table with 10 million rows and 2 columns - id (int) and element (string). I am trying to do a self join that finds any ids where the element values are the same, and my query looks like: select e1.id, e1.tag, e2.id as id2, e2.tag as tag2 from elements e1 JOIN elements e2 on e1.element = e
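The idea raised in the replies (list the duplicated values first, then join back to recover the ids) can be sketched as below. This is an illustration only, not the truncated query from the thread: it runs against SQLite through Python's built-in `sqlite3` module so that it is self-contained, the function name `duplicate_id_pairs` is hypothetical, and the table and column names follow the description in this message. Whether this shape performs better on a 10-million-row Hive table is not something the sketch can show.

```python
import sqlite3


def duplicate_id_pairs(rows):
    """Return (id1, id2, element) triples where two different ids share an element.

    rows: iterable of (id, element) tuples.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE elements (id INTEGER, element TEXT)")
    conn.executemany("INSERT INTO elements VALUES (?, ?)", rows)

    query = """
        SELECT e1.id, e2.id, e1.element
        FROM elements e1
        JOIN (
            -- inner query: only elements that occur more than once
            SELECT element
            FROM elements
            GROUP BY element
            HAVING COUNT(*) > 1
        ) d ON e1.element = d.element
        JOIN elements e2 ON e1.element = e2.element
        -- keep each unordered pair once and drop self-matches
        WHERE e1.id < e2.id
        ORDER BY e1.id, e2.id, e1.element
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result
```

The `e1.id < e2.id` filter keeps each pair once rather than in both orders, and it sits in the `WHERE` clause so that the join conditions themselves stay plain equalities.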


13 matches
