Re: Huge join performance issue

2013-04-06 Thread Gabi D
> ... and you want to use transform functionality in hive. I have not used it a lot so not sure on that part. Also, it's helpful to write WHERE clauses in join statements to reduce the dataset you want to join.
>
> On Thu, Apr 4, 2013 at 5:53 PM, Gabi D wrote:
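A hedged sketch of the WHERE-clause tip quoted above (table, column, and partition names are invented for illustration, not taken from the thread): filter each side inside its own subquery so the join only sees the rows it actually needs.

    -- Illustrative only: push the filters into the subqueries
    -- so each input is reduced before the join shuffles data.
    SELECT t1.b, t1.c, t2.d, t2.e, COUNT(*)
    FROM (SELECT a, b, c FROM ta WHERE dt = '2013-04-01') t1
    JOIN (SELECT a, d, e FROM tb WHERE dt = '2013-04-01') t2
      ON t1.a = t2.a
    GROUP BY t1.b, t1.c, t2.d, t2.e;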

Huge join performance issue

2013-04-04 Thread Gabi D
Hi all, I have two tables I need to join and then summarize. They are both huge (about 1B rows each, in the relevant partitions) and the query runs for over 2 hours, creating 5TB of intermediate data. The current query looks like this: select t1.b,t1.c,t2.d,t2.e, count(*) from (select a,b,c from ta
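The query is cut off in the preview; a hypothetical reconstruction of its general shape (the second table and the join key are guesses, purely for illustration) would be something like:

    -- Illustrative only: join two ~1B-row inputs, then summarize.
    SELECT t1.b, t1.c, t2.d, t2.e, COUNT(*)
    FROM (SELECT a, b, c FROM ta) t1
    JOIN (SELECT a, d, e FROM tb) t2
      ON t1.a = t2.a
    GROUP BY t1.b, t1.c, t2.d, t2.e;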

Re: Book 'Programming Hive' from O'Reilly now available!

2012-09-29 Thread Gabi D
Congratulations Edward! First order (for Outbrain) already on the way :) Gabi

On Sun, Sep 30, 2012 at 1:51 AM, Edward Capriolo wrote:
> Hello all,
>
> I wanted to let you know that "Programming Hive" from O'Reilly is now available!
>
> http://shop.oreilly.com/product/0636920023555.do
>
> I coul

Re: question on output hive table to file

2012-08-07 Thread Gabi D
Haven't tried this but - since your myoutputtable table is tab delimited, and if this format suits your needs, you could create it as an external table and specify its hadoop path, then run the getmerge command off of that location (without needing the 'insert overwrite directory ...' command, so
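A rough sketch of that approach (the columns, HDFS path, and source table are assumptions for illustration): make the table external over a known HDFS directory, load it, then pull the directory down as one local file with getmerge.

    -- Hypothetical sketch: external, tab-delimited table over a fixed HDFS path.
    CREATE EXTERNAL TABLE myoutputtable (col1 STRING, col2 STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/user/hive/output/myoutputtable';

    INSERT OVERWRITE TABLE myoutputtable
    SELECT col1, col2 FROM source_table;

    -- Then, outside Hive, merge the part files into a single local file:
    --   hadoop fs -getmerge /user/hive/output/myoutputtable /tmp/myoutput.tsv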

Re: Trouble with sum function

2012-06-11 Thread Gabi D
Float is known to have precision issues because of the way it is implemented. If you are working with money data you should definitely move to double. Google 'float precision' and you'll find a bunch of explanations.

On Mon, Jun 11, 2012 at 12:49 PM, Guillaume Polaert wrote:
> Hi,
>
> We're expe
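A minimal illustration of the suggestion (table and column names are hypothetical): cast monetary amounts to DOUBLE before summing, or change the column type itself.

    -- Avoid FLOAT for money; cast to DOUBLE in the query:
    SELECT SUM(CAST(amount AS DOUBLE)) AS total
    FROM payments;

    -- Or change the column type (keeping the same column name):
    ALTER TABLE payments CHANGE amount amount DOUBLE;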

Re: add column to hive table

2012-05-01 Thread Gabi D
You probably noticed this already, but if you add a column in the middle and it did not exist in your older files, then when you select from older dates you will get wrong values in the wrong columns, since you will be looking at old files with the new format. Dangerous. We also went with the sqoop t
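A hedged sketch of the distinction (table and column names are invented): a column appended with ADD COLUMNS keeps old files lined up, while redefining the column list so the new column sits in the middle shifts old files' values into the wrong columns.

    -- Safe: the new column is appended; old files simply read it as NULL.
    ALTER TABLE events ADD COLUMNS (new_flag STRING);

    -- Risky: redefining the column list with new_flag in the middle.
    -- Old files keep their original field order, so values land in the wrong columns.
    ALTER TABLE events REPLACE COLUMNS (id STRING, new_flag STRING, payload STRING);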

Re: Percentage of rows in a Hive Table

2012-03-28 Thread Gabi D
James,
See if sampling is what you need.

On Wed, Mar 28, 2012 at 5:53 PM, James Newhaven wrote:
> I am trying to write a query that will return the first 5% of rows in a table.
>
> I've struggled with this for quite a wh
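One hedged way to approximate "5% of rows" with sampling (table name and bucket count are illustrative; this gives roughly 5% of rows chosen at random, not literally the first 5%):

    -- Roughly 5%: take 1 bucket out of 20, bucketing rows randomly.
    SELECT *
    FROM mytable TABLESAMPLE(BUCKET 1 OUT OF 20 ON rand());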

Re: Create Partitioned Table w/ Partition= Substring of Raw Data

2012-03-22 Thread Gabi D
Dan, the partition value is not taken from your raw data; you assign a value to the partition when you put the data in. So what you need to do is this: CREATE TABLE mytable (Time STRING, OtherData STRING) PARTITIONED BY (danDate STRING); (never a good idea to give fields a name that's a reserved word
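A sketch of the full flow being described (the staging table and the substring expression are assumptions, not from the thread): create the partitioned table, then assign the partition value yourself when inserting, for example by taking the date substring out of the raw Time field.

    CREATE TABLE mytable (Time STRING, OtherData STRING)
    PARTITIONED BY (danDate STRING);

    -- Static partition: you choose the value when you load the data in.
    INSERT OVERWRITE TABLE mytable PARTITION (danDate = '2012-03-22')
    SELECT Time, OtherData
    FROM raw_staging
    WHERE substr(Time, 1, 10) = '2012-03-22';

    -- Or with dynamic partitioning, derive the value from the data itself:
    SET hive.exec.dynamic.partition = true;
    SET hive.exec.dynamic.partition.mode = nonstrict;
    INSERT OVERWRITE TABLE mytable PARTITION (danDate)
    SELECT Time, OtherData, substr(Time, 1, 10) AS danDate
    FROM raw_staging;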

Re: LOAD DATA problem

2012-03-21 Thread Gabi D
> ... and return an exit code (what it used to do)
> Re-copy over the existing file (less preferable, but it would be nice if there was a flag to do this)
>
> For now as a hack I first check if the file already exists in hdfs before I

Re: LOAD DATA problem

2012-03-20 Thread Gabi D
> ... the changed behavior caught us off guard.
>
> I haven't found a solution in my sleuthing tonight. Indeed, any help would be greatly appreciated on this!
>
> Sean

Re: LOAD DATA problem

2012-03-20 Thread Gabi D
Hi Vikas, we are facing the same problem that Sean reported and have also noticed that this behavior changed with a newer version of hive. Previously, when you inserted a file with the same name into a partition/table, hive would fail the request (with yet another of its cryptic messages, an issue
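For reference, a hedged sketch of the kind of statement being discussed (paths, table and partition names are invented): loading a file whose name already exists in the target partition is where the old and new behavior diverge.

    -- Hypothetical example of the load in question.
    LOAD DATA INPATH '/staging/2012-03-20/data.log'
    INTO TABLE mytable PARTITION (dt = '2012-03-20');

    -- With OVERWRITE, the partition's existing contents are replaced outright,
    -- which sidesteps the same-file-name question but is not always what you want:
    LOAD DATA INPATH '/staging/2012-03-20/data.log'
    OVERWRITE INTO TABLE mytable PARTITION (dt = '2012-03-20');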

Re: We cannot insert data to external table right?

2012-03-14 Thread Gabi D
I haven't tried an external table with a non-hdfs location; external with an hdfs location works great. Question in this case - on which machine does hive look for the external location (as it is a clustered env)? Maybe you are not looking at the right one ...

On Wed, Mar 14, 2012 at 8:34 AM, Lu, Wei
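A minimal sketch of the working case mentioned here (the location, schema, and staging table are assumptions): an external table over an HDFS path accepts inserts like any other table.

    CREATE EXTERNAL TABLE ext_events (id STRING, payload STRING)
    LOCATION 'hdfs:///user/hive/external/ext_events';

    -- Inserting into an external table with an HDFS location works fine:
    INSERT INTO TABLE ext_events
    SELECT id, payload FROM staging_events;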

Re: Doubt in INSERT query in Hive?

2012-02-15 Thread Gabi D
Hi Bhavesh, You could consider partitioning your table. Then every insert would go to a different partition, not overwriting the previous ones, and a select * would work across all partitions. Depending on your use case, this might also help you with queries, identifying only data of a certain run
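A hedged sketch of that idea (the run_id partition column and table names are illustrative): write each batch to its own partition so it does not overwrite earlier ones, and a plain select still sees everything.

    CREATE TABLE results (k STRING, v BIGINT)
    PARTITIONED BY (run_id STRING);

    -- Each run writes only its own partition; earlier runs are untouched.
    INSERT OVERWRITE TABLE results PARTITION (run_id = 'run_2012_02_15')
    SELECT k, v FROM staging;

    -- Reads span all partitions...
    SELECT * FROM results;
    -- ...or can target a single run:
    SELECT * FROM results WHERE run_id = 'run_2012_02_15';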