Re: Load data

2010-01-21 Thread Chris Bates
You might want to take a look through the Hadoop Wiki site and browse their various tutorials. In addition, you can also follow Cloudera's wonderful tutorials if you download their virtual machine: http://www.cloudera.com/hadoop-training-virtual-machine On Thu, Jan 21, 2010 at 9:45 AM, ankit bhat

Re: Partitioning from a single input

2009-11-21 Thread Chris Bates
A ticket has just been opened to add this functionality. You should vote it up! https://issues.apache.org/jira/browse/HIVE-936 On Sat, Nov 21, 2009 at 12:14 AM, Andrew O'Brien wrote: > Hi everyone, > > A question about partitioning: All of the examples I've seen insert > into a single hard-code

Please vote for dynamic partitions! HIVE-936

2009-11-17 Thread Chris Bates
Hey all, A new ticket has just been created for Hive to support dynamic partitions. https://issues.apache.org/jira/browse/HIVE-936 Register and vote for it to make it a priority. I think this feature would make Hive (particularly partitions) vastly more useful. From the description: If a Hive
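For readers unfamiliar with the idea, here is a rough sketch in plain Python (not Hive syntax, and not taken from the ticket) of what "dynamic" partitioning means: instead of naming one hard-coded partition per insert, each row is routed to a partition chosen by the value of one of its columns. The function name `partition_rows`, the column names, and the sample rows are all hypothetical, made up for illustration.

```python
from collections import defaultdict


def partition_rows(rows, partition_key):
    """Group rows by the value found in partition_key.

    Conceptual stand-in for a dynamic-partition insert: the partition
    is decided per row from the data, not fixed in the statement.
    The partition column is dropped from the stored row, since its
    value is already carried by the partition it lands in.
    """
    partitions = defaultdict(list)
    for row in rows:
        data = {k: v for k, v in row.items() if k != partition_key}
        partitions[row[partition_key]].append(data)
    return dict(partitions)


# Hypothetical sample input: one source, several partition values.
rows = [
    {"userid": 1, "dt": "2009-11-16"},
    {"userid": 2, "dt": "2009-11-17"},
    {"userid": 3, "dt": "2009-11-17"},
]

by_dt = partition_rows(rows, "dt")
for value in sorted(by_dt):
    print(f"dt={value}: {len(by_dt[value])} row(s)")
```

With static partitions, the same result would take one insert per distinct `dt` value, each with the value written out by hand.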

Re: Very basic (and almost certainly flawed) introduction to Hive on Amazon Elastic Map Reduce

2009-11-13 Thread Chris Bates
Hey David, Thanks for contributing. Pete Skomoroch also does a lot of great work with Hadoop on Amazon Web Services. You can check out his great tutorials at DataWrangling.com and the Git repository for TrendingTopics.org. The more we get people to blog about these tools the better! Hopefully

Hadoop Hardware Inquiry

2009-11-05 Thread Chris Bates
This is slightly off-topic, but our Hadoop + Hive usage is growing at our company and we're feeling the need to start adding more hardware. I've been tasked with trying to figure out what other groups use. I haven't really followed up on what hardware is out there mostly because my needs have bee

Re: Issues with joining across large tables

2009-10-26 Thread Chris Bates
Ryan, I asked this question a couple days ago but in a slightly different form. What you have to do is make sure the table you're joining is smaller than the leftmost table. As an example, SELECT COUNT(DISTINCT UT.UserID) FROM usertracking UT JOIN streamtransfers ST ON (ST.usertrackingid = UT.u

Out of Memory Problems

2009-10-20 Thread Chris Bates
Hi all, I'm trying to run this query on two 8 GB datasets: SELECT COUNT(UT.UserID) FROM streamtransfers ST JOIN usertracking UT ON (ST.usertrackingid = UT.usertrackingid) WHERE UT.UserID IS NOT NULL AND UT.UserID <> 0 GROUP BY UT.UserID; I've also tried its DISTINCT counterpart. Hive-0.4.0 on Hado