RE: Hive with Amazon EMR?

2009-05-06 Thread Ashish Thusoo
Actually, Joydeep is working on it. He might be able to provide more insight when he sees this (he is in India right now, so he will only see it at night because of the time difference). There were some technical snags that he had hit while trying this out. Some of them ha

Hive with Amazon EMR?

2009-05-06 Thread Stephen Corona
Does anyone know if it's possible to use Hive with Amazon Elastic MapReduce? -Steve

RE: Individual distinction on two or more columns

2009-05-06 Thread Ashish Thusoo
Just wanted to add here... One of the reasons we did not support this initially was because we were splitting a group by into two jobs with the first one generating partial counts on the distinct + group by keys. This was a better plan when there were data skews - which was one of the more comm
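
A rough illustration of the two-job plan described above, written in Python rather than as anything Hive actually runs. The function names and sample rows are made up for the example, and the sketch covers only a single distinct column.

```python
from collections import Counter

def stage_one(rows):
    """Job 1 (illustrative): partial counts keyed on (group-by key, distinct value)."""
    partial = Counter()
    for group_key, value in rows:
        partial[(group_key, value)] += 1
    return partial

def stage_two(partial):
    """Job 2 (illustrative): each surviving (group key, value) pair is one distinct value."""
    distinct_counts = Counter()
    for group_key, _value in partial:
        distinct_counts[group_key] += 1
    return dict(distinct_counts)

# Hypothetical sample data: (group-by key, value of the distinct column)
rows = [("us", "alice"), ("us", "alice"), ("us", "bob"), ("in", "carol")]
result = stage_two(stage_one(rows))
```

Keying the first job on the distinct value as well as the group-by key spreads a heavily repeated group key over many intermediate keys, which is one way a plan like this can cope with skew.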

Re: Individual distinction on two or more columns

2009-05-06 Thread Alexis Rondeau
Namit and Prasad, thank you for your fast responses. Your suggestion of combining two result sets makes sense; I should have thought of that, of course. My first impulse would be to somehow join those tables again, so I don't have to manually go into the files after issuing the queries. Well, since I

Re: Individual distinction on two or more columns

2009-05-06 Thread Prasad Chakka
Hive doesn't support distinct on more than one column. I don't think anyone is working on that right now. The only workaround I could think of is to issue separate queries. Or you could try something like this: from actions insert overwrite local directory 'a' select count(distinct user) i

RE: Individual distinction on two or more columns

2009-05-06 Thread Namit Jain
Right now, Hive does not support multiple distinct in the same query. The only workaround would be to have 2 different queries and then combine the results manually. If you need it, can you file a JIRA? We will try to look at it ASAP. Thanks, -namit From: Alexis Rondeau [mailto:alexis.rond..
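
The "two queries, then combine" workaround can be sketched as below. This is a Python stand-in for the idea, not Hive code: each call to `count_distinct` plays the role of one separate query, the helper name is hypothetical, and the sample rows are invented. The `user` and `session` column names are taken from the query in the original question.

```python
def count_distinct(rows, column):
    """Stand-in for one query of the form: select count(distinct <column>) from actions."""
    return len({row[column] for row in rows})

# Invented sample rows standing in for the actions table
actions = [
    {"user": "u1", "session": "s1"},
    {"user": "u1", "session": "s2"},
    {"user": "u2", "session": "s3"},
]

# Run the two "queries" independently, then combine the results into one row
combined = {
    "users": count_distinct(actions, "user"),
    "sessions": count_distinct(actions, "session"),
}
```

Since each query returns a single number, combining them is just a matter of placing the two values side by side.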

Individual distinction on two or more columns

2009-05-06 Thread Alexis Rondeau
Hi there, I'm currently getting my feet wet with Hive and am very impressed by how quick and easy it was to get going and try things out. I am trying to run a query using count(distinct) on two separate columns, but it is failing as follows: hive> select count(distinct user), count(distinct session) from a

Re: Tasks killed with "Filesystem closed"

2009-05-06 Thread Neil Conway
On Tue, May 5, 2009 at 10:21 PM, Prasad Chakka wrote: > Are these speculative execution maps? Yep, that looks correct. So presumably this isn't worth worrying about, then? Avoiding the ugly exception would be nice, at least. Neil