RE: How to get job names and stages of a query?

2012-03-20 Thread Steven Wong
The Hive history file contains the job id and other job run-time info. Not sure whether there is an API on top of it. From: Felix.徐 [mailto:ygnhz...@gmail.com] Sent: Tuesday, March 20, 2012 12:14 AM To: user@hive.apache.org; manishbh...@rocketmail.com Subject: Re: How to get job names and stages of

Re: LOAD DATA problem

2012-03-20 Thread Edward Capriolo
The syntax would be 'LOAD DATA [IF NOT EXISTS] INFILE'. It is a good suggestion. In hindsight it would have been better to add new syntax for the file-renaming feature rather than changing the current behaviour. Although the change of behaviour sucks for you (and I am sorry about that), I believe the new bet

Re: LOAD DATA problem

2012-03-20 Thread Sean McNamara
> Still, what I think Sean is asking for, as well as am I, is the option to > tell Hive to reject duplicate files altogether Exactly this. I would expect the default behavior of LOAD DATA LOCAL INPATH to either: * Throw an error if the file already exists in hive/hdfs and return an exit c

Re: LOAD DATA problem

2012-03-20 Thread Gabi D
Hi Edward, thanks for looking into this. What fix 2296 does is not so good. It kind of messes with my filename, so it would be better to concatenate it as *.*copy_n.gz (rather than *_*copy_n.gz), but that request might be considered petty... Still, what I think Sean is asking for, as well as am I, is the option to

Optimization on bucketized/sorted tables

2012-03-20 Thread mdefoinplatel.ext
Hi folks, I have several questions about optimization in Hive; they are mainly related to bucketized/sorted tables. Let's say I have a table T bucketized on user_id and sorted by user_id, time. CREATE TABLE T ( user_id BIGINT, time INT ) CLUSTERED BY(user_id) SORTED BY(user_id, time) INTO 64 BUC

Re: LOAD DATA problem

2012-03-20 Thread Edward Capriolo
The copy_n should have been fixed in 0.8.0 https://issues.apache.org/jira/browse/HIVE-2296 On Tue, Mar 20, 2012 at 4:12 AM, Sean McNamara wrote: > Gabi- > > Glad to know I'm not the only one scratching my head on this one!  The > changed behavior caught us off guard. > > I haven't found a soluti

Re: LOAD DATA problem

2012-03-20 Thread Edward Capriolo
By now you have all realized that the load file semantics have changed. I cannot find the exact issue, but here is a related change. * [HIVE-306] - Support "INSERT [INTO] destination" I do not see a way out of this without code. Maybe you could code up a hive query hook for this. It defiantl

Re: HIVE mappers eat a lot of RAM

2012-03-20 Thread Alexander Ershov
I said it wrong: what really bothers me is not 500MB of RAM usage - it's that a mapper starting as a 70-200MB happy chimp becomes a 500MB-600MB bad-smelling gorilla. And that's on the simplest query! As far as I understand the Hive source code, UDF length and UDAF max are super careful with memory allocations.

Re: HIVE mappers eat a lot of RAM

2012-03-20 Thread Bejoy Ks
Hi Alex, In good clusters you have the child task JVM size as 1.5 or 2GB (or at least 1GB). IMHO, 500MB for a task is pretty normal memory consumption. Now, for 50GB of data you have just 7 mappers; you need to increase the number of mappers for better parallelism. Regards Bejoy ___

HIVE mappers eat a lot of RAM

2012-03-20 Thread Alexander Ershov
Hiya, I'm using HIVE 0.7.1 with 1) a moderate 50GB table, let's call it `temp_view` 2) the query: select max(length(get_json_object(json, '$.user_id'))) from temp_view. From my point of view this query is a total joke, nothing serious. The query runs just fine, everyone's happy. But I have massive memory

Re: how is number of mappers determined in mapside join?

2012-03-20 Thread Bruce Bian
Thanks Bejoy! That helps. On Tue, Mar 20, 2012 at 12:10 AM, Bejoy Ks wrote: > Hi Bruce > From my understanding, that formula is not for > CombineFileInputFormat but for other basic Input Formats. > > I'd just brief you on CombineFileInputFormat to get things more clear. > In the defa

Re: LOAD DATA problem

2012-03-20 Thread Sean McNamara
Gabi- Glad to know I'm not the only one scratching my head on this one! The changed behavior caught us off guard. I haven't found a solution in my sleuthing tonight. Indeed, any help would be greatly appreciated on this! Sean From: Gabi D mailto:gabi...@gmail.com>> Reply-To: mailto:user@hiv

Re: LOAD DATA problem

2012-03-20 Thread Gabi D
Hi Vikas, we are facing the same problem that Sean reported and have also noticed that this behavior changed with a newer version of hive. Previously, when you inserted a file with the same name into a partition/table, hive would fail the request (with yet another of its cryptic messages, an issue

Re: LOAD DATA problem

2012-03-20 Thread hadoop hive
Hey Sean, it's because you are appending a file to the same partition with the same name (which is not possible); you must change the file name before appending to the same partition. AFAIK, there is no other way to do that: you can change either the partition name or the file name. Thanks V
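The rename-before-load workaround described above can be sketched in a few lines of Python. This is only an illustration, not something Hive provides: `unique_name` is a hypothetical helper, and it assumes you already have the set of file names present in the target partition (for example from a directory listing you fetched yourself).

```python
def unique_name(filename, existing):
    """Return a file name that does not collide with any name in `existing`.

    Hypothetical helper: `existing` is assumed to be a set of file names
    already present in the target partition.
    """
    if filename not in existing:
        return filename  # no collision, keep the original name

    # Split at the first dot so multi-part extensions such as ".log.gz" stay intact.
    stem, dot, ext = filename.partition(".")
    n = 1
    while True:
        candidate = f"{stem}_{n}{dot}{ext}"
        if candidate not in existing:
            return candidate
        n += 1


if __name__ == "__main__":
    already_loaded = {"events.log.gz", "events_1.log.gz"}
    print(unique_name("events.log.gz", already_loaded))
```

A caller who would rather reject duplicates than rename them could check `filename in existing` and abort the load instead of calling the helper.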

Re: How to get job names and stages of a query?

2012-03-20 Thread Felix . 徐
I actually want to get the job names of the stages via an API. On March 20, 2012 at 2:23 PM, Manish Bhoge wrote: > Whenever you submit a SQL query, a job ID gets generated. You can open the job > tracker localhost:50030/jobtracker.jsp > It shows the running jobs and the rest of the details. > Thanks, > Manish > Sent