The Hive history file contains the job id and other job run-time info. Not sure
if there's an API on top of it or not.
From: Felix.徐 [mailto:ygnhz...@gmail.com]
Sent: Tuesday, March 20, 2012 12:14 AM
To: user@hive.apache.org; manishbh...@rocketmail.com
Subject: Re: How to get job names and stages of
The syntax would be 'LOAD DATA [IF NOT EXISTS] INFILE'. That's a good suggestion.
In hindsight it would have been better to add new syntax for the file-renaming
feature rather than changing the current behaviour. Although the
change of behaviour sucks for you (and I am sorry about that), I
believe the new bet
> Still, what I think Sean is asking for, as am I, is the option to
> tell Hive to reject duplicate files altogether
Exactly this.
I would expect the default behavior of LOAD DATA LOCAL INPATH to either:
* Throw an error if the file already exists in hive/hdfs and return an exit
code
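A minimal HiveQL sketch of the behaviour being discussed (table and path names are hypothetical; the IF NOT EXISTS form for LOAD DATA is a proposal in this thread, not existing syntax):

```sql
-- Current behaviour (Hive 0.8+): a colliding file name is silently
-- renamed with a copy_n suffix instead of failing the load.
LOAD DATA LOCAL INPATH '/tmp/events.gz'
INTO TABLE events PARTITION (dt = '2012-03-20');

-- Proposed behaviour: fail (or no-op) when the file already exists
-- in the table/partition directory.
-- LOAD DATA LOCAL IF NOT EXISTS INPATH '/tmp/events.gz'
-- INTO TABLE events PARTITION (dt = '2012-03-20');
```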
Hi Edward,
thanks for looking into this.
what fix 2296 does is not so good. It kind of messes with my filename, so
better to concatenate it as *.*copy_n.gz (rather than
*_*copy_n.gz),
but that request might be considered petty...
Still, what I think Sean is asking for, as well as am I, is the option to
Hi folks,
I have several questions about optimization in Hive, they are mainly related to
bucketized/sorted tables.
Let's say I have a table T bucketized on user_id and sorted by (user_id, time).
CREATE TABLE T (
  user_id BIGINT,
  time INT
)
CLUSTERED BY (user_id) SORTED BY (user_id, time) INTO 64 BUCKETS;
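For inserts into such a table, Hive only honours the bucketing/sorting metadata if told to enforce it. A sketch with the standard settings (these configuration properties are real Hive options of this era; `src` is a hypothetical staging table):

```sql
-- Without these, an INSERT may produce files that do not actually
-- match the declared CLUSTERED BY / SORTED BY layout.
SET hive.enforce.bucketing = true;
SET hive.enforce.sorting = true;

INSERT OVERWRITE TABLE T
SELECT user_id, time FROM src;
```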
The copy_n should have been fixed in 0.8.0
https://issues.apache.org/jira/browse/HIVE-2296
On Tue, Mar 20, 2012 at 4:12 AM, Sean McNamara wrote:
> Gabi-
>
> Glad to know I'm not the only one scratching my head on this one! The
> changed behavior caught us off guard.
>
> I haven't found a solution in my sleuthing tonight.
By now you all have realized that the load file semantics have
changed. I cannot find the exact issue, but here is a related change.
* [HIVE-306] - Support "INSERT [INTO] destination"
I do not see a way out of this without code. Maybe you could code up a
hive query hook for this.
It definitely
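For reference, Hive does expose hook points via configuration, which is likely what "query hook" refers to. A sketch (hive.exec.pre.hooks is a real Hive setting; the class name is made up for illustration and would need to implement Hive's hook interface and be on the classpath):

```sql
SET hive.exec.pre.hooks = com.example.hooks.RejectDuplicateFilesHook;
```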
I said it wrong: what really bothers me is not 500MB of RAM usage - it's
that a mapper starting out as a 70-200MB happy chimp becomes a 500MB-600MB
bad-smelling gorilla. And that's on the simplest query! As far as I
understand the Hive source code, the UDF length and UDAF max are super careful
with memory allocations.
Hi Alex
In good clusters the child task JVM size is 1.5 or 2GB (or at
least 1GB). IMHO, 500MB for a task is a pretty normal memory consumption.
Now, for 50GB of data you have just 7 mappers; you need to increase the number
of mappers for better parallelism.
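A sketch of the knobs being referred to here (both are real Hadoop settings settable from a Hive session; the values are illustrative, not recommendations):

```sql
-- Raise the child task JVM heap:
SET mapred.child.java.opts = -Xmx1024m;

-- With CombineHiveInputFormat, lowering the max split size yields
-- more, smaller splits and therefore more mappers:
SET mapred.max.split.size = 268435456;  -- 256 MB per split
```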
Regards
Bejoy
Hiya,
I'm using Hive 0.7.1 with
1) a moderate 50GB table, let's call it `temp_view`
2) the query: select max(length(get_json_object(json, '$.user_id'))) from
temp_view. From my point of view this query is a total joke, nothing
serious.
Query runs just fine, everyone's happy.
But I have massive memory
Thanks Bejoy! That helps.
On Tue, Mar 20, 2012 at 12:10 AM, Bejoy Ks wrote:
> Hi Bruce
> From my understanding, that formula is not for
> CombineFileInputFormat but for other basic Input Formats.
>
> I'd just brief you on CombineFileInputFormat to make things clearer.
> In the defa
Gabi-
Glad to know I'm not the only one scratching my head on this one! The changed
behavior caught us off guard.
I haven't found a solution in my sleuthing tonight. Indeed, any help would be
greatly appreciated on this!
Sean
From: Gabi D <gabi...@gmail.com>
Reply-To: user@hiv
Hi Vikas,
we are facing the same problem that Sean reported and have also noticed
that this behavior changed with a newer version of hive. Previously, when
you inserted a file with the same name into a partition/table, hive would
fail the request (with yet another of its cryptic messages, an issue
hey Sean,
it's because you are appending a file into the same partition with the same
name (which is not possible); you must change the file name before appending
into the same partition.
AFAIK, I don't think there is any other way to do that: you can either change
the partition name or the file name.
Thanks
V
I actually want to get the job names of the stages via an API..
On March 20, 2012 at 2:23 PM, Manish Bhoge wrote:
> Whenever you submit a SQL query, a job id gets generated. You can open the job
> tracker at localhost:50030/jobtracker.jsp;
> it shows the jobs that are running and the rest of the other details.
> Thanks,
> Manish
> Sent