Hi Cam,
I couldn't find a function that achieved precisely what you were looking
for, but there is a function that gets pretty close to what you want.
select id, collect_set(date_hour), collect_set(count), sum(count) from test
group by id;
The problem with using collect_set is that it removes du
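If the point being cut off above is that collect_set removes duplicate values (which it does, being a set), one possible workaround sketch is collect_list, assuming a Hive release that ships it (0.13 or later — it did not exist at the time of this thread):

```sql
-- collect_list keeps duplicates, so each count stays aligned with its hour.
-- Column names follow the example query above.
SELECT id,
       collect_list(date_hour),
       collect_list(count),
       sum(count)
FROM test
GROUP BY id;
```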
Thanks for replying, Namit.
It is motivating to receive a mail from the authors of Hive :).
I filed the jira based on the discussion..
https://issues.apache.org/jira/browse/HIVE-1938
I will try to update my idea asap.
Thanks
Bharath,V
4th year Undergrad,IIIT Hyderabad.
w: http://research.iiit.a
Bharath,
This would be great.
Why don't you write up something about how you are planning to proceed?
File a new jira and load some design notes/spec there.
We can definitely sync up from there.
This feature would be very useful to the community. We, at Facebook,
would definitely like to us
In MapReduce, filenames that begin with an underscore are "hidden" files and
are not enumerated by FileInputFormat (Hive, I believe, processes tables
with TextInputFormat and SequenceFileInputFormat, both descendants of this
class).
Using "_foo" as a hidden/ignored filename is conventional in the
Hi Ning,Anja,
I am doing my Master's thesis on this topic. I have implemented all
SQL features like joins, selects, etc. on top of Hadoop (before knowing
about Hive), and we have derived some basic cost models for join
re-ordering which seem to be working fine at some basic scales of TPC-H
datasets.
I am new to hive and hadoop and I got the packaged version from Cloudera.
So, personally, I'd be happy if the new package is mutually consistent.
-Ajo
On Mon, Jan 31, 2011 at 5:14 PM, Carl Steinbach wrote:
> Hi,
>
> I'm trying to get an idea of how many people plan on running Hive
> 0.7.0 on to
Hi Anja,
As you noticed, Hive only has limited support for cost-based optimization. One
of the reasons is that Hive used to have a very small number of alternative
execution plans to choose from. One exception is map joins vs. common joins.
Liying Tang had some work during his last internship to convert commo
Hi,
I'm trying to get an idea of how many people plan on running Hive
0.7.0 on top of Hadoop 0.20.0 (as opposed to 0.20.1 or 0.20.2),
and are in a position where they can't upgrade to one of more
recent releases of the 0.20.x branch. I'm asking because there is
a ticket open (HIVE-1817) that block
I think there is a developer mailing list ... that is probably the best
place for this question.
Also, I think there is a cost-based query optimizer in the works somewhere.
-Ajo
On Mon, Jan 31, 2011 at 2:04 PM, Anja Gruenheid
wrote:
> Hi!
>
> I'm a graduate student from Georgia Tech and I'm wor
Hi!
I'm a graduate student from Georgia Tech and I'm working with Hive for a
research project. I am interested in query optimization and the Hive
MetaStore in that context. Working through the documentation and code, I
noticed that the implementation right now is using a rule-based
optimizati
Hello,
After doing some aggregate counting, I now have data in a table like this:
id  count  date_hour (this is a partition name)
1   3      2011310115
1   1      2011310116
2   1      2011310117
2   1      2011310118
and I need to turn this into:
1 [2011310115,2011310
thank you very much, exactly what i needed.
On Mon, Jan 31, 2011 at 9:57 PM, Adam O'Donnell wrote:
> I would create separate partitions, one for each day worth of data,
> and then drop the partitions that are no longer needed.
>
> On Mon, Jan 31, 2011 at 11:56 AM, Cam Bazz wrote:
>> Hello,
>>
>>
I would create separate partitions, one for each day's worth of data,
and then drop the partitions that are no longer needed.
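A minimal sketch of that approach, with made-up table and partition names:

```sql
-- One partition per day; the table and partition column are hypothetical.
ALTER TABLE logs ADD PARTITION (dt='2011-01-31');

-- Later, expire old data by dropping whole partitions instead of deleting rows.
ALTER TABLE logs DROP PARTITION (dt='2011-01-01');
```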
On Mon, Jan 31, 2011 at 11:56 AM, Cam Bazz wrote:
> Hello,
>
> I understand there is no way to delete data stored in a table. like a
> `delete from table_name` in the hive l
Hello,
I understand there is no way to delete data stored in a table, like a
`delete from table_name` in the Hive query language.
All this is fine, because all the query results are inserted into
another table, and the previous data in it is overwritten.
When we need to store data collectively in to a
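The overwrite pattern described above looks roughly like this (table and column names are made up for illustration):

```sql
-- Replaces whatever was in daily_summary before; Hive here has no
-- row-level DELETE, so the whole table contents are rewritten.
INSERT OVERWRITE TABLE daily_summary
SELECT ip_number, count(*)
FROM item_view
GROUP BY ip_number;
```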
Hello,
I have fixed the problem. I had to use not the latest version of Hadoop
but the previous release: hadoop-0.20.2 with hive-0.6.0-bin.
thank you very much. I already started importing my web logs and
making basic statistics on them.
-C.B.
On Mon, Jan 31, 2011 at 7:32 PM, Jean-Daniel
You can first try setting io.skip.checksum.errors to true, which will
ignore bad checksums.
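In the Hive CLI that is a one-liner (io.skip.checksum.errors is a Hadoop property, so it can also be set in the job configuration):

```sql
-- Skip blocks with bad checksums instead of failing the task.
SET io.skip.checksum.errors=true;
```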
>>In facebook, we also had a requirement to ignore corrupt/bad data - but it
>>has not been committed yet. Yongqiang, what is the jira number ?
There seems to be no jira for this issue.
thanks
yongqiang
2011/1/31
I've noticed that it takes a while for each map task to be set up in Hive,
and the way I set up the job, there were as many maps as
files/buckets.
I read a recommendation somewhere to design jobs such that they take at
least a minute.
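One knob that is sometimes suggested for the one-map-per-small-file problem is CombineHiveInputFormat, which packs several small files into one split (a sketch only — whether it helps depends on the input format and Hive version in use):

```sql
-- Combine small input files so each map does closer to a minute of work.
SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
```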
Cheers,
-Ajo.
On Mon, Jan 31, 2011 at 8:08 AM
It seems the hadoop version you're running isn't the same as the one
that hive is using. Check the lib/ folder and if it's not the same,
replace the hadoop jars with the ones from the version you're running.
J-D
On Mon, Jan 31, 2011 at 6:29 AM, Cam Bazz wrote:
> Hello,
>
> I have written a probl
On 1/31/11 7:46 AM, "Laurent Laborde" wrote:
>On Fri, Jan 28, 2011 at 8:05 AM, Laurent Laborde
>wrote:
>> On Fri, Jan 28, 2011 at 1:12 AM, Namit Jain wrote:
>>> Hi Laurent,
>>>
>>> 1. Are you saying that _top.sql did not exist in the home directory.
>>> Or that, _top.sql existed, but hive was
On Mon, Jan 31, 2011 at 11:08 AM, wrote:
> Hello,
>
> I like to do a reporting with Hive on something like tracking data.
> The raw data which is about 2 gigs or more a day I want to query with hive.
> This works already for me, no problem.
> Also I want to cascade down the reporting data to som
Hello,
I would like to do reporting with Hive on something like tracking data.
The raw data, which is about 2 GB or more a day, I want to query with Hive.
This already works for me, no problem.
I also want to cascade the reporting data down to something like client, date,
something in Hive like part
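A sketch of such a layout, with hypothetical table and column names — partitioning by client and date lets reporting queries prune to just the slices they need:

```sql
-- Reporting table partitioned by client and day (names are made up).
CREATE TABLE tracking_report (
  hits  BIGINT,
  bytes BIGINT
)
PARTITIONED BY (client STRING, dt STRING);
```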
On Fri, Jan 28, 2011 at 8:05 AM, Laurent Laborde wrote:
> On Fri, Jan 28, 2011 at 1:12 AM, Namit Jain wrote:
>> Hi Laurent,
>>
>> 1. Are you saying that _top.sql did not exist in the home directory.
>> Or that, _top.sql existed, but hive was not able to read it after loading
>
> It exist, it's lo
Hello,
I described a problem in my previous email. I now tried: select
item_view_raw.* from item_view_raw WHERE log_level = 'INFO';
and I get the same error. select * from item_view_raw works just fine,
but when I add a WHERE clause on any column I get the same exception:
Total MapReduce jobs
Hello,
I just started Hive today. Following the instructions I set it up, and
got it working to play with my web server log files.
I created two tables:
CREATE TABLE item_view(view_time BIGINT, ip_number STRING, session_id
STRING, session_cookie STRING, referrer_url STRING, eser_sid INT,
sale_stat