Doing versioning would work for this scenario. It essentially achieves the
same thing.
On 9/11/09 2:39 AM, "Ashish Thusoo" wrote:
> Another option is to deal with this using versioning. Some ideas on this are
> at
>
> https://issues.apache.org/jira/browse/HIVE-718
>
> Ashish
> ___
I am seeing some very weird behavior. When the inner table is small (~17k rows,
each with two columns (INT, DOUBLE)), the join executes in about 2 mins
(the outer table is 500 million rows). But when the inner table is a bit
bigger (~480k rows (INT, DOUBLE)), the join takes about 20 mins for the same
oute
Thanks. I switched to branch 0.4, and the hash join is working, even though
it is running much slower than I expected. I will try to figure out the
reason.
PhD Candidate
CS @ UCSB
Santa Barbara, CA 93106, USA
http://www.cs.ucsb.edu/~sudipto
On Fri, Sep 11, 2009 at 3:13 PM, Namit Jain wrote:
>
I checked join26.q. That wasn't what I meant. My query looks something like
this:
insert overwrite table join_result
select /*+ MAPJOIN(c,m)*/ c.cid, m.mid, c.cparam, m.mparam, r.rate, (r.rate
- c.cparam*m.mparam)
from mparam m JOIN data r ON (m.mid = r.mid)
JOIN cparam c ON (c.cid = r.cid);
mpar
On Fri, Sep 11, 2009 at 5:28 PM, Todd Lipcon wrote:
> Hrm... sorry, I didn't read your original query closely enough.
>
> I'm not sure what could be causing this. The map.tasks.maximum parameter
> shouldn't affect it at all - it only affects the number of slots on the
> trackers.
>
> By any chance
Hrm... sorry, I didn't read your original query closely enough.
I'm not sure what could be causing this. The map.tasks.maximum parameter
shouldn't affect it at all - it only affects the number of slots on the
trackers.
By any chance do you have mapred.max.maps.per.node set? This is a
configuratio
That is not true - look at the unit test join26.q
From: Sudipto Das [mailto:sudipt...@gmail.com]
Sent: Friday, September 11, 2009 2:07 PM
To: hive-user@hadoop.apache.org
Subject: Re: Directing Hive to perform Hash Join for small inner tables
But this creates a join where each join is performed i
Todd -
Of course; it makes sense that it would be that way. But I'm still left
wondering why, then, my Hive queries are only using 2 mappers per task
tracker when other jobs use 7. I've gone so far as to diff the job.xml
files from a regular job and a Hive query, and didn't turn up anything -
th
Hi Brad,
mapred.tasktracker.map.tasks.maximum is a parameter read by the TaskTracker
when it starts up. It cannot be changed per-job.
Hope that helps
-Todd
On Fri, Sep 11, 2009 at 2:06 PM, Brad Heintz wrote:
> TIA if anyone can point me in the right direction on this.
>
> I'm running a simple
But this creates a join where each join is performed in a single Map only MR
join, which is as good as specifying the query as MAPJOIN(x) followed by
another query as MAPJOIN(y) with the result of the previous join. Is there
a way to make it pick just one MR job, where both the small inner tables
TIA if anyone can point me in the right direction on this.
I'm running a simple Hive query (a count on an external table comprising 436
files, each of ~2GB). The cluster's mapred-site.xml specifies
mapred.tasktracker.map.tasks.maximum = 7 - that is, 7 mappers per worker
node. When I run regular
I think we used r790732 the last time we made changes to the hive service
interface.
raghu
On 9/11/09 10:18 AM, "Bill Graham" wrote:
> +1
>
> I've been struggling with thrift versions as well, see:
> https://issues.apache.org/jira/browse/HIVE-795?focusedCommentId=12754020&page=
> com.atlassia
Yes, you can specify the list of tables in the hint
MAPJOIN(x,y,z)
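
As a rough illustration of what a map-side join with two hinted tables does, the Python sketch below keeps both small tables in in-memory dictionaries and streams the large table once. It is not Hive's implementation; the names cparam and mparam and the row layout are borrowed from the three-way join query elsewhere in this thread, and the sample values are made up.

```python
def map_join(data_rows, cparam, mparam):
    """Join a large stream of (cid, mid, rate) rows against two small tables.

    cparam maps cid -> cparam value and mparam maps mid -> mparam value.
    Both are assumed to fit in memory, which is the premise of a map join.
    """
    for cid, mid, rate in data_rows:
        # Inner-join semantics: skip rows with no match in either small table.
        if cid not in cparam or mid not in mparam:
            continue
        c = cparam[cid]
        m = mparam[mid]
        yield (cid, mid, c, m, rate, rate - c * m)


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    small_c = {1: 2.0, 2: 0.5}
    small_m = {10: 1.5}
    big = [(1, 10, 5.0), (2, 20, 1.0), (3, 10, 2.0)]
    for row in map_join(big, small_c, small_m):
        print(row)
```

Because both lookups happen while each large-table row is in hand, the large table is read only once, which is the behavior the question about a single MR job is after.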
From: Sudipto Das [mailto:sudipt...@gmail.com]
Sent: Friday, September 11, 2009 1:17 PM
To: hive-user@hadoop.apache.org
Subject: Re: Directing Hive to perform Hash Join for small inner tables
Is there any performance differe
A partitioned table has a set of partition keys. Check the wiki on how to create
a partitioned table. In your case you can use one partition key named 'ds'
(datestamp). You can choose any format for the values, but a commonly chosen one is
'YYYY-MM-DD'. You can specify the partition while loading data by '
PART
Is there any performance difference between 0.4 and trunk? I can then
temporarily switch to 0.4 while the problem is being fixed.
Also, can I specify two tables in the MAPJOIN hint when doing a 3-way join
where two of the tables can fit in memory? I tried some intuitive alternatives,
but they did not work. Th
Prasad Chakka wrote:
You should create a daily partition table. So you just need to create
a new partition which is automatic if you use 'LOAD DATA... INTO TABLE
... PARTITION (ds='2009-09-01')'
Prasad
Just wanted to clarify, I still need to do LOAD DATA .. INTO TABLE ..
PARTITION (day='hdfs
On Fri, Sep 11, 2009 at 3:26 PM, Prasad Chakka wrote:
> You should create a daily partition table. So you just need to create a new
> partition which is automatic if you use 'LOAD DATA... INTO TABLE ...
> PARTITION (ds='2009-09-01')'
>
> Prasad
>
>
>
> From: Mayura
You should create a daily partition table. So you just need to create a new
partition which is automatic if you use 'LOAD DATA... INTO TABLE ... PARTITION
(ds='2009-09-01')'
Prasad
From: Mayuran Yogarajah
Reply-To:
Date: Fri, 11 Sep 2009 12:20:25 -0700
To:
S
We have our files in HDFS laid out by day like this:
2009-09-01/files
2009-09-02/files
2009-09-03/files
Loading this data into Hive would mean creating a new table per day!
I'm thinking this might be a common issue though, since others most likely
do batch processing on a daily/nightly basis.
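
Following the daily-partition suggestion made in this thread, one way to avoid a table per day is to generate a LOAD DATA statement per dated directory. The sketch below is only an illustration: the table name daily_logs and the base directory /data are hypothetical, and the partition key ds is the one proposed in the replies.

```python
from datetime import datetime


def load_statements(days, table="daily_logs", base_dir="/data"):
    """Build one LOAD DATA statement per dated directory name.

    The table name and base_dir defaults are placeholders, not values
    taken from the thread.
    """
    statements = []
    for day in days:
        # Raises ValueError if the directory name does not parse as a date.
        datetime.strptime(day, "%Y-%m-%d")
        statements.append(
            f"LOAD DATA INPATH '{base_dir}/{day}' "
            f"INTO TABLE {table} PARTITION (ds='{day}');"
        )
    return statements


if __name__ == "__main__":
    for stmt in load_statements(["2009-09-01", "2009-09-02", "2009-09-03"]):
        print(stmt)
```

The generated statements could then be saved to a file and run from a nightly batch job, so each day's directory lands in its own partition of a single table.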
This is some problem in trunk - it runs fine in 0.4 - I will take a look
From: Sudipto Das [mailto:sudipt...@gmail.com]
Sent: Thursday, September 10, 2009 6:16 PM
To: hive-user@hadoop.apache.org
Subject: Re: Directing Hive to perform Hash Join for small inner tables
Hi Namit,
The join column is
Thanks... I'll look into it and get back if I have any more questions.
PhD Candidate
CS @ UCSB
Santa Barbara, CA 93106, USA
http://www.cs.ucsb.edu/~sudipto
On Fri, Sep 11, 2009 at 11:17 AM, Raghu Murthy wrote:
> You will need to write a custom mapper script using a TRANSFORM clause. See
> http
You will need to write a custom mapper script using a TRANSFORM clause. See
http://wiki.apache.org/hadoop/Hive/LanguageManual/Transform
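
A minimal sketch of such a mapper script in Python follows. It assumes each input row arrives as tab-separated text with a key followed by three values, and emits one (key, value) row per value; that column layout is an assumption, since the original question does not describe the data.

```python
import io
import sys


def split_row(line):
    """Turn one tab-separated row (key, a, b, c) into three 'key<TAB>value' rows."""
    key, a, b, c = line.rstrip("\n").split("\t")
    return [f"{key}\t{value}" for value in (a, b, c)]


def run(stdin, stdout):
    """Read rows from stdin and write three output rows for each input row."""
    for line in stdin:
        for out in split_row(line):
            stdout.write(out + "\n")


if __name__ == "__main__":
    # Demo input so the sketch runs on its own. In a real TRANSFORM script,
    # replace these two lines with: run(sys.stdin, sys.stdout)
    demo = io.StringIO("42\t1.0\t2.0\t3.0\n")
    run(demo, sys.stdout)
```

With the script saved under a hypothetical name such as split_rows.py and registered with ADD FILE, a query along the lines of SELECT TRANSFORM(k, a, b, c) USING 'python split_rows.py' AS (k, v) FROM some_table would invoke it; the table and column names here are placeholders.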
On 9/11/09 11:04 AM, "Sudipto Das" wrote:
> Hi,
>
> I have a scenario where in a query, I need to split up a single row and output
> 3 different rows. Is this
Hi,
I have a scenario where in a query, I need to split up a single row and
output 3 different rows. Is this supported by Hive, and if yes, can someone
point me to the syntax?
Thanks
Sudipto
PhD Candidate
CS @ UCSB
Santa Barbara, CA 93106, USA
http://www.cs.ucsb.edu/~sudipto
+1
I've been struggling with thrift versions as well, see:
https://issues.apache.org/jira/browse/HIVE-795?focusedCommentId=12754020&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12754020
Any insight into which version of thrift the Hive trunk is using would be
hel
Another option is to deal with this using versioning. Some ideas on this are at
https://issues.apache.org/jira/browse/HIVE-718
Ashish
From: Eva Tse [e...@netflix.com]
Sent: Wednesday, September 09, 2009 10:45 PM
To: hive-user@hadoop.apache.org
Subject: Re:
Hi all,
We've tried the newest one from trunk and r760184; neither of them produces
the same code as Hive trunk.
Which Thrift revision do you use?
Thanks,
Min
--
My research interests are distributed systems, parallel computing and
bytecode based virtual machine.
My profile:
http://www.linke
What does EXPLAIN show? An easy workaround for now is to push the
partition predicate into a subquery on the table.
Ashish
From: Abhijit Pol [a...@rocketfuelinc.com]
Sent: Thursday, September 10, 2009 10:40 PM
To: hive-user@hadoop.apache.org
Subject: Re: