Sort-Merge-Bucket (SMB) joins and MAPJOINs are related, but separate...

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization

I think you should still be able to get map-side joins (if the join table is 
small enough)....

Nothing here that says sorting is required:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins#LanguageManualJoins-MapJoinRestrictions

You need sorting (and bucketing) for the SMB join, but you aren't specifying 
bucketing.

I'd suggest either bucketing the data along with sorting it, or try it without 
'sorted by' and see if you can execute a mapjoin.


From: David Capwell [mailto:dcapw...@gmail.com]
Sent: Monday, August 03, 2015 11:59 AM
To: user@hive.apache.org
Subject: RE: External sorted tables


Mostly wanted to tell hive it's sorted so it could use more efficient joins 
like a map side join.  No other reason
On Aug 3, 2015 10:47 AM, "Ryan Harris" 
<ryan.har...@zionsbancorp.com<mailto:ryan.har...@zionsbancorp.com>> wrote:
Unless you are using bucketing and sampling, there is no benefit (that I can 
think of) to informing hive that the data *is* in fact sorted...

If there is something specific you are trying to accomplish by specifying the 
sort order of that column, perhaps you can elaborate on that.  Otherwise, leave 
out the 'sorted by' statement and you should be fine.

From: David Capwell [mailto:dcapw...@gmail.com<mailto:dcapw...@gmail.com>]
Sent: Monday, August 03, 2015 10:50 AM
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: Re: External sorted tables


Based off the ddl it is required to have buckets, I was wondering if there was 
a way to get around it?

Thinking as a hack I could try bucket=1, but if there is a better way would 
love to know
On Aug 2, 2015 6:18 PM, "Takahiko Saito" 
<tysa...@gmail.com<mailto:tysa...@gmail.com>> wrote:
Hi,

Is it possible that 'create table sorted by' must have buckets?

I found the below statements in 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL:

"The CLUSTERED BY and SORTED BY creation commands do not affect how data is 
inserted into a table – only how it is read. This means that users must be 
careful to insert data correctly by specifying the number of reducers to be 
equal to the number of buckets, and using CLUSTER BY and SORT BY commands in 
their query."

On Thu, Jul 30, 2015 at 7:22 PM, David Capwell 
<dcapw...@gmail.com<mailto:dcapw...@gmail.com>> wrote:
We are trying to create a external table in hive. This data is sorted,
so wanted to tell hive about this. When I do, it complains about
parsing the create.

> CREATE EXTERNAL TABLE IF NOT EXISTS store.testing (
...
. . . . . . . . . . . . . . . . . . .>   timestamp bigint,
...)
. . . . . . . . . . . . . . . . . . .>   SORTED BY (timestamp)
...
. . . . . . . . . . . . . . . . . . .>   LOCATION '/project/db/table'
. . . . . . . . . . . . . . . . . . .> ;
Error: Error while compiling statement: FAILED: ParseException line
1:507 missing EOF at 'SORTED' near ')' (state=42000,code=40000)
2: jdbc:hive2://localhost:10000/store>

What can I do to let hive know that my data is sorted? Every example
online of sorted by is grouped with buckets, but we really don't want
to add bucketing.


Hive version: 0.14.0

Thanks for your help!



--
Takahiko Saito
________________________________
THIS ELECTRONIC MESSAGE, INCLUDING ANY ACCOMPANYING DOCUMENTS, IS CONFIDENTIAL 
and may contain information that is privileged and exempt from disclosure under 
applicable law. If you are neither the intended recipient nor responsible for 
delivering the message to the intended recipient, please note that any 
dissemination, distribution, copying or the taking of any action in reliance 
upon the message is strictly prohibited. If you have received this 
communication in error, please notify the sender immediately. Thank you.

======================================================================
THIS ELECTRONIC MESSAGE, INCLUDING ANY ACCOMPANYING DOCUMENTS, IS CONFIDENTIAL 
and may contain information that is privileged and exempt from disclosure under 
applicable law. If you are neither the intended recipient nor responsible for 
delivering the message to the intended recipient, please note that any 
dissemination, distribution, copying or the taking of any action in reliance 
upon the message is strictly prohibited. If you have received this 
communication in error, please notify the sender immediately.  Thank you.

Reply via email to