Thanks, so only optimization for sorted data is on buckets, so without
buckets there isn't really a reason to tell hive the data is sorted.
Thanks for letting me know
On Aug 3, 2015 11:11 AM, "Ryan Harris" <[email protected]> wrote:

> Sort-Merge-Bucket (SMB) joins and MAPJOINs are related, but separate...
>
>
>
>
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization
>
>
>
> I think you should still be able to get map-side joins (if the join table
> is small enough)....
>
>
>
> Nothing here that says sorting is required:
>
>
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins#LanguageManualJoins-MapJoinRestrictions
>
>
>
> You need sorting (and bucketing) for the SMB join, but you aren't
> specifying bucketing.
>
>
>
> I'd suggest either bucketing the data along with sorting it, or try it
> without 'sorted by' and see if you can execute a mapjoin.
>
>
>
>
>
> *From:* David Capwell [mailto:[email protected]]
> *Sent:* Monday, August 03, 2015 11:59 AM
> *To:* [email protected]
> *Subject:* RE: External sorted tables
>
>
>
> Mostly wanted to tell hive it's sorted so it could use more efficient
> joins like a map side join.  No other reason
>
> On Aug 3, 2015 10:47 AM, "Ryan Harris" <[email protected]>
> wrote:
>
> Unless you are using bucketing and sampling, there is no benefit (that I
> can think of) to informing hive that the data **is** in fact sorted...
>
>
>
> If there is something specific you are trying to accomplish by specifying
> the sort order of that column, perhaps you can elaborate on that.
> Otherwise, leave out the 'sorted by' statement and you should be fine.
>
>
>
> *From:* David Capwell [mailto:[email protected]]
> *Sent:* Monday, August 03, 2015 10:50 AM
> *To:* [email protected]
> *Subject:* Re: External sorted tables
>
>
>
> Based off the ddl it is required to have buckets, I was wondering if there
> was a way to get around it?
>
> Thinking as a hack I could try bucket=1, but if there is a better way
> would love to know
>
> On Aug 2, 2015 6:18 PM, "Takahiko Saito" <[email protected]> wrote:
>
> Hi,
>
>
>
> Is it possible that 'create table sorted by' must have buckets?
>
>
>
> I found the below statements in
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL:
>
>
>
> "The CLUSTERED BY and SORTED BY creation commands do not affect how data
> is inserted into a table – only how it is read. This means that users must
> be careful to insert data correctly by specifying the number of reducers to
> be equal to the number of buckets, and using CLUSTER BY and SORT BY
> commands in their query."
>
>
>
> On Thu, Jul 30, 2015 at 7:22 PM, David Capwell <[email protected]> wrote:
>
> We are trying to create a external table in hive. This data is sorted,
> so wanted to tell hive about this. When I do, it complains about
> parsing the create.
>
> > CREATE EXTERNAL TABLE IF NOT EXISTS store.testing (
> ...
> . . . . . . . . . . . . . . . . . . .>   timestamp bigint,
> ...)
> . . . . . . . . . . . . . . . . . . .>   SORTED BY (timestamp)
> ...
> . . . . . . . . . . . . . . . . . . .>   LOCATION '/project/db/table'
> . . . . . . . . . . . . . . . . . . .> ;
> Error: Error while compiling statement: FAILED: ParseException line
> 1:507 missing EOF at 'SORTED' near ')' (state=42000,code=40000)
> 2: jdbc:hive2://localhost:10000/store>
>
> What can I do to let hive know that my data is sorted? Every example
> online of sorted by is grouped with buckets, but we really don't want
> to add bucketing.
>
>
> Hive version: 0.14.0
>
> Thanks for your help!
>
>
>
>
>
> --
>
> Takahiko Saito
> ------------------------------
>
> THIS ELECTRONIC MESSAGE, INCLUDING ANY ACCOMPANYING DOCUMENTS, IS
> CONFIDENTIAL and may contain information that is privileged and exempt from
> disclosure under applicable law. If you are neither the intended recipient
> nor responsible for delivering the message to the intended recipient,
> please note that any dissemination, distribution, copying or the taking of
> any action in reliance upon the message is strictly prohibited. If you have
> received this communication in error, please notify the sender immediately.
> Thank you.
> ------------------------------
> THIS ELECTRONIC MESSAGE, INCLUDING ANY ACCOMPANYING DOCUMENTS, IS
> CONFIDENTIAL and may contain information that is privileged and exempt from
> disclosure under applicable law. If you are neither the intended recipient
> nor responsible for delivering the message to the intended recipient,
> please note that any dissemination, distribution, copying or the taking of
> any action in reliance upon the message is strictly prohibited. If you have
> received this communication in error, please notify the sender immediately.
> Thank you.
>

Reply via email to