Re: Mechanism when doing a select *

Mich Talebzadeh Mon, 21 Mar 2016 09:25:48 -0700

You are correct. It should not. There is nothing to optimise here.

0: jdbc:hive2://rhes564:10010/default> select * from countries;
OK
INFO  : Compiling
command(queryId=hduser_20160321162726_7efeecbb-46ee-431f-9095-f67e0602b318):
select * from countries
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema:
Schema(fieldSchemas:[FieldSchema(name:countries.country_id, type:double,
comment:null), FieldSchema(name:countries.country_iso_code, type:string,
comment:null), FieldSchema(name:countries.country_name, type:string,
comment:null), FieldSchema(name:countries.country_subregion, type:string,
comment:null), FieldSchema(name:countries.country_subregion_id,
type:double, comment:null), FieldSchema(name:countries.country_region,
type:string, comment:null), FieldSchema(name:countries.country_region_id,
type:double, comment:null), FieldSchema(name:countries.country_total,
type:string, comment:null), FieldSchema(name:countries.country_total_id,
type:double, comment:null), FieldSchema(name:countries.country_name_hist,
type:string, comment:null)], properties:null)
INFO  : Completed compiling
command(queryId=hduser_20160321162726_7efeecbb-46ee-431f-9095-f67e0602b318);
Time taken: 0.047 seconds
INFO  : Executing
command(queryId=hduser_20160321162726_7efeecbb-46ee-431f-9095-f67e0602b318):
select * from countries
INFO  : Completed executing
command(queryId=hduser_20160321162726_7efeecbb-46ee-431f-9095-f67e0602b318);
Time taken: 0.001 seconds
INFO  : OK
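
If you want to check this on your side, here is a rough sketch of what I
would look at. It assumes your Hive version exposes the properties
hive.fetch.task.conversion and hive.fetch.task.conversion.threshold, and
my_table is just the table name from your mail. The values are examples
only, not recommendations.

    -- Show the current setting. With 'none', even a plain select * is
    -- planned as a job on the execution engine instead of a fetch task.
    set hive.fetch.task.conversion;

    -- Input size threshold in bytes; a table larger than this may not be
    -- converted to a fetch task even when conversion is enabled.
    set hive.fetch.task.conversion.threshold;

    -- Look at the plan. If it contains only a Fetch Operator stage,
    -- no Tez/MapReduce job should be launched for the query.
    explain select * from my_table;

    -- Example only: allow fetch conversion for this session and re-check.
    set hive.fetch.task.conversion=more;
    explain select * from my_table;

    -- Refresh table-level statistics on the ORC table.
    analyze table my_table compute statistics;

If the plan still shows a Tez stage after that, the table size compared to
the threshold would be the next thing I would check.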


Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 21 March 2016 at 15:56, Tale Firefly <tale.h...@gmail.com> wrote:

> Hm, I need to check if statistics are enabled for this table and
> up-to-date.
> I'm going to check this.
>
> I don't know if I was clear in my previous statement, but I am surprised
> that a job is launched just by doing a select * from my_table.
> I thought a select * from my_table was not running any MR jobs.
>
> Best regards.
>
> Tale.
>
> On Mon, Mar 21, 2016 at 4:48 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> Well, I use Spark as the execution engine.
>>
>> Now the question is: have you updated statistics on the ORC table?
>>
>> HTH
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>> On 21 March 2016 at 15:32, Tale Firefly <tale.h...@gmail.com> wrote:
>>
>>> Re.
>>>
>>> Thank you for your answer.
>>>
>>> I'm using Tez as the execution engine for this query,
>>> and it launches a job on YARN.
>>>
>>> Do you know why it launches a job just for a select when I use Tez as
>>> the execution engine?
>>>
>>> BR.
>>>
>>> Tale
>>>
>>>
>>> On Mon, Mar 21, 2016 at 4:17 PM, Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Your query is a table-level query that covers all rows in the table.
>>>>
>>>> Using ODBC, you are connecting to HiveServer2, which runs on a given
>>>> port.
>>>>
>>>> Depending on the version of Hive you are running, Hive under the
>>>> bonnet is most likely using MapReduce as the execution engine.
>>>>
>>>> Data has to be collected from all blocks that hold data for this table.
>>>> The underlying ORC stats can only act at table level, as there is no
>>>> predicate pushdown, and the data has to be sent to the ODBC driver over
>>>> the network.
>>>>
>>>> The ODBC driver can only communicate with HiveServer2, so there is no
>>>> connectivity to individual nodes from your client.
>>>>
>>>> So in summary, HiveServer2 collects data from all blocks and forwards
>>>> it to the client. The actual collection and filtering of the result set
>>>> in a SQL query will depend on many factors.
>>>>
>>>> HTH
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>> On 21 March 2016 at 14:26, Tale Firefly <tale.h...@gmail.com> wrote:
>>>>
>>>>> Hello guys !
>>>>>
>>>>> I'm trying to understand the mechanism for a simple query select *
>>>>> from my_table when using HiveServer2.
>>>>>
>>>>> I'm using the Hortonworks ODBC Driver for HiveServer2.
>>>>> I just do a select * from my_table.
>>>>> my_table is an ORC table based on files divided into blocks located on
>>>>> all my datanodes.
>>>>> I have 50 datanodes.
>>>>>
>>>>> My question is the following:
>>>>> Does all the data go from the datanodes to the node hosting
>>>>> HiveServer2 before coming back to my client?
>>>>> Or does all the data go directly from the datanodes to my client?
>>>>>
>>>>> Hope you can help me o/
>>>>>
>>>>> Thank you
>>>>>
>>>>> Tale
>>>>>
>>>>
>>>>
>>>
>>
>
