You are correct. It should not. There is nothing to optimise here.

0: jdbc:hive2://rhes564:10010/default> select * from countries;
OK
INFO : Compiling command(queryId=hduser_20160321162726_7efeecbb-46ee-431f-9095-f67e0602b318): select * from countries
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:countries.country_id, type:double, comment:null), FieldSchema(name:countries.country_iso_code, type:string, comment:null), FieldSchema(name:countries.country_name, type:string, comment:null), FieldSchema(name:countries.country_subregion, type:string, comment:null), FieldSchema(name:countries.country_subregion_id, type:double, comment:null), FieldSchema(name:countries.country_region, type:string, comment:null), FieldSchema(name:countries.country_region_id, type:double, comment:null), FieldSchema(name:countries.country_total, type:string, comment:null), FieldSchema(name:countries.country_total_id, type:double, comment:null), FieldSchema(name:countries.country_name_hist, type:string, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hduser_20160321162726_7efeecbb-46ee-431f-9095-f67e0602b318); Time taken: 0.047 seconds
INFO : Executing command(queryId=hduser_20160321162726_7efeecbb-46ee-431f-9095-f67e0602b318): select * from countries
INFO : Completed executing command(queryId=hduser_20160321162726_7efeecbb-46ee-431f-9095-f67e0602b318); Time taken: 0.001 seconds
INFO : OK
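
For what it is worth, whether a plain select * is answered by a simple fetch task or compiled into a job is controlled in Hive by hive.fetch.task.conversion, with a size limit in hive.fetch.task.conversion.threshold. The lines below are only a sketch of how one might check this in a beeline session. I am assuming the same countries table as above, and the threshold value is purely illustrative; your version and distribution may ship different defaults, so this may or may not be what triggers the job in your case.

```sql
-- Print the current value (one of: none, minimal, more)
set hive.fetch.task.conversion;

-- Let simple SELECT / filter / LIMIT queries be served by a fetch task
set hive.fetch.task.conversion=more;

-- Inputs larger than this many bytes are not converted and still launch a job
-- (1073741824 bytes = 1 GB, an illustrative value, not a recommendation)
set hive.fetch.task.conversion.threshold=1073741824;

-- When the conversion applies, the plan consists of a Fetch Operator stage only
explain select * from countries;
```

If the explain output shows a Tez or Map-Reduce stage instead of only a fetch stage, the table is probably above the threshold or the conversion is disabled.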

Dr Mich Talebzadeh

LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

On 21 March 2016 at 15:56, Tale Firefly <tale.h...@gmail.com> wrote:

> Hm, I need to check if statistics are enabled for this table and
> up-to-date.
> I'm going to check this.
>
> I don't know if I was clear in my previous statement, but I am surprised
> that a job is launched just by doing a select * from my_table.
> I thought a select * from my_table was not running any MR jobs.
>
> Best regards.
>
> Tale.
>
> On Mon, Mar 21, 2016 at 4:48 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> Well, I use Spark as the engine.
>>
>> Now the question is: have you updated statistics on the ORC table?
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>> On 21 March 2016 at 15:32, Tale Firefly <tale.h...@gmail.com> wrote:
>>
>>> Re.
>>>
>>> Ty ty for your answer.
>>>
>>> I'm using Tez as the execution engine for this query.
>>> And it launches a job to YARN.
>>>
>>> Do you know why it launches a job just for a select when I use Tez as
>>> the execution engine?
>>>
>>> BR.
>>>
>>> Tale
>>>
>>> On Mon, Mar 21, 2016 at 4:17 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Your query is a table level query that covers all rows in the table.
>>>>
>>>> Using ODBC you are connecting to Hive server 2 that runs on a given
>>>> port.
>>>>
>>>> Depending on the version of Hive you are running, Hive under the
>>>> bonnet is most likely using Map-Reduce as the execution engine.
>>>>
>>>> Data has to be collected from all blocks that hold data for this table.
>>>> The underlying ORC stats can only act at table level as there is no
>>>> predicate push down and data has to be sent to the ODBC driver through
>>>> the network.
>>>>
>>>> The ODBC driver can only communicate with Hive server 2 so there is no
>>>> connectivity to individual nodes from your client.
>>>>
>>>> So in summary Hive server 2 collects data from all blocks and forwards
>>>> it to the client. The actual collection and filtering of the result set
>>>> in the SQL query will depend on many factors.
>>>>
>>>> HTH
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>> On 21 March 2016 at 14:26, Tale Firefly <tale.h...@gmail.com> wrote:
>>>>
>>>>> Hello guys!
>>>>>
>>>>> I'm trying to understand the mechanism for a simple query select *
>>>>> from my_table when using HiveServer2.
>>>>>
>>>>> I'm using the Hortonworks ODBC Driver for HiveServer2.
>>>>> I just do a select * from my_table.
>>>>> my_table is an ORC table based on files divided into blocks located on
>>>>> all my datanodes.
>>>>> I have 50 datanodes.
>>>>>
>>>>> My question is the following:
>>>>> Does all the data go from the datanodes to the node hosting the
>>>>> HiveServer2 before coming back to my client?
>>>>> Or does all the data go directly from the datanodes to my client?
>>>>>
>>>>> Hope you can help me o/
>>>>>
>>>>> Thank you
>>>>>
>>>>> Tale