Well, I use Spark as the execution engine. Now the question is: have you updated statistics on the ORC table?
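To make the statistics question concrete, here is a minimal sketch of the statements one would typically run to refresh and then inspect the stats. It assumes the table is called my_table, as in the original question, and that the statements are pasted into beeline or sent through any other HiveServer2 client. The Python helper name is hypothetical; it only builds the HiveQL strings and does not connect to anything.

```python
def stats_statements(table):
    """Return HiveQL statements that refresh and then display table statistics.

    `table` is assumed to be a plain, trusted table name (no quoting is done).
    """
    return [
        # Basic table-level stats (row count, file count, sizes).
        "ANALYZE TABLE {0} COMPUTE STATISTICS".format(table),
        # Column-level stats used by the optimizer.
        "ANALYZE TABLE {0} COMPUTE STATISTICS FOR COLUMNS".format(table),
        # Shows the stored stats under "Table Parameters".
        "DESCRIBE FORMATTED {0}".format(table),
    ]


if __name__ == "__main__":
    # Print the statements so they can be pasted into a Hive client.
    for stmt in stats_statements("my_table"):
        print(stmt + ";")
```

If the table is partitioned, the ANALYZE statements would also need a PARTITION clause, which this sketch leaves out.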
HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

On 21 March 2016 at 15:32, Tale Firefly <tale.h...@gmail.com> wrote:

> Re.
>
> Thank you for your answer.
>
> I'm using Tez as the execution engine for this query,
> and it submits a job to YARN.
>
> Do you know why it launches a job just for a select when I use Tez as
> the execution engine?
>
> BR.
>
> Tale
>
> On Mon, Mar 21, 2016 at 4:17 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> Hi,
>>
>> Your query is a table-level query that covers all rows in the table.
>>
>> Using ODBC, you are connecting to HiveServer2, which runs on a given port.
>>
>> Depending on the version of Hive you are running, Hive under the bonnet
>> is most likely using MapReduce as the execution engine.
>>
>> Data has to be collected from all blocks that hold data for this table.
>> The underlying ORC stats can only act at table level, as there is no
>> predicate pushdown, and the data has to be sent to the ODBC driver
>> through the network.
>>
>> The ODBC driver can only communicate with HiveServer2, so there is no
>> connectivity to individual nodes from your client.
>>
>> So in summary, HiveServer2 collects data from all blocks and forwards it
>> to the client. The actual collection and filtering of the result set in
>> the SQL query will depend on many factors.
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>> On 21 March 2016 at 14:26, Tale Firefly <tale.h...@gmail.com> wrote:
>>
>>> Hello guys!
>>>
>>> I'm trying to understand the mechanism for a simple query, select * from
>>> my_table, when using HiveServer2.
>>>
>>> I'm using the Hortonworks ODBC Driver for HiveServer2.
>>> I just do a select * from my_table.
>>> my_table is an ORC table based on files divided into blocks located on
>>> all my datanodes.
>>> I have 50 datanodes.
>>>
>>> My question is the following:
>>> Does all the data go from the datanodes to the node hosting
>>> HiveServer2 before coming back to my client?
>>> Or does all the data go directly from the datanodes to my client?
>>>
>>> Hope you can help me o/
>>>
>>> Thank you
>>>
>>> Tale