My goal is to run a SELECT query using Hive.

When I have a small amount of data on a single machine (the namenode), I start by:
1-Creating a table to hold the data: create table table1 (col1 int, col2 string);
2-Loading the data from a file path: load data local inpath 'path' into table 
table1;
3-Running my SELECT query: select * from table1 where col1 > 0;
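
For reference, here are the three steps above as one HiveQL script. This is 
only a minimal sketch: the comma field delimiter is an assumption (my actual 
file format may differ), and 'path' is still a placeholder for the real file 
location.

```sql
-- 1. Create the table. In HiveQL the column name comes before its type.
CREATE TABLE table1 (col1 INT, col2 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';  -- assumed delimiter

-- 2. Load the data from the local filesystem ('path' is a placeholder).
LOAD DATA LOCAL INPATH 'path' INTO TABLE table1;

-- 3. Run the query.
SELECT * FROM table1 WHERE col1 > 0;
```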

Now I have a huge dataset of 10 million rows that doesn't fit on a single 
machine. Let's assume Hadoop has split my data across, for example, 10 
datanodes, and each datanode holds 1 million rows.

Retrieving the data to a single computer is either impossible because of its 
size or, where it is possible, would take a lot of time.

Will Hive create a table on each datanode and run the SELECT query there, 
or will Hive move all the data to one location (a single datanode) and create 
one table there (which would be inefficient)?