Re: ORC tables loading

Alan Gates Tue, 17 Nov 2015 09:39:39 -0800

The reads and writes both happen in parallel, so as more nodes are available for read and write, at least in this case, the time stays roughly the same.
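
As a rough illustration of that point, here is a minimal sketch of an idealized model in which each node reads and writes only its own share of the data. The function name, the data sizes, and the per-node throughput are hypothetical values chosen for the example, not measurements from this cluster, and the model ignores skew, shuffle, and job startup costs.

```python
def load_time_seconds(data_gb, nodes, per_node_gb_per_s):
    """Idealized load time when every node processes its own share in parallel.

    Assumes perfectly even distribution and a fixed per-node throughput
    (both are simplifying assumptions, not properties of any real cluster).
    """
    share_gb = data_gb / nodes            # data handled by each node
    return share_gb / per_node_gb_per_s   # all nodes finish at the same time


# Hypothetical baseline: 100 GB on 4 nodes.
base = load_time_seconds(data_gb=100, nodes=4, per_node_gb_per_s=0.05)

# Doubling both data and nodes leaves each node's share unchanged.
scaled = load_time_seconds(data_gb=200, nodes=8, per_node_gb_per_s=0.05)

# Doubling only the data doubles each node's share, and so the time.
data_only = load_time_seconds(data_gb=200, nodes=4, per_node_gb_per_s=0.05)

print(base, scaled, data_only)
```

In this model the load time depends only on the per-node share, which is why doubling data and nodes together would be expected to keep the total time roughly constant.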


Alan.

James Pirz <mailto:james.p...@gmail.com>
November 16, 2015 at 21:23
Hi,

I am using Hive 1.2 with ORC tables on Hadoop 2.6 on a cluster.
I load data into an ORC table by reading it from an external table over raw text files, using an INSERT statement:
INSERT into TABLE myorctab SELECT * FROM mytxttab;
I ran a simple scale-up test to find out how the loading time increases as I double both the size of the data and the number of nodes. I found that the total time remains more or less the same (it scales properly).
I am just wondering why this is happening. Naively, I would think that if I double the number of partitions and the size of the data, the time should also roughly double, since the system needs to partition twice the amount of data as before among twice the number of partitions. Am I missing something here?
Thanks
