And thanks very much, Andries, for the detailed information answering my
questions. I really appreciate it.
The tables are stored in HDFS on the EMR cluster, not on S3, and then
loaded into Hive as external tables.
Thanks,
Alex
On Tue, Apr 21, 2015 at 3:08 PM, Andries Engelbrecht <
aeng
Alex,
Definitely looks like by far the majority of the time is spent reading the Hive
data (Hive_Sub_Scan). Not sure how well the storage environment is configured,
and it may very likely be that the nodes are just waiting on storage I/O. The
more nodes will simply just wait longer to actually ge
Sorry about the inconvenience. The Web UI output is printed in a PDF file
here:
https://drive.google.com/file/d/0B24zVBhi8pQ3aDRfRllFVUh2eEE/view?usp=sharing
Thanks,
Alex
On Tue, Apr 21, 2015 at 2:40 PM, Jason Altekruse
wrote:
The attachment for the JSON profile made it to the list because it is
ASCII, but the screenprint was blocked as a binary file. We can take a look
at the profile by loading the JSON into an instance of Drill. Just a
reminder about binary attachments for everyone: please upload to a public
host a
Hi Team Drill!
While performance testing Drill clusters on AWS EMR with TPC-H data at
scale factor 100, I observed that the results for a cluster of 3 nodes are
similar to those for a cluster of 13 nodes. Hence, I am investigating how
the query is being carried out and which part of the query ha