I am using Hive 1.2.0 on Hadoop 2.6 (on a cluster with 10 machines) and I am trying to understand the performance of a full-table scan. I am running the following query:
    SELECT * FROM LINEITEM WHERE L_LINENUMBER < 0;

I am measuring its performance in different scenarios: MR vs. Tez, and different table types/formats (an external table on text data, or ORC).

My question is: what is the best way to check the number of readers (scanners) that Hive uses in parallel to read the data?

My data is in HDFS. Each node runs one DataNode process, which writes its blocks to 3 separate paths (each path persists its data on a separate disk). I tried to get this information using "explain" and from the available consoles, but I could not find it. Checking the number of established connections to the DataNode data transfer port (using the command below) gives me 12, but I am not sure if I am looking at the correct metric:

    netstat -anp | grep -w 50010 | grep ESTABLISHED | wc -l

Any help would be appreciated. Thanks.
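One reason I doubt the number: `grep -w 50010` matches the port on either side of a connection, so on a node that runs both a DataNode and tasks it mixes connections served by the local DataNode with connections this host opens to remote DataNodes. Below is a rough sketch of a more detailed check. The helper name `count_dn_conns` is my own invention, and it assumes the usual Linux `netstat -an` column layout (local address in column 4, foreign address in column 5, state in column 6) and the default transfer port 50010. It still counts every HDFS client connection, not only Hive readers, so it is probably only an upper bound.

```shell
# Hypothetical helper: split ESTABLISHED connections on the DataNode
# transfer port into "served" (local side is the port) and "client"
# (remote side is the port). Reads netstat-style lines on stdin.
# Assumes Linux netstat columns: $4 = local, $5 = foreign, $6 = state.
count_dn_conns() {
  port="${1:-50010}"   # default DataNode transfer port; pass another if reconfigured
  awk -v port="$port" '
    $6 == "ESTABLISHED" {
      if ($4 ~ (":" port "$"))      served++   # local DataNode serving a peer
      else if ($5 ~ (":" port "$")) client++   # this host talking to a remote DataNode
    }
    END { printf "served=%d client=%d\n", served, client }
  '
}
```

It would be run on each node as `netstat -an | count_dn_conns` while the query is executing.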