I am using Hive 1.2.0 on Hadoop 2.6 (on a cluster with 10 machines) and I
am trying to understand the performance of a full-table scan. I am running
the following query:

SELECT * FROM LINEITEM
WHERE L_LINENUMBER < 0;

and I am measuring its performance in different scenarios: with MR vs. Tez
as the execution engine, and with different table types/formats (an external
table on text data vs. ORC).

My question is:
What is the best way to check the number of readers (scanners) that Hive
uses in parallel to read the data?

My data is in HDFS and on each node I have 1 datanode process running which
writes its blocks into 3 separate paths (each path persists its data on a
separate disk).

I tried to get this information using "explain" and from the available
consoles, but I could not find it. Checking the number of established
connections to the datanode's data transfer port (using the command below)
gives me 12, but I am not sure whether I am looking at the correct metric:

netstat -anp | grep -w 50010 | grep ESTABLISHED | wc -l
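
One caveat with the command above: grep -w 50010 matches the port in either
address column, so it also counts connections that this node opens to other
datanodes. Below is a minimal sketch of a stricter count that keeps only
connections whose local port is 50010. The function name count_dn_readers is
made up for illustration, and the sketch assumes the usual net-tools netstat
column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State).

```shell
# Hypothetical helper: reads `netstat -ant` style output on stdin and prints
# the number of ESTABLISHED TCP connections whose LOCAL port equals the given
# port (default 50010, the datanode data transfer port used above).
count_dn_readers() {
  port="${1:-50010}"
  awk -v port="$port" '
    # Column 4 is the local address, column 6 is the connection state.
    $1 ~ /^tcp/ && $6 == "ESTABLISHED" {
      # The port is the last ":"-separated field (also works for IPv6).
      n = split($4, parts, ":")
      if (parts[n] == port) count++
    }
    END { print count + 0 }
  '
}

# Usage on a datanode host:
#   netstat -ant | count_dn_readers 50010
```

Even this only counts reads that go through the data transfer port, so I am
still not sure it corresponds one-to-one to Hive's readers.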


Any help would be appreciated.

Thanks
