Partition scanning by one Drill bit node

Ranjit Reddy Sun, 05 May 2024 19:13:59 -0700

Hi all,
I'm working on a customer use case. They have one of the queries as shown 
below. The query is a modified version to hide the  names of original tables 
and columns.
What we are observing is, it is picking the right partition in hdfs by the 
noFiles scanned. However, all the scanning is done by just one node. This is 
displayed in the profile of the query.   As a result the query is not truly 
distributed and takes forever to complete. This is a 20 node Drill cluster.
Are there any settings to distribute the query?
We are using following Drill version:
 apache drill> select * from sys.version;
+--------------------+------------------------------------------+------------------------------------+---------------------------+-----------------+---------------------------+
|      version       |                commit_id                 |           
commit_message           |        commit_time        | 
+--------------------+------------------------------------------+------------------------------------+---------------------------+-----------------+---------------------------+
| 1.20.3.203-eep-921 | c215ce49343cac8f5a1a072e92f005bd03903af4 | Update Drill 
version to 1.20.3.203 | 
+--------------------+------------------------------------------+------------------------------------+---------------------------+-----------------+---------------------------+


planner.width.max_per_query = 1000 
planner.width.max_per_node = 45

SELECT
       msi,
       startTime,
       endTime
FROM
        (
          SELECT column1, column2
      FROM `dfs`.dse
      UNION ALL
          SELECT       

                   FROM `dfs`.`tmp`.`p
      UNION ALL
      SELECT column2, column3
      FROM `dfs`.`tmp`.mdr
    )
where   (
                dir0 > 'event_date=2023-10-31' AND dir0 < 
'event_date=2023-12-31' 
        )
AND     (
                ntwk_date_time >= to_timestamp('2023-10-31 
14:00:00','YYYY-MM-dd HH:mm:ss')
          AND date_time <= to_timestamp('2023-12-31 13:59:59','YYYY-MM-dd 
HH:mm:ss') 
        )
AND     (
                MSI         = '0146993')
AND     trim(party_num) <> ''

order by startTime desc

Thanks,
-Ranjit

Partition scanning by one Drill bit node

Reply via email to