Hi all,
I'm working on a customer use case. They have one of the queries as shown
below. The query is a modified version to hide the names of original tables
and columns.
What we are observing is, it is picking the right partition in hdfs by the
noFiles scanned. However, all the scanning is done by just one node. This is
displayed in the profile of the query. As a result the query is not truly
distributed and takes forever to complete. This is a 20 node Drill cluster.
Are there any settings to distribute the query?
We are using following Drill version:
apache drill> select * from sys.version;
+--------------------+------------------------------------------+------------------------------------+---------------------------+-----------------+---------------------------+
| version | commit_id |
commit_message | commit_time |
+--------------------+------------------------------------------+------------------------------------+---------------------------+-----------------+---------------------------+
| 1.20.3.203-eep-921 | c215ce49343cac8f5a1a072e92f005bd03903af4 | Update Drill
version to 1.20.3.203 |
+--------------------+------------------------------------------+------------------------------------+---------------------------+-----------------+---------------------------+
planner.width.max_per_query = 1000
planner.width.max_per_node = 45
SELECT
msi,
startTime,
endTime
FROM
(
SELECT column1, column2
FROM `dfs`.dse
UNION ALL
SELECT
FROM `dfs`.`tmp`.`p
UNION ALL
SELECT column2, column3
FROM `dfs`.`tmp`.mdr
)
where (
dir0 > 'event_date=2023-10-31' AND dir0 <
'event_date=2023-12-31'
)
AND (
ntwk_date_time >= to_timestamp('2023-10-31
14:00:00','YYYY-MM-dd HH:mm:ss')
AND date_time <= to_timestamp('2023-12-31 13:59:59','YYYY-MM-dd
HH:mm:ss')
)
AND (
MSI = '0146993')
AND trim(party_num) <> ''
order by startTime desc
Thanks,
-Ranjit