Hello,

I'm attempting to run a query on Spark 3.4.0 through the Spark
ThriftServer. The cluster has 64 cores, 250GB RAM, and operates in
standalone mode using HDFS for storage.

The query is as follows:

SELECT ME.*, MB.BenefitID
FROM MemberEnrollment ME
JOIN MemberBenefits MB
ON ME.ID = MB.EnrollmentID
WHERE MB.BenefitID = 5
LIMIT 10

The tables are defined as follows:

-- Contains about 3M rows
CREATE TABLE MemberEnrollment
(
    ID INT
    , MemberID VARCHAR(50)
    , StartDate DATE
    , EndDate DATE
    -- Other columns, but these are the most important
) STORED AS ORC;

-- Contains about 25m rows
CREATE TABLE MemberBenefits
(
    EnrollmentID INT
    , BenefitID INT
) STORED AS ORC;

When I execute the query, it runs a single broadcast exchange stage, which
completes after a few seconds. Then everything just hangs. The JDBC/ODBC
tab in the UI shows the query state as COMPILED, but no stages or tasks are
executing or pending:

[image: image.png]

I've let the query run for as long as 30 minutes with no additional stages,
progress, or errors. I'm not sure where to start troubleshooting.

Thanks for your help,

Patrick

Reply via email to