[jira] [Created] (SPARK-41298) Getting Count on data frame is giving the performance issue

Ramakrishna (Jira) Mon, 28 Nov 2022 04:39:06 -0800

Ramakrishna created SPARK-41298:
-----------------------------------

             Summary: Getting Count on data frame is giving the performance issue
                 Key: SPARK-41298
                 URL: https://issues.apache.org/jira/browse/SPARK-41298
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.4.4
            Reporter: Ramakrishna



We are running the following code against Teradata:

1) Dataset<Row> df = spark.read().format("jdbc") . . . .load();

2) long count = df.count();

When we execute df.count(), Spark internally issues the query below against 
Teradata. This query consumes a lot of CPU on Teradata, and our DBAs are 
raising concerns when they see it.

 

Query: SELECT 1 FROM (<ONE_MILLION_ROWS_TABLE>)SPARK_SUB_TAB

Response:

1
1
1
1
1
..
1
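A simplified sketch of what appears to be happening, assuming Spark 2.4's JDBC source behaves roughly as modeled here: column pruning leaves count() with no columns to read, so the scan selects the constant 1, and Spark then counts the returned rows itself instead of asking Teradata for COUNT(*). The class and method names (JdbcCountSketch, buildScanQuery, fetchPartition, count) are hypothetical and are not Spark's internal API; the one-million-row figure is only taken from the placeholder table name.

```java
import java.util.Collections;
import java.util.List;

public class JdbcCountSketch {
    // Hypothetical helper: builds the SELECT a JDBC scan sends for the pruned
    // column list. count() needs no columns, so the list is empty and a
    // constant 1 is selected instead; the database still returns every row.
    static String buildScanQuery(List<String> columns, String tableOrQuery) {
        String columnList = columns.isEmpty() ? "1" : String.join(",", columns);
        return "SELECT " + columnList + " FROM " + tableOrQuery;
    }

    // Hypothetical stand-in for the rows one partition receives for the
    // query above: a single constant 1 per source row.
    static List<Integer> fetchPartition(int rowsInPartition) {
        return Collections.nCopies(rowsInPartition, 1);
    }

    // Spark-side count (simplified): count the rows fetched by each
    // partition, then sum the partial counts.
    static long count(int[] rowsPerPartition) {
        long total = 0;
        for (int rows : rowsPerPartition) {
            // Every row crosses the JDBC connection before it is counted.
            total += fetchPartition(rows).size();
        }
        return total;
    }

    public static void main(String[] args) {
        String table = "(<ONE_MILLION_ROWS_TABLE>)SPARK_SUB_TAB";
        // prints "SELECT 1 FROM (<ONE_MILLION_ROWS_TABLE>)SPARK_SUB_TAB"
        System.out.println(buildScanQuery(List.of(), table));

        // Without partitionColumn/lowerBound/upperBound/numPartitions,
        // the JDBC source reads everything through a single partition.
        int[] singlePartition = {1_000_000};
        // prints 1000000
        System.out.println(count(singlePartition));
    }
}
```

Under this model, Teradata still has to produce and send one row per source row, which matches the response shown above; a query that computed COUNT(*) inside Teradata would return a single row instead.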

 

Is this expected behavior from Spark?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

