Ramakrishna created SPARK-41298:
-----------------------------------

             Summary: Getting Count on data frame is giving the performance issue
                 Key: SPARK-41298
                 URL: https://issues.apache.org/jira/browse/SPARK-41298
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.4.4
            Reporter: Ramakrishna

We are running the following code against Teradata:

1) Dataset<Row> df = spark.read().format("jdbc"). . . load();
2) long count = df.count();

When df.count() executes, Spark internally issues the query below to Teradata. It consumes a lot of CPU on the Teradata side, and our DBAs have raised concerns about it.

Query: SELECT 1 FROM (<ONE_MILLION_ROWS_TABLE>) SPARK_SUB_TAB

Response:
1
1
1
1
1
..
1

Is this expected behavior from Spark?
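
For context, count() needs no column values, so the JDBC source asks the database for a constant 1 per row and Spark counts the returned rows itself. One possible workaround, sketched below under stated assumptions, is to have the database compute the count by passing a COUNT(*) subquery as the JDBC `dbtable` option. The class name `CountPushdown`, the helper `countSubquery`, the aliases `row_count` and `spark_count_tab`, and the table name `sales_db.orders` are all hypothetical and do not come from the report. The helper only builds the string and does not validate or escape its input.

```java
public final class CountPushdown {
    private CountPushdown() {}

    /**
     * Builds a parenthesized COUNT(*) subquery with an alias, in the
     * "(subquery) alias" form that Spark's JDBC "dbtable" option accepts.
     * The argument is a table name, or an already aliased subquery such as
     * "(SELECT ...) t". It is concatenated as-is (no escaping), so it must
     * come from trusted configuration, not from user input.
     */
    public static String countSubquery(String tableOrSubquery) {
        if (tableOrSubquery == null || tableOrSubquery.trim().isEmpty()) {
            throw new IllegalArgumentException("table or subquery must not be empty");
        }
        // row_count and spark_count_tab are hypothetical alias names.
        return "(SELECT COUNT(*) AS row_count FROM "
                + tableOrSubquery.trim()
                + ") spark_count_tab";
    }

    public static void main(String[] args) {
        // sales_db.orders is a placeholder table name.
        System.out.println(countSubquery("sales_db.orders"));
        // prints: (SELECT COUNT(*) AS row_count FROM sales_db.orders) spark_count_tab
    }
}
```

With this approach the resulting string would be supplied as `.option("dbtable", ...)` in place of the original table, and the count would be read from the single row that comes back rather than by calling count() on the DataFrame. Only one row then crosses the network, at the cost of running the aggregate on the database.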