This is an automated email from the ASF dual-hosted git repository. sarutak pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.2 by this push: new fb38887 [SPARK-36398][SQL] Redact sensitive information in Spark Thrift Server log fb38887 is described below commit fb38887e001d33adef519d0288bd0844dcfe2bd5 Author: Kousuke Saruta <saru...@oss.nttdata.com> AuthorDate: Wed Aug 25 21:30:43 2021 +0900 [SPARK-36398][SQL] Redact sensitive information in Spark Thrift Server log ### What changes were proposed in this pull request? This PR fixes an issue that there is no way to redact sensitive information in Spark Thrift Server log. For example, JDBC password can be exposed in the log. ``` 21/08/25 18:52:37 INFO SparkExecuteStatementOperation: Submitting query 'CREATE TABLE mytbl2(a int) OPTIONS(url="jdbc:mysql//example.com:3306", driver="com.mysql.jdbc.Driver", dbtable="test_tbl", user="test_usr", password="abcde")' with ca14ae38-1aaf-4bf4-a099-06b8e5337613 ``` ### Why are the changes needed? Bug fix. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Ran ThriftServer, connect to it and execute `CREATE TABLE mytbl2(a int) OPTIONS(url="jdbc:mysql//example.com:3306", driver="com.mysql.jdbc.Driver", dbtable="test_tbl", user="test_usr", password="abcde");` with `spark.sql.redaction.string.regex=((?i)(?<=password=))(".*")|('.*')` Then, confirmed the log. ``` 21/08/25 18:54:11 INFO SparkExecuteStatementOperation: Submitting query 'CREATE TABLE mytbl2(a int) OPTIONS(url="jdbc:mysql//example.com:3306", driver="com.mysql.jdbc.Driver", dbtable="test_tbl", user="test_usr", password=*********(redacted))' with ffc627e2-b1a8-4d83-ab6d-d819b3ccd909 ``` Closes #33832 from sarutak/fix-SPARK-36398. Authored-by: Kousuke Saruta <saru...@oss.nttdata.com> Signed-off-by: Kousuke Saruta <saru...@oss.nttdata.com> (cherry picked from commit b2ff01608f5ecdba19630e12478bd370f9766f7b) Signed-off-by: Kousuke Saruta <saru...@oss.nttdata.com> --- .../spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala index 0df5885..4f40889 100644 --- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala +++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala @@ -185,8 +185,8 @@ private[hive] class SparkExecuteStatementOperation( override def runInternal(): Unit = { setState(OperationState.PENDING) - logInfo(s"Submitting query '$statement' with $statementId") val redactedStatement = SparkUtils.redact(sqlContext.conf.stringRedactionPattern, statement) + logInfo(s"Submitting query '$redactedStatement' with $statementId") HiveThriftServer2.eventManager.onStatementStart( statementId, parentSession.getSessionHandle.getSessionId.toString, --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org