Hello. In Spark 4, loading a DataFrame from a path that contains a wildcard produces a warning and stack trace that do not appear in Spark 3.
>>> spark.read.load('s3a://ullswater-dev/uw01/temp/test_parquet/*.parquet')
25/07/22 08:33:38 WARN org.apache.spark.sql.execution.streaming.FileStreamSink: Assume no metadata directory. Error while looking for metadata directory in the path: s3a://ullswater-dev/uw01/temp/test_parquet/*.parquet.
java.io.FileNotFoundException: No such file or directory: s3a://ullswater-dev/uw01/temp/test_parquet/*.parquet
        at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:4156)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:4007)

I think it's due to the change from

https://github.com/apache/spark/blob/v3.5.6/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSink.scala#L54

to

https://github.com/apache/spark/blob/v4.0.0/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSink.scala#L56

S3AFileSystem.isDirectory(hdfsPath) does not throw an exception when hdfsPath contains a wildcard, whereas S3AFileSystem.getFileStatus(hdfsPath).isDirectory does (a minimal repro is sketched below).

Is this a bug?

Thanks.
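
P.S. For completeness, here is a minimal, untested sketch that reproduces the API difference directly through PySpark's JVM gateway. It relies on the private spark._jvm / spark._jsc handles, so treat it as illustrative only:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Build a Hadoop Path containing a glob and resolve its FileSystem.
    jvm = spark._jvm
    hadoop_conf = spark._jsc.hadoopConfiguration()
    path = jvm.org.apache.hadoop.fs.Path(
        's3a://ullswater-dev/uw01/temp/test_parquet/*.parquet')
    fs = path.getFileSystem(hadoop_conf)

    # The call Spark 3.5 makes: returns False for the glob path, no exception.
    print(fs.isDirectory(path))

    # The call Spark 4.0 makes: getFileStatus() throws FileNotFoundException,
    # which surfaces in Python as a py4j error. FileStreamSink catches it and
    # logs the WARN shown above.
    fs.getFileStatus(path).isDirectory()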