Hello, I am trying to run a Spark job that writes data to a bucket behind a custom S3 endpoint, but the job is stuck at this line of output and does not move forward at all:
    20/04/29 16:03:59 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/Users/abc/IdeaProjects/qct-air-detection/spark-warehouse/').
    20/04/29 16:03:59 INFO SharedState: Warehouse path is 'file:/Users/abc/IdeaProjects/qct-air-detection/spark-warehouse/'.
    20/04/29 16:04:01 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
    20/04/29 16:04:02 INFO MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
    20/04/29 16:04:02 INFO MetricsSystemImpl: s3a-file-system metrics system started

After a long wait it fails with:

    org.apache.hadoop.fs.s3a.AWSClientIOException: doesBucketExist on test-bucket: com.amazonaws.SdkClientException: Unable to execute HTTP request: Connect to s3-region0.mycloud.com:443 [s3-region0.mycloud.com/10.10.3.72] failed: Connection refused (Connection refused): Unable to execute HTTP request: Connect to s3-region0.mycloud.com:443 [s3-region0.mycloud.com/10.10.3.72] failed: Connection refused (Connection refused)

However, I am able to access this bucket with the AWS CLI from the same machine, so I don't understand why it says it is unable to execute the HTTP request.

I am using:
- spark 3.0.0-preview2
- hadoop-aws 3.2.0
- aws-java-sdk-bundle 1.11.375

My Spark code sets the following properties on the Hadoop configuration:

    spark.sparkContext.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    spark.sparkContext.hadoopConfiguration.set("fs.s3a.endpoint", ENDPOINT)
    spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", ACCESS_KEY)
    spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", SECRET_KEY)
    spark.sparkContext.hadoopConfiguration.set("fs.s3a.path.style.access", "true")

Can someone help me understand what is wrong here? Is there anything else I need to configure? The custom S3 endpoint and its keys are valid and work from an AWS CLI profile.
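For reference, here is a minimal sketch of how I understand the equivalent configuration would look if supplied at session build time via the `spark.hadoop.*` prefix instead of mutating `hadoopConfiguration` afterwards. The endpoint URL and the environment-variable credential lookups are placeholders of my own; I have not confirmed whether supplying the properties this way changes the behavior:

```scala
// Sketch only: the same S3A settings passed with the spark.hadoop.* prefix
// when the SparkSession is built, so they are in place before the first
// filesystem call. ENDPOINT/ACCESS_KEY/SECRET_KEY are placeholder values.
import org.apache.spark.sql.SparkSession

val ENDPOINT   = "https://s3-region0.mycloud.com"   // placeholder custom endpoint
val ACCESS_KEY = sys.env("S3_ACCESS_KEY")           // placeholder credential source
val SECRET_KEY = sys.env("S3_SECRET_KEY")

val spark = SparkSession.builder()
  .appName("s3a-endpoint-test")
  .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
  .config("spark.hadoop.fs.s3a.endpoint", ENDPOINT)
  .config("spark.hadoop.fs.s3a.access.key", ACCESS_KEY)
  .config("spark.hadoop.fs.s3a.secret.key", SECRET_KEY)
  .config("spark.hadoop.fs.s3a.path.style.access", "true")
  .getOrCreate()
```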
Also, what is wrong with the following Scala code?

    val dataStreamWriter: DataStreamWriter[Row] = PM25quality
      .select(
        dayofmonth(current_date()) as "day",
        month(current_date()) as "month",
        year(current_date()) as "year")
      .writeStream
      .format("parquet")
      .option("checkpointLocation", "/Users/abc/Desktop/qct-checkpoint/")
      .outputMode("append")
      .trigger(Trigger.ProcessingTime("15 seconds"))
      .partitionBy("year", "month", "day")
      .option("path", "s3a://test-bucket")

    val streamingQuery: StreamingQuery = dataStreamWriter.start()

Aniruddha