Hello,

While working with Spark Structured Streaming (v2.4.3), I am trying to write my streaming DataFrame to a custom S3 endpoint. I have verified that I can log in and upload data to the S3 buckets manually through the UI, and I have also set up the ACCESS_KEY and SECRET_KEY for it.
val sc = spark.sparkContext
sc.hadoopConfiguration.set("fs.s3a.endpoint", "s3-region1.myObjectStore.com:443")
sc.hadoopConfiguration.set("fs.s3a.access.key", "00cce9eb2c589b1b1b5b")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "flmheKX9Gb1tTlImO6xR++9kvnUByfRKZfI7LJT8")
sc.hadoopConfiguration.set("fs.s3a.path.style.access", "true") // bucket name appended as url/bucket, not bucket.url

val writeToS3Query = stream.writeStream
  .format("csv")
  .option("sep", ",")
  .option("header", true)
  .outputMode("append")
  .trigger(Trigger.ProcessingTime("30 seconds"))
  .option("path", "s3a://bucket0/")
  .option("checkpointLocation", "/Users/home/checkpoints/s3-checkpointing")
  .start()

However, I am getting this error:

    Unable to execute HTTP request: bucket0.s3-region1.myObjectStore.com: nodename nor servname provided, or not known

I have a mapping of the endpoint URL to its IP in my /etc/hosts file, and the bucket is accessible from other sources. Is there another way to do this successfully?

I am really not sure why the bucket name is being prepended to the endpoint URL when Spark executes the query, given that I have enabled path-style access. Could it be that the Hadoop configuration settings are not taking effect because I set them on the Spark context after the session was created? But then how is Spark able to resolve the endpoint at all, when the path I provide is only s3a://bucket0?

Best,
Aniruddha
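P.S. One thing I plan to try, in case the context-level settings above are applied too late: passing the same S3A properties at session-creation time with the "spark.hadoop." prefix, so they reach the Hadoop configuration before the s3a:// filesystem is first instantiated. This is only an untested sketch on my part (the app name is a placeholder; endpoint and keys are the same values as above):

```scala
import org.apache.spark.sql.SparkSession

// Untested idea: set the S3A properties via the builder (with the
// "spark.hadoop." prefix) instead of mutating sc.hadoopConfiguration
// after the session already exists.
val spark = SparkSession.builder()
  .appName("StreamToS3") // placeholder name
  .config("spark.hadoop.fs.s3a.endpoint", "s3-region1.myObjectStore.com:443")
  .config("spark.hadoop.fs.s3a.access.key", "00cce9eb2c589b1b1b5b")
  .config("spark.hadoop.fs.s3a.secret.key", "flmheKX9Gb1tTlImO6xR++9kvnUByfRKZfI7LJT8")
  .config("spark.hadoop.fs.s3a.path.style.access", "true")
  .getOrCreate()
```

If the path-style setting takes effect this way, I would expect requests to go to s3-region1.myObjectStore.com/bucket0 rather than bucket0.s3-region1.myObjectStore.com.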