Naresh created HADOOP-17984:
-------------------------------

             Summary: Hadoop-aws jar is unable to read file from S3 if used with third party like MINIO
                 Key: HADOOP-17984
                 URL: https://issues.apache.org/jira/browse/HADOOP-17984
             Project: Hadoop Common
          Issue Type: Bug
          Components: hadoop-thirdparty
    Affects Versions: 3.2.0
            Reporter: Naresh
Unable to read a file from S3 via Spark when the endpoint URL points to MinIO running inside an EKS Kubernetes cluster. We are able to read/write from other clients and from the MinIO console, but when we read using Spark the resulting DataFrame comes back empty. dataframe.show() displays:

++
||
++
++

*Spark Config:*

.config("spark.hadoop.fs.s3a.endpoint", "http://127.0.0.1:9000") // minio url or port-forward to local
.config("spark.hadoop.fs.s3a.access.key", <myaccesskey>)
.config("spark.hadoop.fs.s3a.secret.key", <mysecretkey>)
.config("spark.hadoop.fs.s3a.path.style.access", "true")
.config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
.config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
.config("fs.s3a.committer.staging.conflict-mode", "replace")
.config("fs.s3a.committer.name", "file")
.config("fs.s3a.committer.threads", "20")
.config("fs.s3a.threads.max", "20")
.config("fs.s3a.fast.upload.buffer", "bytebuffer")
.config("fs.s3a.fast.upload.active.blocks", "8")
.config("fs.s3a.block.size", "128M")
.config("mapred.input.dir.recursive", "true")
.config("spark.sql.parquet.binaryAsString", "true")

*JAR files:*

hadoop-aws:3.2.0
aws-java-sdk:1.12.30
spark-core_2.12:3.1.2
spark-sql_2.12:3.1.2

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org
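For reference, the long chain of .config(...) calls above boils down to a handful of S3A properties that matter for a MinIO endpoint. A minimal sketch (Python; property names are taken from the report, while the helper function, endpoint, and credentials are placeholders, not values from a real deployment) that gathers them so they can be applied uniformly via SparkSession.builder.config(k, v):

```python
# Hypothetical helper: collects the S3A options typically needed to point
# Spark at a MinIO endpoint. The endpoint and keys below are placeholders.
def minio_s3a_options(endpoint, access_key, secret_key):
    return {
        # The MinIO endpoint must be reachable from the Spark executors,
        # not just the driver; 127.0.0.1 inside a k8s cluster usually is not.
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # Path-style access is required for most MinIO setups, because
        # virtual-hosted-style addressing expects bucket.<host> DNS names.
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    }

options = minio_s3a_options("http://127.0.0.1:9000",
                            "<myaccesskey>", "<mysecretkey>")
for key, value in options.items():
    print(f"{key}={value}")
```

Keeping these in one place makes it easy to spot a misconfiguration such as a missing path-style flag or an endpoint that only resolves on the driver, both common causes of silently empty reads against third-party S3 stores.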