[ https://issues.apache.org/jira/browse/SPARK-20061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Loughran resolved SPARK-20061.
------------------------------------
    Resolution: Duplicate

> Reading a file with colon (:) from S3 fails with URISyntaxException
> -------------------------------------------------------------------
>
>                 Key: SPARK-20061
>                 URL: https://issues.apache.org/jira/browse/SPARK-20061
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.1.0
>         Environment: EC2, AWS
>            Reporter: Michel Lemay
>
> When reading a bunch of files from S3 using wildcards, it fails with the
> following exception:
> {code}
> scala> val fn = "s3a://mybucket/path/*/"
> scala> val ds = spark.readStream.schema(schema).json(fn)
> java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: 2017-01-06T20:33:45.255-analyticsqa-49569270507599054034141623773442922465540524816321216514.json
>   at org.apache.hadoop.fs.Path.initialize(Path.java:205)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:171)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:93)
>   at org.apache.hadoop.fs.Globber.glob(Globber.java:241)
>   at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1657)
>   at org.apache.spark.deploy.SparkHadoopUtil.globPath(SparkHadoopUtil.scala:237)
>   at org.apache.spark.deploy.SparkHadoopUtil.globPathIfNecessary(SparkHadoopUtil.scala:243)
>   at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$2.apply(DataSource.scala:131)
>   at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$2.apply(DataSource.scala:127)
>   at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>   at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
>   at scala.collection.immutable.List.flatMap(List.scala:344)
>   at org.apache.spark.sql.execution.datasources.DataSource.tempFileIndex$lzycompute$1(DataSource.scala:127)
>   at org.apache.spark.sql.execution.datasources.DataSource.org$apache$spark$sql$execution$datasources$DataSource$$tempFileIndex$1(DataSource.scala:124)
>   at org.apache.spark.sql.execution.datasources.DataSource.org$apache$spark$sql$execution$datasources$DataSource$$getOrInferFileFormatSchema(DataSource.scala:138)
>   at org.apache.spark.sql.execution.datasources.DataSource.sourceSchema(DataSource.scala:229)
>   at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo$lzycompute(DataSource.scala:87)
>   at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo(DataSource.scala:87)
>   at org.apache.spark.sql.execution.streaming.StreamingRelation$.apply(StreamingRelation.scala:30)
>   at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:124)
>   at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:133)
>   at org.apache.spark.sql.streaming.DataStreamReader.json(DataStreamReader.scala:181)
>   ... 50 elided
> Caused by: java.net.URISyntaxException: Relative path in absolute URI: 2017-01-06T20:33:45.255-analyticsqa-49569270507599054034141623773442922465540524816321216514.json
>   at java.net.URI.checkPath(URI.java:1823)
>   at java.net.URI.<init>(URI.java:745)
>   at org.apache.hadoop.fs.Path.initialize(Path.java:202)
>   ... 73 more
> {code}
> The file in question sits at the root of s3a://mybucket/path/
> {code}
> aws s3 ls s3://mybucket/path/
>                            PRE subfolder1/
>                            PRE subfolder2/
> ...
> 2017-01-06 20:33:46       1383 2017-01-06T20:33:45.255-analyticsqa-49569270507599054034141623773442922465540524816321216514.json
> ...
> {code}
> Removing the wildcard from the path makes it work, but then it obviously misses all files in the subdirectories.
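[Editorial note: the failure mode quoted above can be illustrated outside Hadoop. `Path.initialize` treats everything before the first colon (when it precedes any slash) as a URI scheme, then hands `java.net.URI` a scheme plus a relative remainder, which `URI.checkPath` rejects. The following is a minimal Java sketch that mimics that split; the substring logic is a simplification for illustration, not the actual Hadoop code.]

```java
import java.net.URI;
import java.net.URISyntaxException;

public class ColonPathDemo {
    public static void main(String[] args) {
        // The filename from the bug report: a timestamp-prefixed name with colons.
        String name = "2017-01-06T20:33:45.255-analyticsqa-49569270507599054034141623773442922465540524816321216514.json";

        // Hadoop's Path takes the text before the first ':' as a URI scheme,
        // since no '/' appears before it. The remainder becomes the path.
        int colon = name.indexOf(':');
        String scheme = name.substring(0, colon);   // "2017-01-06T20"
        String rest = name.substring(colon + 1);    // "33:45.255-..." (relative)

        try {
            // With a non-null scheme and a path that does not start with '/',
            // URI.checkPath rejects the combination.
            new URI(scheme, null, rest, null, null);
            System.out.println("parsed OK");
        } catch (URISyntaxException e) {
            System.out.println(e.getReason());
        }
    }
}
```

Running it prints `Relative path in absolute URI`, matching the `Caused by` message in the stack trace above.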
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org