In previous versions of Spark, this would work:

    val events = sqlContext.jsonFile("hdfs://user/hdfs/analytics/*/PAGEVIEW/*/*")
Here the first wildcard matches an application directory, the second a partition directory, and the third all of the files in that partition directory. The records all share the exact same format; they are simply broken out first by application, then by event type. This functionality was really useful.

In 1.6, the same call results in the following error:

    Conflicting directory structures detected. Suspicious paths: (list of paths)

followed by a recommendation to read each root directory in separately and union them together. It looks like the change happened here: https://github.com/apache/spark/pull/9651

1) Simply out of curiosity, since I'm still fairly new to Spark: what is the benefit of no longer allowing multiple roots?

2) Is there a better way to do what I'm trying to do? Discovering all of the paths (I won't know them ahead of time), creating a table for each of them, and then doing all of the unions seems inefficient and a lot of extra work compared to what I had before.

Thanks.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SparkSQL-and-multiple-roots-in-1-6-tp26598.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
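P.S. For what it's worth, the discover-and-union workaround could be sketched roughly like this. `FileSystem.globStatus` and `DataFrame.unionAll` are standard Hadoop/Spark 1.6 APIs, but the directory layout is assumed from my example above, and `sc`/`sqlContext` are the usual spark-shell bindings; this is an untested sketch, not something I have running:

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.DataFrame

// Sketch: enumerate each application's PAGEVIEW root with a glob, read each
// root as its own DataFrame, then union them all. The path layout here is
// assumed from the example above -- adjust to the real directory structure.
val fs = FileSystem.get(sc.hadoopConfiguration)
val roots: Array[String] = fs
  .globStatus(new Path("hdfs://user/hdfs/analytics/*/PAGEVIEW"))
  .map(_.getPath.toString)

// One read per root keeps partition discovery happy (a single root each),
// at the cost of one job-setup per application directory.
val events: DataFrame = roots
  .map(root => sqlContext.read.json(root + "/*/*"))
  .reduce(_ unionAll _)
```

If that is roughly the intended pattern, it at least avoids creating named tables for each root, though it still feels heavier than the old single glob.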