In previous versions of Spark, this would work:

val events =
  sqlContext.jsonFile("hdfs://user/hdfs/analytics/*/PAGEVIEW/*/*")

Here the first wildcard corresponds to an application directory, the second
to a partition directory, and the third matches all the files in the
partition directory. The records all have the exact same format; they are
just broken out first by application, then by event type. This
functionality was really useful.
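For context, the layout I'm globbing over looks roughly like this (the
application and partition names here are made up for illustration):

```text
/user/hdfs/analytics/
├── app1/
│   └── PAGEVIEW/
│       ├── partition-0001/   <- JSON files inside
│       └── partition-0002/
└── app2/
    └── PAGEVIEW/
        └── partition-0001/
```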

In 1.6, this same call results in the following error:

Conflicting directory structures detected. Suspicious paths:
(list of paths)

And then it recommends reading in each root directory separately and
unioning them together. It looks like the change happened here:

https://github.com/apache/spark/pull/9651

1) Simply out of curiosity, since I'm still fairly new to Spark: what is
the benefit of no longer allowing multiple roots?

2) Is there a better way to do what I'm trying to do? Discovering all of the
paths (I won't know them ahead of time), creating tables for each of them,
and then doing all of the unions seems inefficient and a lot of extra work
compared to what I had before.
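For reference, this is roughly the workaround I'm imagining, in case I'm
misunderstanding the suggestion. The directory discovery below is plain
Scala/NIO (a hypothetical `discoverRoots` helper, not anything from Spark),
and the Spark calls at the end are a hedged sketch assuming a 1.6
`SQLContext` named `sqlContext`:

```scala
import java.nio.file.{Files, Path}
import scala.collection.JavaConverters._

// Enumerate one root per application directory, e.g.
// <base>/<app>/PAGEVIEW, skipping apps without that event type.
def discoverRoots(base: Path, eventType: String): Seq[String] =
  Files.list(base).iterator().asScala
    .filter(p => Files.isDirectory(p))       // only application directories
    .map(_.resolve(eventType))               // e.g. app1/PAGEVIEW
    .filter(p => Files.isDirectory(p))       // keep apps that have this event type
    .map(_.toString)
    .toSeq

// With the roots in hand, read each one separately and union the results,
// as the error message suggests (assumes Spark 1.6's DataFrameReader):
//
//   val frames = discoverRoots(base, "PAGEVIEW")
//     .map(root => sqlContext.read.json(s"$root/*/*"))
//   val events = frames.reduce(_ unionAll _)
```

That is a lot of extra moving parts versus a single glob, which is why I'm
asking whether there's a better way.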

Thanks.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/SparkSQL-and-multiple-roots-in-1-6-tp26598.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
