I'm trying to use SparkSQL to efficiently query structured data from
datasets in S3. The data is naturally partitioned by date, so I've laid it
out in S3 as follows:
s3://bucket/dataset/dt=2015-07-05/
s3://bucket/dataset/dt=2015-07-04/
s3://bucket/dataset/dt=2015-07-03/
etc.
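To make the layout concrete, here is a minimal sketch in plain Python (not Spark itself) of how the key=value directory naming can be read back as a partition column, and how a date filter would then only need to touch the matching directories. The helper names `parse_partitions` and `prune` are hypothetical, made up for illustration; this is not how Spark implements partition discovery internally, only the general idea behind it.

```python
from datetime import date


def parse_partitions(path):
    """Collect key=value path segments into a dict (hypothetical helper)."""
    parts = {}
    for segment in path.rstrip("/").split("/"):
        key, sep, value = segment.partition("=")
        if sep and key:
            parts[key] = value
    return parts


def prune(paths, start, end):
    """Keep only the paths whose dt partition lies in [start, end] (hypothetical helper)."""
    kept = []
    for p in paths:
        dt = parse_partitions(p).get("dt")
        if dt is not None and start <= date.fromisoformat(dt) <= end:
            kept.append(p)
    return kept


# The three example directories from the layout above.
paths = [
    "s3://bucket/dataset/dt=2015-07-05/",
    "s3://bucket/dataset/dt=2015-07-04/",
    "s3://bucket/dataset/dt=2015-07-03/",
]

# A query restricted to 4-5 July should only need the first two directories.
selected = prune(paths, date(2015, 7, 4), date(2015, 7, 5))
```

The point of the sketch is only that the date lives in the path, so a filter on dt can be answered by listing directories rather than reading every file.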
In each directory, da
Hi,
I'm just getting started with Spark, so apologies if I'm missing
something obvious. In what follows, I'm using Spark 1.4.
I've created a partitioned table in S3 (call it 'dataset'), with a basic
structure like so:
s3://bucket/dataset/pk=a
s3://bucket/dataset/pk=b
s3://bucket/dataset/pk=c
In