Hi,

Do you have a schema definition for this Hive table?

What format is this table stored in?
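
If you are not sure, both are easy to check from the Spark shell. A minimal
sketch, assuming sqlContext is a HiveContext on Spark 1.6 and "tablename" is
a placeholder for your table:

    // Print the schema Spark sees for the table
    sqlContext.table("tablename").printSchema()

    // Ask the Hive metastore for storage details (input/output format,
    // SerDe, location); DESCRIBE FORMATTED is standard HiveQL
    sqlContext.sql("DESCRIBE FORMATTED tablename").show(100, truncate = false)

The InputFormat and SerDe rows of that output tell you how the table is
stored.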

HTH



Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 3 August 2016 at 15:03, Mehdi Meziane <mehdi.mezi...@ldmobile.net> wrote:

> Hi all,
>
> We have a Hive table stored in S3 and registered in a Hive metastore.
> This table is partitioned with a key "day".
>
> So we access this table through the Spark DataFrame API as:
>
> sqlContext.read()
>   .table("tablename")
>   .where(col("day").between("2016-08-01", "2016-08-02"))
>
> When the job is launched, we can see that Spark has "table" tasks whose
> reported duration is small (seconds) but which take minutes to complete.
> In the logs we see that the paths of every partition are listed,
> regardless of the partition key values, for minutes.
>
> 16/08/03 13:17:16 INFO HadoopFsRelation: Listing
> s3a://buckets3/day=2016-07-24
> 16/08/03 13:17:16 INFO HadoopFsRelation: Listing
> s3a://buckets3/day=2016-07-25
> ....
>
> Is this normal behaviour? Can we specify something in read().table,
> maybe some options? I tried to find such options but I cannot find
> anything.
>
> Thanks,
> Mehdi
>
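
On the partition-listing question above, a hedged sketch rather than a
confirmed fix: Spark 1.6 has a spark.sql.hive.metastorePartitionPruning
setting (off by default in that release) that asks the Hive metastore to
return only the partitions matching the pushed-down predicate instead of
listing all of them. In 1.6 that pushdown typically only applies to
string-typed partition keys, so whether it removes the per-partition S3
listing here needs testing:

    // Assumption: sqlContext is a HiveContext on Spark 1.6 and the
    // partition key "day" is a string column, as in the snippet above.
    import org.apache.spark.sql.functions.col

    // Ask Spark to push the partition predicate down to the metastore
    sqlContext.setConf("spark.sql.hive.metastorePartitionPruning", "true")

    val df = sqlContext.read
      .table("tablename") // placeholder, as in the original snippet
      .where(col("day").between("2016-08-01", "2016-08-02"))

Whether the listing disappears depends on the plan actually pushing the
"day" filter down to the metastore, which you can check in the driver logs.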
