Re: How to make bucket listing faster while using S3 with wholeTextFile

2021-03-16 Thread Ben Kaylor
…upedParts => spark.read.parquet(groupedParts: _*))

    val finalDF = dfs.seq.grouped(100).toList.par
      .map(dfgroup => dfgroup.reduce(_ union _))
      .reduce(_ union _)
      .coalesce(2000)

From: Ben Kaylor
Date: Tuesday, March 16, 2021 at 3:23 PM
To: Bo…
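The Scala snippet above reads Parquet paths in groups, batches the resulting DataFrames into chunks of 100, and unions each chunk in parallel before a final union, rather than folding thousands of DataFrames one at a time. A minimal sketch of the same batch-then-reduce shape in plain Python (names are illustrative; `union` here just concatenates lists, standing in for `DataFrame.union`):

```python
from functools import reduce

def grouped(seq, size):
    # Split seq into consecutive chunks of at most `size` elements,
    # mirroring Scala's .grouped(size).
    return [seq[i:i + size] for i in range(0, len(seq), size)]

def union(a, b):
    # Stand-in for DataFrame.union: concatenate two "row lists".
    return a + b

# Pretend each inner list holds one small DataFrame's rows.
dfs = [[i] for i in range(250)]

# Reduce each batch of 100 first, then reduce the batch results,
# keeping any single reduce chain short.
batches = grouped(dfs, 100)
final = reduce(union, [reduce(union, batch) for batch in batches])
```

In the real Spark version the inner `map` runs on a parallel collection (`.par`), so the 100-way unions are built concurrently on the driver.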

Re: How to make bucket listing faster while using S3 with wholeTextFile

2021-03-16 Thread brandonge...@gmail.com
…oDB/your DB of choice.

From: Boris Litvak
Sent: Tuesday, 16 March 2021 9:03
To: Ben Kaylor <kaylor...@gmail.com>; Alchemist <alchemistsrivast...@gmail.com>
Cc: User <user@spark.apache.org>
Subject: RE: How to make bucket listing faster while using S3 with wholeTextFile

Ben, I’d explore…

Re: How to make bucket listing faster while using S3 with wholeTextFile

2021-03-16 Thread Ben Kaylor
…Alchemist <alchemistsrivast...@gmail.com>
Cc: User
Subject: RE: How to make bucket listing faster while using S3 with wholeTextFile

Ben, I’d explore these approaches:
1. To address your problem, I’d set up an inventory for the S3 bucket: …
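The inventory suggestion above avoids live LIST calls entirely: S3 Inventory writes a scheduled report of all objects in the bucket, so a job can read keys from that report instead of enumerating the bucket. A hedged sketch of consuming one inventory CSV data file (the column order shown is an assumption; the real schema depends on the fields chosen when the inventory was configured and is described in its manifest.json):

```python
import csv
import io

# A tiny stand-in for one S3 Inventory CSV data file (gzipped in
# reality). Assumed columns: bucket, key, size, last_modified.
inventory_csv = (
    "my-bucket,logs/2021/03/15/part-0000.txt,1024,2021-03-15T00:00:00Z\n"
    "my-bucket,logs/2021/03/15/part-0001.txt,2048,2021-03-15T00:00:00Z\n"
    "my-bucket,tmp/_SUCCESS,0,2021-03-15T00:00:00Z\n"
)

def keys_from_inventory(csv_text, prefix=""):
    # Extract object keys from inventory rows, optionally filtering by
    # prefix, without issuing a single LIST call against the bucket.
    reader = csv.reader(io.StringIO(csv_text))
    return [row[1] for row in reader if row[1].startswith(prefix)]

keys = keys_from_inventory(inventory_csv, prefix="logs/")
```

The trade-off is freshness: inventory reports are generated daily or weekly, so objects written since the last report still need a live LIST or a self-reporting mechanism like the one discussed elsewhere in this thread.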

RE: How to make bucket listing faster while using S3 with wholeTextFile

2021-03-16 Thread Boris Litvak
…goes. Boris

From: Ben Kaylor <kaylor...@gmail.com>
Sent: Monday, 15 March 2021 21:10
To: Alchemist <alchemistsrivast...@gmail.com>
Cc: User <user@spark.apache.org>
Subject: Re: How to make bucket listing faster while using S3 with wholeTextFile

Not sure on…

RE: How to make bucket listing faster while using S3 with wholeTextFile

2021-03-16 Thread Boris Litvak
…15 March 2021 21:10
To: Alchemist
Cc: User
Subject: Re: How to make bucket listing faster while using S3 with wholeTextFile

Not sure of the answer on this, but I am solving similar issues, so I am looking for additional feedback on how to do this. My thought: if this cannot be done via Spark and S3 boto commands, then have apps self-report those changes, where instead of having just mappers discovering the keys, you have services self…

Re: How to make bucket listing faster while using S3 with wholeTextFile

2021-03-15 Thread Stephen Coy
Hi there, At risk of stating the obvious, the first step is to ensure that your Spark application and S3 bucket are colocated in the same AWS region. Steve C On 16 Mar 2021, at 3:31 am, Alchemist <alchemistsrivast...@gmail.com> wrote: How to optimize s3 list S3 file using…

Re: How to make bucket listing faster while using S3 with wholeTextFile

2021-03-15 Thread Ben Kaylor
Not sure of the answer on this, but I am solving similar issues, so I am looking for additional feedback on how to do this. My thought: if this cannot be done via Spark and S3 boto commands, then have apps self-report those changes, where instead of having just mappers discovering the keys, you have services self…
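A common complement to self-reporting is to split the listing itself across known top-level prefixes, since each prefix can be listed independently and concurrently. A minimal sketch of that fan-out, with a fake per-prefix lister standing in for a boto3 `ListObjectsV2` paginator (the boto3 calls are omitted so the sketch runs anywhere; prefix names are hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor

# Fake bucket contents keyed by prefix; in real use, list_prefix would
# wrap a boto3 paginator over ListObjectsV2 with Prefix=prefix.
FAKE_BUCKET = {
    "a/": ["a/one.txt", "a/two.txt"],
    "b/": ["b/one.txt"],
    "c/": [],
}

def list_prefix(prefix):
    return FAKE_BUCKET.get(prefix, [])

def parallel_list(prefixes, lister, workers=8):
    # Each prefix is enumerated independently, so the otherwise
    # single-threaded LIST work runs concurrently across prefixes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lister, prefixes))
    return [key for keys in results for key in keys]

all_keys = parallel_list(["a/", "b/", "c/"], list_prefix)
```

This only helps when keys are spread across enough distinct prefixes to give each worker useful work; a bucket with one flat prefix gains nothing from it.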

How to make bucket listing faster while using S3 with wholeTextFile

2021-03-15 Thread Alchemist
How to optimize the S3 file listing when using wholeTextFile(): We are using wholeTextFile to read data from S3. As per my understanding, wholeTextFile first lists the files under the given path. Since we are using S3 as the input source, listing files in a bucket is single-threaded; the S3 API for listing the…
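The cost being described is easy to quantify: the S3 `ListObjectsV2` API returns at most 1,000 keys per call, and sequential listing pages through them one round trip at a time. A small sketch of that arithmetic (the per-call latency is an illustrative assumption, not a measured figure):

```python
import math

MAX_KEYS_PER_LIST = 1000  # ListObjectsV2 returns at most 1,000 keys per page

def sequential_list_calls(num_objects):
    # Round trips needed to enumerate a bucket one page at a time.
    return math.ceil(num_objects / MAX_KEYS_PER_LIST)

def rough_listing_seconds(num_objects, rtt_seconds=0.1):
    # rtt_seconds is a hypothetical ~100 ms per LIST call; real latency
    # varies with region, request rate, and key layout.
    return sequential_list_calls(num_objects) * rtt_seconds

calls = sequential_list_calls(5_000_000)  # 5,000 sequential pages
```

At an assumed 100 ms per call, 5 million objects means on the order of 500 seconds of pure listing before a single byte of data is read, which is why the rest of this thread reaches for inventories, self-reporting, and per-prefix parallelism.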