subject:"How to optimize the performance of Beam on Spark"

Re: Re: How to optimize the performance of Beam on Spark(Internet mail)

2018-09-28 Thread Tim Robertson

Thanks for sharing those results. The second set (executors at 20-30) looks similar to what I would have expected. BEAM-5036 definitely plays a part here, as the data is not moved on HDFS efficiently (fix in PR awaiting review now [1]). To give an idea of the impact, here are some numbers from my o

Re: Re: How to optimize the performance of Beam on Spark(Internet mail)

2018-09-28 Thread Robert Bradshaw

Something here on the Beam side is clearly linear in the input size, as if there's a bottleneck where we're not able to get any parallelization. Is the Spark variant running in parallel? On Fri, Sep 28, 2018 at 4:57 AM devinduan(段丁瑞) wrote: > Hi > I have completed my test. > 1. Spark paramet

Re: Re: How to optimize the performance of Beam on Spark(Internet mail)

2018-09-19 Thread 段丁瑞

am On Spark in the future and then feed back the results. Regards devin From: Jean-Baptiste Onofré<mailto:j...@nanthrax.net> Date: 2018-09-19 16:32 To: devinduan(段丁瑞)<mailto:devind...@tencent.com>; dev<mailto:dev@beam.apache.org> Subject: Re: How to optimize the performance of

Re: Re: How to optimize the performance of Beam on Spark(Internet mail)

2018-09-19 Thread Tim Robertson

vin > > > *From:* Jean-Baptiste Onofré > *Date:* 2018-09-19 16:32 > *To:* devinduan(段丁瑞) ; dev > *Subject:* Re: How to optimize the performance of Beam on Spark(Internet > mail) > > Thanks for the details. > > I will take a look later tomorrow (I have another issue to

Re: Re: How to optimize the performance of Beam on Spark(Internet mail)

2018-09-19 Thread 段丁瑞

rg> Subject: Re: How to optimize the performance of Beam on Spark(Internet mail) Thanks for the details. I will take a look later tomorrow (I have another issue to investigate on the Spark runner today for Beam 2.7.0 release). Regards JB On 19/09/2018 08:31, devinduan(段丁瑞) wrote: > Hi, >

Re: How to optimize the performance of Beam on Spark(Internet mail)

2018-09-19 Thread Jean-Baptiste Onofré

ste Onofré <mailto:j...@nanthrax.net> > *Date:* 2018-09-19 12:22 > *To:* dev@beam.apache.org <mailto:dev@beam.apache.org> > *Subject:* Re: How to optimize the performance of Beam on > Spark(Internet mail) > > Hi, > > did you compare the stag

Re: How to optimize the performance of Beam on Spark(Internet mail)

2018-09-18 Thread Jean-Baptiste Onofré

": > > Spark "WordCount": > > I will try the other example later. > > Regards > devin > > > *From:* Jean-Baptiste Onofré <mailto:j...@nanthrax.net> > *Date:* 2018-09-18 22:43 > *To:* dev@beam.apache.org <mailto:dev@b

Re: How to optimize the performance of Beam on Spark

2018-09-18 Thread Jean-Baptiste Onofré

Hi, The first huge difference is that the Spark runner still uses RDDs, whereas when using Spark directly you are using Datasets. Many of Spark's optimizations are tied to Datasets. I started a large refactoring of the Spark runner to leverage Spark 2.x (and Datasets). It's not yet ready as

Re: How to optimize the performance of Beam on Spark

2018-09-18 Thread Tim Robertson

Hi devinduan, The known issues Robert links there are actually HDFS related and not specific to Spark. The improvement we're seeking is that the final copy of the output file can be optimised by using a "move" instead of a "copy", and I expect to have it fixed for Beam 2.8.0. On a small dataset like

Re: How to optimize the performance of Beam on Spark

2018-09-18 Thread Robert Bradshaw

There are known performance issues with Beam on Spark that are being worked on, e.g. https://issues.apache.org/jira/browse/BEAM-5036 . It's possible you're hitting something different, but it would be worth investigating. See also https://lists.apache.org/list.html?dev@beam.apache.org:lte=1M:Performan

How to optimize the performance of Beam on Spark

2018-09-17 Thread 段丁瑞

Hi, I'm testing Beam on Spark. Running the Spark example WordCount on a 1G data file takes 1 minute. However, running the Beam example WordCount on the same file takes 30 minutes. My Spark parameters are: --deploy-mode client --executor-memory 1g --num-executors 1 --d

11 matches
