Hi Jon,

I just want to check in here briefly: are you still looking for support on this?
Sadly, yes, this totally lacks documentation and isn't straightforward to set 
up.

/Moritz

On 21.06.23, 23:47, "Jon Molle via user" <user@beam.apache.org> wrote:

Hi Pavel,

Thanks for your response! I took a look at running Beam on Kinesis Data 
Analytics, as it is the AWS-recommended way to run Beam jobs. It seems like it 
doesn't work with the portable runner model. Our project is a daemon running in 
a Kubernetes cluster that has Beam code running as part of certain tasks, so I'm 
not exactly sure how that would work with Kinesis, as I don't see a way to grab 
the master URL (and I'm not entirely sure whether the Flink image run by 
Kinesis would work for Beam). I'd really like to avoid using any of the 
non-portable runners if possible.

That's part of why I am looking at Spark (although Flink looks fairly similar): 
EKS supports autoscaling and other features that Dataflow does. I don't want to 
create a huge divergence between the GCP and AWS behaviour if possible. It seems 
possible, but the docs for the other runners are a bit ambiguous about exactly 
how much of job submission is handled by the runner.

On Wed, Jun 21, 2023 at 12:28 PM Pavel Solomin <p.o.solo...@gmail.com> wrote:
Hello!

> to also run on AWS

> A spark cluster on EKS seems the closest analog

There's another way of running Beam apps in AWS: 
https://aws.amazon.com/kinesis/data-analytics/
which is basically "serverless" Flink. It says Kinesis, but you can run any 
Flink / Beam job there; you don't have to use Kinesis streams. I have used KDA 
in multiple projects so far, and it works OK. FlinkRunner also seems to have 
more docs, as far as I can see.

Here's a pom.xml example: 
https://github.com/aws-samples/amazon-kinesis-data-analytics-examples/blob/master/Beam/pom.xml

Best Regards,
Pavel Solomin

Tel: +351 962 950 692 | Skype: pavel_solomin | 
LinkedIn: https://www.linkedin.com/in/pavelsolomin




On Wed, 21 Jun 2023 at 16:31, Jon Molle via user <user@beam.apache.org> wrote:
Hi,

I've been looking at the Spark Portable Runner docs, specifically for Java 
where possible, and I'm a little confused about the organization. The docs seem 
to say that the JobService both submits the code to the linked Spark cluster 
(described by the master URL) and requires you to run a spark-submit command 
afterwards on whatever artifacts it builds.

Unfortunately, I'm not that familiar with Spark generally, so I'm probably 
misunderstanding more here, but the job server images either totally lack 
documentation or just repeat the Spark runner page in the main docs.

For context, I'm trying to port some code that we're currently running on a 
Dataflow runner (on GCP) to also run on AWS. A Spark cluster on EKS (either 
self-managed or potentially through EMR, but likely not, based on what I am 
reading into the docs and some brief testing) seems the closest analog.

The new Tour does the same thing, in addition to only really having examples 
for Python and containing a few more typos. I haven't found any existing 
questions like this elsewhere, so I assume that I'm just missing something that 
should be obvious.

Thanks for your time.
