https://github.com/yuwtennis/apache-beam-pipeline-apps/blob/main/java/Dockerfile
Thanks,
Yu Watanabe
On Fri, May 19, 2023 at 12:28 AM Pablo Estrada via user
wrote:
>
> Hi Utkarsh,
> you can pass a path in GCS (or a filesystem), and the workers should be able
> to download it onto themselves
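For illustration only (hypothetical bucket and object names), a worker-side step could split such a GCS path into the pieces a download needs:

```python
from urllib.parse import urlparse

# Hypothetical GCS path handed to the pipeline; a worker can split it into
# bucket and object name before fetching the file onto itself.
path = "gs://my-bucket/artifacts/my-transform.jar"
parsed = urlparse(path)
bucket, obj = parsed.netloc, parsed.path.lstrip("/")
print(bucket, obj)  # -> my-bucket artifacts/my-transform.jar
```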
Hello Danny.
Ah, I see. Thank you for your advice.
Thanks,
Yu Watanabe
On Mon, Aug 29, 2022 at 9:26 AM Danny McCormick via user
wrote:
>
> Hey Yu, as the error you posted suggests, the Go direct runner which you're
> using in your local development environment doesn't
.github.com/yuwtennis/dec3bf3bfc0c4fa54d9d3565c98d008e
Thanks,
Yu
--
Yu Watanabe
linkedin: www.linkedin.com/in/yuwatanabe1/
twitter: twitter.com/yuwtennis
>
>
>
> On Wed, Aug 17, 2022 at 3:05 AM Yu Watanabe wrote:
>>
>> Hello.
>>
>> I am trying to write code for a cross-language transform for
>> ElasticsearchIO but am having trouble with it.
>> I would appreciate it if I could get help.
>>
>> A
Regards,
Yu Watanabe
--
Yu Watanabe
linkedin: www.linkedin.com/in/yuwatanabe1/
twitter: twitter.com/yuwtennis
Hello Danny.
Thank you for the details. I appreciate your message.
I am a newbie at building IO connectors, so I will look into the links and
build my knowledge first.
Thanks,
Yu Watanabe
On Fri, Jul 8, 2022 at 8:28 PM Danny McCormick via user <
user@beam.apache.org> wrote:
> Hey Yu
Hello.
Is there any guideline for building a Go SDK connector?
I was reviewing the documentation but I could not find one for Go.
https://beam.apache.org/documentation/io/developing-io-overview/
I was thinking of building one for Elasticsearch.
Thanks,
Yu Watanabe
--
Yu Watanabe
linkedin
holidays for a little while.
>
> [1]
> https://beam.apache.org/releases/javadoc/2.34.0/org/apache/beam/sdk/io/azure/blobstore/package-summary.html
>
> On Tue, Dec 21, 2021 at 4:47 AM Yu Watanabe wrote:
>
>> Hello.
>>
>> It looks like the unit tests passed in these PRs
/documentation/io/built-in/
On Tue, Dec 21, 2021 at 10:50 AM Yukino Ikegami (池上有希乃) wrote:
> Hi
>
> I think I will use Apache Beam for my business.
> So I have a question:
> is the FileIO for Azure Blob Storage in the GA phase or not?
>
> Thanks,
> Yukino Ikegami
>
--
Yu Watanabe
li
the code, if you would like to contribute. Help me star it if it
> is useful for you. Thank you.
>
> On Monday, August 16, 2021, 12:37:46 AM GMT+8, Yu Watanabe <
> yu.w.ten...@gmail.com> wrote:
>
>
> Hello.
>
> I would like to ask a question about the Spark runner.
>
it. [2]
>
> [1]
> https://github.com/apache/beam/pull/14897/commits/b6fca2bb79d9e7a69044b477460445456720ec58
> [2] https://issues.apache.org/jira/browse/BEAM-12762
>
>
> On Sun, Aug 15, 2021 at 9:37 AM Yu Watanabe wrote:
>
>> Hello.
>>
>> I would like to ask
/spark-2.4.8-bin-hadoop2.7.tgz
Would there be any settings to be aware of for Spark 3.1.2?
Thanks,
Yu Watanabe
--
Yu Watanabe
linkedin: www.linkedin.com/in/yuwatanabe1/
twitter: twitter.com/yuwtennis
> *SSH tunnel to the master node:*
> gcloud compute ssh \
> --project \
> --zone \
> -- -NL 7077:localhost:7077
>
> -
>
> Thanks,
> Mahan
>
> On Tue, Aug 10, 2021 at 3:53 PM Yu Watanabe wrote:
>
>> Hello.
>>
>> Would
What's the environment_type? Can we use DOCKER? Then what's the SDK
> Harness Configuration?
> 4- Should we run the job-server outside of the Dataproc cluster or should
> we run it in the master node?
>
> Thanks,
> Mahan
>
--
Yu Watanabe
linkedin: www.linkedin.com/in/yuwatanabe1/
twitter: twitter.com/yuwtennis
resolving with 127.0.0.1 and see if it works?
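A quick stdlib check of what a hostname resolves to (using "localhost" here as a stand-in for the job endpoint host being debugged):

```python
import socket

# Resolve a hostname to its IPv4 address; "localhost" stands in for the
# job-endpoint hostname in question.
addr = socket.gethostbyname("localhost")
print(addr)  # usually 127.0.0.1
```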
Thanks,
Yu Watanabe
On Mon, Feb 10, 2020 at 3:29 AM Xander Song wrote:
>
> Hi Jincheng,
>
> Thanks for your help. Yes, I am using Mac. Your suggestion allowed me to
> submit the job on port 8099. However, I am now encountering
On Thu, Jan 9, 2020 at 7:17 AM Chamikara Jayalath wrote:
>
> Hmm, seems like a Java (external) ParDo is being forwarded to Python SDK for
> execution somehow. +Maximilian Michels might know more.
>
> On Sun, Jan 5, 2020 at 2:57 AM Yu Watanabe wrote:
>>
>> Hello.
I have my platform built on Docker Engine and used the combination of
modules below:
1. apache-beam: 2.16.0
2. flink: 1.8
3. python-sdk(37): compiled using release-2.16.0
4. jobserver: compiled using release-2.16.0
5. kafka:
==
$ gsutil cat gs://${PROJECT_ID}/sample.txt-0-of-1
Hello World.
Apache beam
====
Thanks,
Yu Watanabe
On Fri, Jan 3, 2020 at 5:58 AM Kyle Weaver wrote:
> This is the root cause:
>
> > python-sdk_
Matthew
> Just to verify that the preferred python version is python3?
The harness container supports both Python 2 and 3.
https://beam.apache.org/documentation/runtime/environments/
In my opinion, considering that Python 2's EOL is Jan 1, 2020, Python 3
would be the choice.
Thanks,
Yu
oyment/blob/master/flink-session-cluster/docker/docker-compose.yml
My pipeline code
https://github.com/yuwtennis/beam-deployment/blob/master/flink-session-cluster/docker/samples/src/sample.py
Would there be any settings I need to use for starting up the SDK container?
Best Regards,
Yu Watanabe
--
Yu Watanabe
yu.w.ten...@gmail.com
Kyle.
Thank you for the reply.
The error disappeared after building the SDK and job server from the correct
Apache Beam branch (2.16.0 in this case).
Thanks,
Yu Watanabe
On Tue, Dec 31, 2019 at 2:30 AM Kyle Weaver wrote:
> This error can happen when the job server and SDK versions are mismatched.
st", "${JOB_HOST}", "--job-port",
"${JOB_PORT}" ]
==========
I would appreciate it if I could get some help.
Thanks,
Yu Watanabe
--
Yu Watanabe
yu.w.ten...@gmail.com
7, 2019 at 4:49 PM Yu Watanabe wrote:
> Thank you for the comment.
>
> I finally got this working. I would like to share my experience for people
> who are beginners with the portable runner.
> What I did was the items below when calling functions and classes from an
> external package
0',
'elasticsearch>=7.0.0,<8.0.0',
'urllib3',
'boto3'
]
setuptools.setup(
author = 'Yu Watanabe',
author_email = 'AUTHOR_
feature
> complete, but we're still smoothing out some of the ease-of-use issues. And
> feedback like this really helps, so thanks!
>
> On Tue, Sep 24, 2019 at 8:10 PM Yu Watanabe wrote:
>
>> I needed a little adjustment to the pipeline options to make it work.
>>
>> P
"--environment_type=DOCKER",
"--experiments=beam_fn_api",
"--job_endpoint=localhost:8099"
])
options.view_as(SetupOptions).save_main_session = True
options.view_as(SetupOptions).setup_file = '/home/admin/quality-validation/bin/setup.py'
=
Is it possible to resolve a dependency on an external module inside a ParDo
when using Apache Flink?
Thanks,
Yu Watanabe
--
Yu Watanabe
Weekend Freelancer who loves to challenge building data platform
yu.w.ten...@gmail.com
[image: LinkedIn icon] <https://www.linkedin.com/in/yuwatanabe1> [image:
Twitter icon] <https://twitter.com/yuwtennis>
Actually, there is a good example in the latest wordcount.py in the master repo.
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/wordcount.py
On Thu, Sep 26, 2019 at 12:00 PM Yu Watanabe wrote:
> Thank you for the help.
>
> I have chosen to remove the super()
Removing the reference to super in your DoFn should help.
>
> On Wed, Sep 25, 2019 at 5:13 PM Yu Watanabe wrote:
>
>> Thank you for the reply.
>>
>> "save_main_session" did not work; however, the situation has changed.
>>
>>
# Pre Frame Line
postFrm[f],  # Post Frame Line
info['pre'][0][1],  # Pre Last Modified Time
info['post'][0][1])  # Post Last Modified Time
yield (fra
=======
This is not a problem when running the pipeline using DirectRunner.
May I ask how I should import a class for a ParDo when running on Flink?
Thanks,
Yu Watanabe
--
Yu Watanabe
Weekend Freelancer who loves to challenge building data platform
yu.w.ten...@gmail.com
[image: LinkedIn icon] <https://www.linkedin.com/in/yuwatanabe1> [image:
Twitter icon] <https://twitter.com/yuwtennis>
s-dir /home/ec2-user --job-port 8099
>
> 3. Submit your pipeline
> options = PipelineOptions([
> "--runner=PortableRunner",
> "--environment_config=asia.gcr.io/creationline001/beam/python3:latest",
> "--experiments=beam_fn_api"
> ])
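Pulling the quoted flags together, the whole option list can be sketched as plain strings before handing it to PipelineOptions (the registry path is the example from the quote, not a real endpoint):

```python
# Assemble portable-runner flags as plain strings; apache_beam's
# PipelineOptions would accept this list directly.
opts = [
    "--runner=PortableRunner",
    "--environment_type=DOCKER",
    "--environment_config=asia.gcr.io/creationline001/beam/python3:latest",
    "--experiments=beam_fn_api",
    "--job_endpoint=localhost:8099",
]
# Parse into a dict just to sanity-check the values.
flags = dict(o.lstrip("-").split("=", 1) for o in opts)
print(flags["runner"], flags["job_endpoint"])
```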
>
> On Mon, Sep 23, 2019 at 8:20 PM Yu Watanabe wrote:
>
>> Kyle.
>>
>> Thank you for the assistance.
>>
>> Looks like "--artifacts-dir" is not parsed. Below is the DEBUG output:
'-jar',
'/home/admin/.apache_beam/cache/beam-runners-flink-1.8-job-server-2.15.0.jar',
'--flink-master-url',
'ip-172-31-12-113.ap-northeast-1.compute.internal:8081', '--artifacts-dir',
'/home/admin/artifacts18unmki3', '--job-port', 46767, '--artifact-port', 0,
artifact directory from the worker
node?
I would appreciate it if I could get some advice.
Best Regards,
Yu Watanabe
--
Yu Watanabe
Weekend Freelancer who loves to challenge building data platform
yu.w.ten...@gmail.com
[image: LinkedIn icon] <https://www.linkedin.com/in/yuwatanabe1> [image:
Twitter icon] <https://twitter.com/yuwtennis>
"docker pull" if the
container is already pulled?
Thanks,
Yu Watanabe
On Fri, Sep 20, 2019 at 7:48 PM Benjamin Tan
wrote:
> Seems like some file is missing. Why not manually docker pull the image
> (remember to adjust the container location to omit the registry) locally
> first
un(Task.java:711)
at java.lang.Thread.run(Thread.java:748)
...
======
Thanks,
Yu Watanabe
On Thu, Sep 19, 2019 at 3:47 AM Ankur Goenka wrote:
> Adding to the previous suggestions.
> You can also add "--retain_docker_container" to your pipeline option an
======
java.util.concurrent.ExecutionException: java.io.FileNotFoundException:
/tmp/artifactskjsmgv8p/job_ea4673de-71d0-4dd9-b683-fe8c52464666/MANIFEST
(No such file or directory)
======
Thanks,
Y
2.16.0 is ongoing and it will be released
> once blockers are solved.
>
>
> -Rui
>
> On Wed, Sep 18, 2019 at 9:34 PM Yu Watanabe wrote:
>
>> Hello.
>>
>> I would like to use 2.16.0 to diagnose a container problem; however, it
>> looks like the job-server is not
-flink-1.8-job-server-2.16.0.jar:
HTTP Error 404: Not Found
I checked the Maven repo, and indeed there is no job-server 2.16.0 yet.
https://mvnrepository.com/artifact/org.apache.beam/beam-runners-flink
Will 2.16.0 be released soon?
Thanks,
Yu Watanabe
--
Yu Watanabe
Weekend Freelancer who loves to
{\"command\":\"/opt/apache/beam/boot\"}
>
> On 2019/09/18 10:40:42, Yu Watanabe wrote:
> > Hello.
> >
> > I am trying to run FlinkRunner (2.15.0) on AWS EC2 instance and submit
> job
> > to
nner.py#L138
config = json.loads(portable_options.environment_config)
May I ask what command I need to set as the shell script for each task
manager?
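For reference, a PROCESS-style environment_config is a small JSON object; a sketch (the boot path is an example and will differ per install):

```python
import json

# PROCESS-style environment_config: "command" points at the SDK harness
# boot binary available on each task manager (example path).
environment_config = '{"command": "/opt/apache/beam/boot"}'
config = json.loads(environment_config)
print(config["command"])
```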
Best Regards,
Yu Watanabe
On Wed, Sep 18, 2019 at 9:39 PM Benjamin Tan
wrote:
> Seems like Docker is not installed. Maybe run with PROCESS
--flink-master-url ip-172-31-1-84.ap-northeast-1.compute.internal:43581
--artifacts-dir /tmp/artifactskj47j8yn --job-port 48205 --artifact-port 0
--expansion-port 0
admin@ip-172-31-9-89:/opt/flink$
====-
Would there be any other setting I need to
])
p = beam.Pipeline(options=options)
=======
Thanks,
Yu Watanabe
On Tue, Sep 17, 2019 at 11:21 AM Benjamin Tan
wrote:
> Something like this:
>
> options = PipelineOptions(["--runner=PortableRunner",
=======
Perhaps there is an environment variable to specify which image to use?
Best Regards,
Yu Watanabe
--
Yu Watanabe
Weekend Freelancer who loves to challenge building data platform
yu.w.ten...@gmail.com
[image: LinkedIn icon] <https://www.linkedin.com/in/yuwatanabe1> [image:
Twitter icon] <https://twitter.com/yuwtennis>
instead of key 'rest.address'
[flink-runner-job-invoker] INFO org.apache.flink.runtime.rest.RestClient -
Rest client endpoint started.
[flink-runner-job-invoker] INFO
org.apache.flink.client.program.rest.RestClusterClient - Submitting job
4e055a8878dda3f564a7b7c84d48510d (detached: false).
Thanks,
Yu Watan
with beam.Pipeline(options=options) as p:
    (p | beam.Create(["Hello World"]))
===
Would there be any other settings I should look for?
Thanks,
Yu Watanabe
--
Yu Watanabe
Weekend Freelancer who loves to c
DFS/S3/GCS/...
> * Mounting an external directory into the container so that any "local"
> writes appear outside the container
> * Using a non-docker environment such as external or process.
>
> On Thu, Sep 12, 2019 at 4:51 AM Yu Watanabe wrote:
>
>> Hello.
>>
>>
localhost:8099 (this is the default
address of the JobService). For example:
========
Thanks,
Yu Watanabe
On Fri, Sep 13, 2019 at 5:09 AM Kyle Weaver wrote:
> +dev I think we should probably point new users of
> the porta
Lukasz,
Thank you for the reply.
I will try apache flink.
Thanks,
Yu
On Sun, Sep 8, 2019 at 11:59 PM Lukasz Cwik wrote:
> Try using Apache Flink.
>
> On Sun, Sep 8, 2019 at 6:23 AM Yu Watanabe wrote:
>
>> Hello.
>>
>> I would like to ask question related to
l_endpoint=localhost:36079'.
stderr: Unable to find image
'ywatanabe-docker-apache.bintray.io/beam/python3:latest' locally
docker: Error response from daemon: unknown: Subject ywatanabe was not found.
See 'docker run --help'.
-----------
A
https://beam.apache.org/documentation/runners/capability-matrix/#cap-summary-when
Is there any way in the portable runner to do processing similar to *timely
processing*?
This is my first time using the portable runner, and I would appreciate any
help with this.
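As a rough mental model only (plain Python, not the Beam timer API), timely processing boils down to buffering elements per key and firing a timer once the watermark passes its event-time deadline:

```python
import heapq

# Toy simulation of event-time timers: buffer elements per key, set a timer,
# and flush the buffer once the watermark passes the timer's fire time.
timers = []   # min-heap of (fire_time, key)
buffers = {}  # key -> buffered elements
output = []

def process(key, element, event_time, fire_delay=10):
    buffers.setdefault(key, []).append(element)
    heapq.heappush(timers, (event_time + fire_delay, key))

def advance_watermark(now):
    while timers and timers[0][0] <= now:
        _, key = heapq.heappop(timers)
        if key in buffers:
            output.append((key, buffers.pop(key)))

process("a", 1, event_time=0)
process("a", 2, event_time=5)
advance_watermark(20)
print(output)  # -> [('a', [1, 2])]
```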
Best Regards,
Yu Watanabe
--
Yu
Which memory
profiler do users of Apache Beam use for Python 3.7.3?
https://pypi.org/project/memory-profiler/
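memory-profiler above is one choice; as a stdlib alternative (a different tool, offered only as a sketch), tracemalloc can snapshot allocations:

```python
import tracemalloc

# Track allocations made while this block runs (byte counts are approximate).
tracemalloc.start()
data = [bytes(10_000) for _ in range(100)]  # allocate roughly 1 MB
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"current={current} bytes, peak={peak} bytes")
```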
Thanks,
Yu Watanabe
--
Yu Watanabe
Weekend Freelancer who loves to challenge building data platform
yu.w.ten...@gmail.com
[image: LinkedIn icon] <https://www.linkedin.com/in/yuwatan
>> For a raw dump you could do something like:
>>
>> p = beam.Pipeline(...)
>> p | beam.Read...
>> results = p.run()
>> results.wait_until_finish()
>> import pprint
>> pprint.pprint(results._metrics_by_stage)
>>
>>
>>
>>
>
speed for
each transform.
Best Regards,
Yu Watanabe
--
Yu Watanabe
Weekend Freelancer who loves to challenge building data platform
yu.w.ten...@gmail.com
[image: LinkedIn icon] <https://www.linkedin.com/in/yuwatanabe1> [image:
Twitter icon] <https://twitter.com/yuwtennis>