Re: Question about triggering

2020-01-09 Thread Kenneth Knowles
Does it have the same behavior in the direct runner? What are the sizes of intermediate PCollections? Kenn On Wed, Jan 8, 2020 at 1:05 PM Andrés Garagiola wrote: > Hi all, > > I'm doing some tests with beam and apache flink. I'm running the code > below: > > public static void main(String[]

Re: Scio 0.8.0 released

2020-01-09 Thread Kenneth Knowles
So fast! Excellent. On Wed, Jan 8, 2020 at 11:28 AM Robert Bradshaw wrote: > Nice! > > On Wed, Jan 8, 2020 at 10:03 AM Neville Li wrote: > >> Hi all, >> >> We just released Scio 0.8.0. This is based on the most recent Beam 2.17.0 >> release and includes a lot of new features & bug fixes over

Re: [FYI] Rephrasing the 'lull'/processing stuck logs

2020-01-09 Thread Steve Niemitz
One other nice enhancement around this would be if a transform could indicate that it was executing a "slow" operation. A good example is writing in BigQueryIO, it's very reasonable/normal for a load job to run for more than 5 minutes, and the "stuck" message can be confusing to users. The

[FYI] Rephrasing the 'lull'/processing stuck logs

2020-01-09 Thread Pablo Estrada
Hello Beam users and community, The Beam Python SDK and Java workers have a utility that prints a log message whenever an execution thread goes more than five minutes without any state transition. These messages are common in two scenarios: 1. A deadlock happening in the worker

RE: runShadow: prebuild and build in read-only directory

2020-01-09 Thread Robert Lugg
Confirmed. That worked perfectly. Thank you.

Re: runShadow: prebuild and build in read-only directory

2020-01-09 Thread Kyle Weaver
You can build the job server jar using: ./gradlew runners:flink:1.8:job-server:shadowJar The output jar will be located in: runners/flink/1.8/job-server/build/libs/ You can run the jar using `java -jar`. Hope that helps. On Thu, Jan 9, 2020 at 10:47 AM Robert Lugg wrote: > I am able to run
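Kyle's reply gives the output directory but not the jar's file name. The minimal sketch below, assuming the `shadowJar` task leaves exactly one `.jar` in that directory, looks the jar up by glob and builds the `java -jar` command for it. The helper name `job_server_command` is hypothetical and is not part of Beam or its Gradle build.

```python
import glob
import os


def job_server_command(libs_dir="runners/flink/1.8/job-server/build/libs"):
    """Return the `java -jar` command for the job server jar in libs_dir.

    Hypothetical helper: assumes shadowJar left exactly one .jar in libs_dir.
    The jar's file name is not given in the thread, so it is found by glob.
    """
    jars = sorted(glob.glob(os.path.join(libs_dir, "*.jar")))
    if len(jars) != 1:
        raise RuntimeError(
            "expected exactly one jar in %s, found %d" % (libs_dir, len(jars)))
    # Only java is needed at run time; Gradle is not invoked again.
    return ["java", "-jar", jars[0]]
```

The result can be passed to `subprocess.Popen(job_server_command())` to start the server once the jar has been built.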

Re: dataflow and ImageMagick

2020-01-09 Thread André Rocha Silva
Luke, it worked! I changed the CUSTOM_COMMANDS on the juliaset example to: "CUSTOM_COMMANDS = [ ['apt-get', 'update'], ['apt-get', 'install', 'ghostscript', '-y']]" Thank you very much!! On Thu, Jan 9, 2020 at 3:03 PM Luke Cwik wrote: > Andre, add the required installation commands
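In the Juliaset example, a list like André's is executed by a custom setuptools command when the workers install the package. The sketch below reproduces only the command list and the loop that runs it, using just the standard library, and is not a copy of that setup.py. The function name `run_custom_commands` is hypothetical. Stopping on the first non-zero exit code makes a failed install show up during the build rather than later as an error from Wand.

```python
import subprocess

# The commands André reported using in this thread. Per Luke's note, they must
# be runnable on the remote worker (e.g. apt-get must exist on the worker image).
CUSTOM_COMMANDS = [
    ["apt-get", "update"],
    ["apt-get", "install", "ghostscript", "-y"],
]


def run_custom_commands(commands=CUSTOM_COMMANDS):
    """Run each command in order and return its combined stdout/stderr.

    Hypothetical helper: raises RuntimeError on the first command that exits
    with a non-zero status, so later commands are not attempted.
    """
    outputs = []
    for command in commands:
        print("Running command: %s" % command)
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout
            text=True,
        )
        print("Command output: %s" % result.stdout)
        if result.returncode != 0:
            raise RuntimeError(
                "Command %s failed: exit code %s" % (command, result.returncode))
        outputs.append(result.stdout)
    return outputs
```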

runShadow: prebuild and build in read-only directory

2020-01-09 Thread Robert Lugg
I am able to run Beam through the Python SDK using runShadow for the Flink runner. I manually start runShadow using `./gradlew -g ~/.gradle :runners:flink:1.8:job-server:runShadow`. The Beam directory, by necessity, is on a read-only file system. The only way I can get this to work is: 1.

Re: dataflow and ImageMagick

2020-01-09 Thread Luke Cwik
Andre, add the required installation commands (e.g. the apt-get install commands) for the non-Python dependencies to the list of CUSTOM_COMMANDS in your setup.py file. See the Juliaset setup.py [1] for an example. Note: You must make sure that these commands are runnable on the remote worker (e.g.

Re: dataflow and ImageMagick

2020-01-09 Thread Leonardo Campos
Hi Andre, On a very different topic, I was trying to change the JVM default encoding and could not find a way to do so. For that reason, I would also be interested in being able to influence the image used by the workers. Sorry I couldn't help, Leonardo Campos On 1/9/20

dataflow and ImageMagick

2020-01-09 Thread André Rocha Silva
Hi all! I am trying to use ImageMagick on Dataflow [Apache Beam Python 3.7 SDK 2.17.0], but I am facing a problem. The function works properly locally, but when I run it on Dataflow I receive this message: File "/usr/local/lib/python3.7/site-packages/wand/image.py", line 7888, in read raise