Re: Memory profiling on Dataflow with java

2019-11-21 Thread Reynaldo Baquerizo
No, I have not solved the issue yet. Steve's suggestion unfortunately didn't work out for me; jvisualvm fails to connect over JMX. Cloud support suggested: PROJECT="project_id" ZONE="worker_zone" WORKER="worker_id" gcloud compute scp --project=$PROJECT --zone=$ZONE \ "$WORKER:/var/log/dataflow/ja
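For reference, a minimal sketch of that kind of log retrieval; the remote path is truncated in the quote above, so the directory and local destination below are placeholders, not the ones Cloud support actually gave:

  # Hypothetical sketch: copy Dataflow worker logs off a worker VM for inspection.
  # PROJECT, ZONE and WORKER as defined above; the remote path and local directory
  # are placeholders, since the suggested path is truncated in the message.
  gcloud compute scp --project="$PROJECT" --zone="$ZONE" --recurse \
      "$WORKER:/var/log/dataflow/" ./dataflow-worker-logs/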

Re: Memory profiling on Dataflow with java

2019-11-21 Thread 👌👌
I use Beam running on Spark. Because I cannot write a UDF for the partitioner function, I have no ideas for how to solve it! -- Original Message -- From: "👌👌"<1150693...@qq.com>; Sent: Thursday, November 21, 2019, 5:26 PM To: "user"https://beam.apache.org/releases/javadoc/2.16.0/org/apache/

Re: Memory profiling on Dataflow with java

2019-11-21 Thread 👌👌
I also can't save the heap dump; I haven't found a way to do it, so I still haven't solved the problem. Thanks for your answer! -- Original Message -- From: "Frantisek Csajka"https://beam.apache.org/releases/javadoc/2.16.0/org/apache/beam/runners/dataflow/options/Datafl

Re: Memory profiling on Dataflow with java

2019-11-21 Thread Frantisek Csajka
Hi, Have you succeeded in saving a heap dump? I also ran into this a while ago and was not able to save a heap dump or increase the boot disk size. If you have any update on this, could you please share? Thanks in advance, Frantisek On Wed, Nov 20, 2019 at 1:46 AM Luke Cwik wrote: > You might

Re: Memory profiling on Dataflow with java

2019-11-19 Thread Luke Cwik
You might want to reach out to Cloud support for help with debugging this, or for guidance on how to debug it yourself. On Mon, Nov 18, 2019 at 10:56 AM Jeff Klukas wrote: > On Mon, Nov 18, 2019 at 1:32 PM Reynaldo Baquerizo < > reynaldo.michel...@bairesdev.com> wrote: > >> >> Does it tell anything that t

Re: Memory profiling on Dataflow with java

2019-11-18 Thread Jeff Klukas
On Mon, Nov 18, 2019 at 1:32 PM Reynaldo Baquerizo < reynaldo.michel...@bairesdev.com> wrote: > > Does it tell anything that the GCP console does not show the options > --dumpHeapOnOOM --saveHeapDumpsToGcsPath of a running job under > PipelineOptions (it does for diskSizeGb)? > That's normal; I a

Re: Memory profiling on Dataflow with java

2019-11-18 Thread Reynaldo Baquerizo
Hi Luke, Yes, I tried port forwarding too, but the VM instance does not have ssh, socat, or iptables (I can't install anything either; I saw ChromeOS is based on Gentoo and tried emerge), and I think ssh port 22 is the only allowed port. I also omitted that it is a batch job which is OOM-ing, so I can

Re: Memory profiling on Dataflow with java

2019-11-18 Thread Reynaldo Baquerizo
Oh, I had overlooked --diskSizeGb. I did read the CAUTION, but I did not know how to increase it. Unfortunately, I still can't get it to work. Does it tell anything that the GCP console does not show the options --dumpHeapOnOOM --saveHeapDumpsToGcsPath of a running job under PipelineOptions (it do

Re: Memory profiling on Dataflow with java

2019-11-18 Thread Steve Niemitz
If you go the port forwarding route, you need to use a SOCKS proxy as well as forward the JMX port, because of how JMX works. For example, I SSH into a worker with: ssh *-D -L :127.0.0.1: * and then launch, e.g., jvisualvm with: jvisualvm -J-DsocksProxyHost=loc
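A rough sketch of the shape of those commands, with placeholder values (port 1080 for the SOCKS proxy, 5555 for JMX, and a generic worker name) since the actual ports are elided above:

  # Hypothetical tunnel: 1080 (SOCKS) and 5555 (JMX) are placeholder ports;
  # substitute the JMX port your worker harness actually exposes.
  ssh -D 1080 -L 5555:127.0.0.1:5555 user@dataflow-worker
  # Point VisualVM at the SOCKS proxy so JMX's secondary connections also go
  # through the tunnel, then connect to localhost:5555.
  jvisualvm -J-DsocksProxyHost=localhost -J-DsocksProxyPort=1080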

Re: Memory profiling on Dataflow with java

2019-11-18 Thread Luke Cwik
What Jeff mentioned is the easiest way to get heap dumps on OOM. If you want to connect to JMX, try using an SSH tunnel and forward the ports. On Mon, Nov 18, 2019 at 8:59 AM Jeff Klukas wrote: > Using default Dataflow workers, this is the set of options I passed: > > --dumpHeapOnOOM --saveHeap

Re: Memory profiling on Dataflow with java

2019-11-18 Thread Jeff Klukas
Using default Dataflow workers, this is the set of options I passed: --dumpHeapOnOOM --saveHeapDumpsToGcsPath=$MYBUCKET/heapdump --diskSizeGb=100 On Mon, Nov 18, 2019 at 11:57 AM Jeff Klukas wrote: > It sounds like you're generally doing the right thing. I've successfully > used --saveHeapDump
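For context, a hedged sketch of how those flags might look on a full launch; the jar, main class, project, region, and bucket names are placeholders, not taken from the thread:

  # Hypothetical launch: jar, main class, project and bucket names are placeholders.
  # --dumpHeapOnOOM / --saveHeapDumpsToGcsPath come from DataflowPipelineDebugOptions;
  # --diskSizeGb enlarges the worker boot disk so the dump has room to be written.
  java -cp my-pipeline-bundled.jar com.example.MyPipeline \
      --runner=DataflowRunner \
      --project=my-project \
      --region=us-central1 \
      --tempLocation=gs://my-bucket/temp \
      --dumpHeapOnOOM \
      --saveHeapDumpsToGcsPath=gs://my-bucket/heapdumps \
      --diskSizeGb=100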

Re: Memory profiling on Dataflow with java

2019-11-18 Thread Jeff Klukas
It sounds like you're generally doing the right thing. I've successfully used --saveHeapDumpsToGcsPath in a Java pipeline running on Dataflow and inspected the results in Eclipse MAT. I think that --saveHeapDumpsToGcsPath will automatically turn on --dumpHeapOnOOM, but it's worth setting it explicitly

Memory profiling on Dataflow with java

2019-11-18 Thread Reynaldo Baquerizo
Hi all, We are running into OOM issues with one of our pipelines. They are not reproducible with DirectRunner, only with Dataflow. I tried --saveHeapDumpsToGcsPath, but it does not save any heap dump (MyOptions extends DataflowPipelineDebugOptions). I looked at the Java process inside the docker co
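For anyone trying the same thing, a heavily hedged sketch of what inspecting that process by hand could look like; it assumes you can SSH to the worker VM and that jps/jmap are present inside the harness container, neither of which is guaranteed:

  # Hypothetical sketch, run on the worker VM. The container ID, pid, and the
  # availability of jps/jmap inside the SDK harness image are all assumptions.
  docker ps                                    # find the Java harness container
  docker exec -it <container-id> jps           # find the pipeline JVM's pid
  docker exec -it <container-id> jmap -dump:live,format=b,file=/tmp/heap.hprof <pid>
  docker cp <container-id>:/tmp/heap.hprof .   # copy the dump out for MAT/VisualVM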