Hi Xintong,

Thanks for pointing it out. After I set an explicit log path, it is working now. To summarize:

On EMR, to set up JITWatch in flink-conf.yaml, we should not wrap the JVM options in quotes, and we should give an explicit path for the JIT log file instead of relying on ${FLINK_LOG_PREFIX}. This is different from the setup on a standalone cluster, where the quoted value works. Example:

env.java.opts: -XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+LogCompilation -XX:LogFile=/tmp/flinkmemdump.jit -XX:+PrintAssembly

Thanks to everyone involved in this discussion!

Jacky

Xintong Song <tonysong...@gmail.com> wrote on Tue, May 12, 2020 at 10:41 PM:

> Hi Jacky,
>
> I don't think ${FLINK_LOG_PREFIX} is available for Flink Yarn deployment.
> This is just my guess, that the actual file name becomes ".jit". You can
> try to verify that by looking for the hidden file.
>
> If it is indeed this problem, you can try to replace "${FLINK_LOG_PREFIX}"
> with "<LOG_DIR>/your-file-name.jit". The token "<LOG_DIR>" should be
> replaced with proper log directory path by Yarn automatically.
>
> I noticed that the usage of ${FLINK_LOG_PREFIX} is recommended by Flink's
> documentation [1]. This is IMO a bit misleading. I'll try to file an issue
> to improve the docs.
>
> Thank you~
>
> Xintong Song
>
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/application_profiling.html#profiling-with-jitwatch
>
> On Wed, May 13, 2020 at 2:45 AM Jacky D <jacky.du0...@gmail.com> wrote:
>
>> Hi Arvid,
>>
>> Thanks for the advice. I removed the quotes and it did create a YARN
>> session on EMR, but I didn't find any JIT log file generated.
>>
>> The config with quotes works on a standalone cluster. I also tried to
>> pass the property dynamically in the yarn session command:
>>
>> flink-yarn-session -n 1 -d -nm testSession -yD
>> env.java.opts="-XX:+UnlockDiagnosticVMOptions
>> -XX:+TraceClassLoading -XX:+LogCompilation
>> -XX:LogFile=${FLINK_LOG_PREFIX}.jit -XX:+PrintAssembly"
>>
>>
>> but got the same result: the session is created, but I cannot find any
>> JIT log file under the container log.
>>
>>
>> Thanks
>>
>> Jacky
>>
>> Arvid Heise <ar...@ververica.com> wrote on Tue, May 12, 2020 at 12:57 PM:
>>
>>> Hi Jacky,
>>>
>>> I suspect that the quotes are the actual issue. Could you try to remove
>>> them? See also [1].
>>>
>>> [1]
>>> http://blogs.perl.org/users/tinita/2018/03/strings-in-yaml---to-quote-or-not-to-quote.html
>>>
>>> On Tue, May 12, 2020 at 4:03 PM Jacky D <jacky.du0...@gmail.com> wrote:
>>>
>>>> Hi Xintong,
>>>>
>>>> Thanks for the reply. I attached the lines around the application master
>>>> start command below:
>>>>
>>>>
>>>> 2020-05-11 21:16:16,635 DEBUG org.apache.hadoop.util.PerformanceAdvisory - Crypto codec org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec is not available.
>>>> 2020-05-11 21:16:16,635 DEBUG org.apache.hadoop.util.PerformanceAdvisory - Using crypto codec org.apache.hadoop.crypto.JceAesCtrCryptoCodec.
>>>> 2020-05-11 21:16:16,636 DEBUG org.apache.hadoop.hdfs.DataStreamer - DataStreamer block BP-1519523618-98.94.65.144-1581106168138:blk_1073745139_4315 sending packet packet seqno: 0 offsetInBlock: 0 lastPacketInBlock: false lastByteOffsetInBlock: 1697
>>>> 2020-05-11 21:16:16,637 DEBUG org.apache.hadoop.hdfs.DataStreamer - DFSClient seqno: 0 reply: SUCCESS downstreamAckTimeNanos: 0 flag: 0
>>>> 2020-05-11 21:16:16,637 DEBUG org.apache.hadoop.hdfs.DataStreamer - DataStreamer block BP-1519523618-98.94.65.144-1581106168138:blk_1073745139_4315 sending packet packet seqno: 1 offsetInBlock: 1697 lastPacketInBlock: true lastByteOffsetInBlock: 1697
>>>> 2020-05-11 21:16:16,638 DEBUG org.apache.hadoop.hdfs.DataStreamer - DFSClient seqno: 1 reply: SUCCESS downstreamAckTimeNanos: 0 flag: 0
>>>> 2020-05-11 21:16:16,638 DEBUG org.apache.hadoop.hdfs.DataStreamer - Closing old block BP-1519523618-98.94.65.144-1581106168138:blk_1073745139_4315
>>>> 2020-05-11 21:16:16,641 DEBUG org.apache.hadoop.ipc.Client - IPC Client (1954985045) connection to ip-98-94-65-144.ec2.internal/98.94.65.144:8020 from hadoop sending #70 org.apache.hadoop.hdfs.protocol.ClientProtocol.complete
>>>> 2020-05-11 21:16:16,643 DEBUG org.apache.hadoop.ipc.Client - IPC Client (1954985045) connection to ip-98-94-65-144.ec2.internal/98.94.65.144:8020 from hadoop got value #70
>>>> 2020-05-11 21:16:16,643 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine - Call: complete took 2ms
>>>> 2020-05-11 21:16:16,643 DEBUG org.apache.hadoop.ipc.Client - IPC Client (1954985045) connection to ip-98-94-65-144.ec2.internal/98.94.65.144:8020 from hadoop sending #71 org.apache.hadoop.hdfs.protocol.ClientProtocol.setTimes
>>>> 2020-05-11 21:16:16,645 DEBUG org.apache.hadoop.ipc.Client - IPC Client (1954985045) connection to ip-98-94-65-144.ec2.internal/98.94.65.144:8020 from hadoop got value #71
>>>> 2020-05-11 21:16:16,645 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine - Call: setTimes took 2ms
>>>> 2020-05-11 21:16:16,647 DEBUG org.apache.hadoop.ipc.Client - IPC Client (1954985045) connection to ip-98-94-65-144.ec2.internal/98.94.65.144:8020 from hadoop sending #72 org.apache.hadoop.hdfs.protocol.ClientProtocol.setPermission
>>>> 2020-05-11 21:16:16,648 DEBUG org.apache.hadoop.ipc.Client - IPC Client (1954985045) connection to ip-98-94-65-144.ec2.internal/98.94.65.144:8020 from hadoop got value #72
>>>> 2020-05-11 21:16:16,648 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine - Call: setPermission took 2ms
>>>> 2020-05-11 21:16:16,654 DEBUG org.apache.flink.yarn.AbstractYarnClusterDescriptor - Application Master start command: $JAVA_HOME/bin/java -Xmx424m "-XX:+UnlockDiagnosticVMOptions -XX:+TraceClassLoading -XX:+LogCompilation -XX:LogFile=${FLINK_LOG_PREFIX}.jit -XX:+PrintAssembly" -Dlog.file="<LOG_DIR>/jobmanager.log" -Dlog4j.configuration=file:log4j.properties org.apache.flink.yarn.entrypoint.YarnSessionClusterEntrypoint 1> <LOG_DIR>/jobmanager.out 2> <LOG_DIR>/jobmanager.err
>>>> 2020-05-11 21:16:16,654 DEBUG org.apache.hadoop.ipc.Client - stopping client from cache: org.apache.hadoop.ipc.Client@28194a50
>>>> 2020-05-11 21:16:16,656 DEBUG org.apache.flink.yarn.AbstractYarnClusterDescriptor$ApplicationSubmissionContextReflector - org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext supports method setApplicationTags.
>>>> 2020-05-11 21:16:16,656 DEBUG org.apache.flink.yarn.AbstractYarnClusterDescriptor$ApplicationSubmissionContextReflector - org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext supports method setAttemptFailuresValidityInterval.
>>>> 2020-05-11 21:16:16,656 DEBUG org.apache.flink.yarn.AbstractYarnClusterDescriptor$ApplicationSubmissionContextReflector - org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext supports method setKeepContainersAcrossApplicationAttempts.
>>>> 2020-05-11 21:16:16,656 DEBUG org.apache.flink.yarn.AbstractYarnClusterDescriptor$ApplicationSubmissionContextReflector - org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext supports method setNodeLabelExpression.
>>>>
>>>> Xintong Song <tonysong...@gmail.com> wrote on Mon, May 11, 2020 at 10:11 PM:
>>>>
>>>>> Hi Jacky,
>>>>>
>>>>> Could you search for "Application Master start command:" in the debug
>>>>> log and post the result and a few lines before & after that? This is not
>>>>> included in the clip of attached log file.
>>>>>
>>>>> Thank you~
>>>>>
>>>>> Xintong Song
>>>>>
>>>>>
>>>>>
>>>>> On Tue, May 12, 2020 at 5:33 AM Jacky D <jacky.du0...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Robert,
>>>>>>
>>>>>> Thanks so much for the quick reply. I changed the log level to DEBUG
>>>>>> and attached the log file.
>>>>>>
>>>>>> Thanks
>>>>>> Jacky
>>>>>>
>>>>>> Robert Metzger <rmetz...@apache.org> wrote on Mon, May 11, 2020 at 4:14 PM:
>>>>>>
>>>>>>> Thanks a lot for posting the full output.
>>>>>>>
>>>>>>> It seems that Flink is passing an invalid list of arguments to the
>>>>>>> JVM.
>>>>>>> Can you
>>>>>>> - set the root log level in conf/log4j-yarn-session.properties to
>>>>>>>   DEBUG
>>>>>>> - then launch the YARN session
>>>>>>> - share the log file of the yarn session on the mailing list?
>>>>>>>
>>>>>>> I'm particularly interested in the line printed here, as it shows
>>>>>>> the JVM invocation.
>>>>>>>
>>>>>>> https://github.com/apache/flink/blob/release-1.6/flink-yarn/src/main/java/org/apache/flink/yarn/AbstractYarnClusterDescriptor.java#L1630
>>>>>>>
>>>>>>>
>>>>>>> On Mon, May 11, 2020 at 9:56 PM Jacky D <jacky.du0...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Robert,
>>>>>>>>
>>>>>>>> Yes, I retrieved more log info from the YARN UI. The full logs are
>>>>>>>> shown below. This happens when I try to create a Flink YARN session
>>>>>>>> on EMR with the JITWatch configuration set up.
>>>>>>>>
>>>>>>>> 2020-05-11 19:06:09,552 ERROR org.apache.flink.yarn.cli.FlinkYarnSessionCli - Error while running the Flink Yarn session.
>>>>>>>> java.lang.reflect.UndeclaredThrowableException
>>>>>>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1862)
>>>>>>>>     at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>>>>>>>>     at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:813)
>>>>>>>> Caused by: org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy Yarn session cluster
>>>>>>>>     at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:429)
>>>>>>>>     at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:610)
>>>>>>>>     at org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$2(FlinkYarnSessionCli.java:813)
>>>>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>>     at javax.security.auth.Subject.doAs(Subject.java:422)
>>>>>>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
>>>>>>>>     ... 2 more
>>>>>>>> Caused by: org.apache.flink.yarn.AbstractYarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
>>>>>>>> Diagnostics from YARN: Application application_1584459865196_0165 failed 1 times (global limit =2; local limit is =1) due to AM Container for appattempt_1584459865196_0165_000001 exited with exitCode: 1
>>>>>>>> Failing this attempt.Diagnostics: Exception from container-launch.
>>>>>>>> Container id: container_1584459865196_0165_01_000001
>>>>>>>> Exit code: 1
>>>>>>>> Exception message: Usage: java [-options] class [args...]
>>>>>>>>            (to execute a class)
>>>>>>>>    or  java [-options] -jar jarfile [args...]
>>>>>>>>            (to execute a jar file)
>>>>>>>> where options include:
>>>>>>>>     -d32          use a 32-bit data model if available
>>>>>>>>     -d64          use a 64-bit data model if available
>>>>>>>>     -server       to select the "server" VM
>>>>>>>>                   The default VM is server,
>>>>>>>>                   because you are running on a server-class machine.
>>>>>>>>
>>>>>>>>
>>>>>>>>     -cp <class search path of directories and zip/jar files>
>>>>>>>>     -classpath <class search path of directories and zip/jar files>
>>>>>>>>                   A : separated list of directories, JAR archives,
>>>>>>>>                   and ZIP archives to search for class files.
>>>>>>>>     -D<name>=<value>
>>>>>>>>                   set a system property
>>>>>>>>     -verbose:[class|gc|jni]
>>>>>>>>                   enable verbose output
>>>>>>>>     -version      print product version and exit
>>>>>>>>     -version:<value>
>>>>>>>>                   Warning: this feature is deprecated and will be removed
>>>>>>>>                   in a future release.
>>>>>>>>                   require the specified version to run
>>>>>>>>     -showversion  print product version and continue
>>>>>>>>     -jre-restrict-search | -no-jre-restrict-search
>>>>>>>>                   Warning: this feature is deprecated and will be removed
>>>>>>>>                   in a future release.
>>>>>>>>                   include/exclude user private JREs in the version search
>>>>>>>>     -? -help      print this help message
>>>>>>>>     -X            print help on non-standard options
>>>>>>>>     -ea[:<packagename>...|:<classname>]
>>>>>>>>     -enableassertions[:<packagename>...|:<classname>]
>>>>>>>>                   enable assertions with specified granularity
>>>>>>>>     -da[:<packagename>...|:<classname>]
>>>>>>>>     -disableassertions[:<packagename>...|:<classname>]
>>>>>>>>                   disable assertions with specified granularity
>>>>>>>>     -esa | -enablesystemassertions
>>>>>>>>                   enable system assertions
>>>>>>>>     -dsa | -disablesystemassertions
>>>>>>>>                   disable system assertions
>>>>>>>>     -agentlib:<libname>[=<options>]
>>>>>>>>                   load native agent library <libname>, e.g.
>>>>>>>>                   -agentlib:hprof
>>>>>>>>                   see also, -agentlib:jdwp=help and -agentlib:hprof=help
>>>>>>>>     -agentpath:<pathname>[=<options>]
>>>>>>>>                   load native agent library by full pathname
>>>>>>>>     -javaagent:<jarpath>[=<options>]
>>>>>>>>                   load Java programming language agent, see java.lang.instrument
>>>>>>>>     -splash:<imagepath>
>>>>>>>>                   show splash screen with specified image
>>>>>>>> See http://www.oracle.com/technetwork/java/javase/documentation/index.html
>>>>>>>> for more details.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Jacky
>>>>>>>>
>>>>>>>> Robert Metzger <rmetz...@apache.org> wrote on Mon, May 11, 2020 at 3:42 PM:
>>>>>>>>
>>>>>>>>> Hey Jacky,
>>>>>>>>>
>>>>>>>>> The error says "The YARN application unexpectedly switched to
>>>>>>>>> state FAILED during deployment.".
>>>>>>>>> Have you tried retrieving the YARN application logs?
>>>>>>>>> Do the YARN UI / resource manager logs reveal anything on the
>>>>>>>>> reason for the deployment to fail?
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Robert
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, May 11, 2020 at 9:34 PM Jacky D <jacky.du0...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ---------- Forwarded message ---------
>>>>>>>>>> From: Jacky D <jacky.du0...@gmail.com>
>>>>>>>>>> Date: Mon, May 11, 2020 at 3:12 PM
>>>>>>>>>> Subject: Re: Flink Memory analyze on AWS EMR
>>>>>>>>>> To: Khachatryan Roman <khachatryan.ro...@gmail.com>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi Roman,
>>>>>>>>>>
>>>>>>>>>> Thanks for the quick response. I tried without the LogFile option, but
>>>>>>>>>> it failed with the same error. I'm currently using Flink 1.6
>>>>>>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.6/monitoring/application_profiling.html,
>>>>>>>>>> so I can only use JITWatch or JMC. I guess those tools are only
>>>>>>>>>> available on a standalone cluster, as the documentation mentions
>>>>>>>>>> "Each standalone JobManager, TaskManager, HistoryServer, and
>>>>>>>>>> ZooKeeper daemon redirects stdout and stderr to a file with a .out
>>>>>>>>>> filename suffix and writes internal logging to a file with a .log
>>>>>>>>>> suffix. Java options configured by the user in env.java.opts"?
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Jacky
>>>>>>>>>>
>>>>>>>>>
>>>
>>> --
>>>
>>> Arvid Heise | Senior Java Developer
>>