Hi, Ayan,
Thanks for the suggestion. I did that and got the following weird message, even though I have enabled log aggregation:

[root@john1 conf]# yarn logs -applicationId application_1501197841826_0013
17/07/30 16:45:06 INFO client.RMProxy: Connecting to ResourceManager at john1.dg/192.168.6.90:8032
/tmp/logs/root/logs/application_1501197841826_0013does not exist.
Log aggregation has not completed or is not enabled.

Is there any other way to see my logs?

Thanks

John

________________________________
From: ayan guha <guha.a...@gmail.com>
Sent: Sunday, July 30, 2017 10:34 PM
To: John Zeng; Riccardo Ferrari
Cc: User
Subject: Re: Logging in RDD mapToPair of Java Spark application

Hi,

Since you are using YARN log aggregation, YARN moves all the logs to HDFS after the application completes. You can use the following command to get the logs:

yarn logs -applicationId <your application id>

On Mon, 31 Jul 2017 at 3:17 am, John Zeng <johnz...@hotmail.com> wrote:

Thanks, Riccardo, for the valuable info. Following your guidance, I looked at the Spark UI and figured out that the default log location for executors is 'yarn/container-logs'.
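[Editor's note: the aggregation behaviour discussed above, including the HDFS path that `yarn logs` searches, is governed by a few standard yarn-site.xml properties. The fragment below is a sketch with the stock Hadoop defaults for illustration; check the actual values on your cluster, since a different remote-app-log-dir or a disabled flag would explain the "does not exist" message.]

```xml
<!-- yarn-site.xml: properties controlling YARN log aggregation.
     Values shown are the usual Hadoop defaults (illustrative only). -->
<property>
  <!-- Must be true for 'yarn logs' to find anything; default is false. -->
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <!-- HDFS root for aggregated logs. -->
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/tmp/logs</value>
</property>
<property>
  <!-- Appended after the user name, giving
       /tmp/logs/<user>/logs/<application-id> as seen in the error above. -->
  <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
  <value>logs</value>
</property>
```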
I ran my Spark app again and I can see that a new folder was created for it:

[root@john2 application_1501197841826_0013]# ls -l
total 24
drwx--x--- 2 yarn yarn 4096 Jul 30 10:07 container_1501197841826_0013_01_000001
drwx--x--- 2 yarn yarn 4096 Jul 30 10:08 container_1501197841826_0013_01_000002
drwx--x--- 2 yarn yarn 4096 Jul 30 10:08 container_1501197841826_0013_01_000003
drwx--x--- 2 yarn yarn 4096 Jul 30 10:08 container_1501197841826_0013_02_000001
drwx--x--- 2 yarn yarn 4096 Jul 30 10:08 container_1501197841826_0013_02_000002
drwx--x--- 2 yarn yarn 4096 Jul 30 10:08 container_1501197841826_0013_02_000003

But when I tried to look into their contents, they were gone and there were no files left in the same place:

[root@john2 application_1501197841826_0013]# vi container_1501197841826_0013_*
[root@john2 application_1501197841826_0013]# ls -l
total 0
[root@john2 application_1501197841826_0013]# pwd
/yarn/container-logs/application_1501197841826_0013

I believe Spark moves these logs to a different place. But where are they?

Thanks

John

________________________________
From: Riccardo Ferrari <ferra...@gmail.com>
Sent: Saturday, July 29, 2017 8:18 PM
To: johnzengspark
Cc: User
Subject: Re: Logging in RDD mapToPair of Java Spark application

Hi John,

The reason you don't see the second sysout line is that it is executed in a different JVM (i.e. driver vs. executor). The second sysout line should be available through the executor logs; check the Executors tab. There are alternative approaches to managing log centralization; however, it really depends on what your requirements are.

Hope it helps,

On Sat, Jul 29, 2017 at 8:09 PM, johnzengspark <johnz...@hotmail.com> wrote:

Hi, All,

Although there are lots of discussions related to logging in this newsgroup, I did not find an answer to my specific question, so I am posting mine in the hope that it will not be a duplicate.
Here is my simplified Java testing Spark app:

public class SparkJobEntry {
    public static void main(String[] args) {
        // Following line is in stdout from JobTracker UI
        System.out.println("argc=" + args.length);

        SparkConf conf = new SparkConf().setAppName("TestSparkApp");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<String> fileRDD = sc.textFile(args[0]);

        fileRDD.mapToPair(new PairFunction<String, String, String>() {
            private static final long serialVersionUID = 1L;

            @Override
            public Tuple2<String, String> call(String input) throws Exception {
                // Following line is not in stdout from JobTracker UI
                System.out.println("This line should be printed in stdout");
                // Other code removed from here to make things simple
                return new Tuple2<String, String>("1", "Testing data");
            }
        }).saveAsTextFile(args[0] + ".results");
    }
}

What I expected from the JobTracker UI was to see both stdout lines: the first line is "argc=2" and the second line is "This line should be printed in stdout". But I only see the first line, which is outside the 'mapToPair'. I have actually verified that my 'mapToPair' is called and that the statements after the second logging line were executed. The only issue for me is why the second logging line is not in the JobTracker UI.

Appreciate your help.

Thanks

John

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Logging-in-RDD-mapToPair-of-Java-Spark-application-tp29007.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org

--
Best Regards,
Ayan Guha
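[Editor's note: as Riccardo points out, the second sysout runs in the executor JVM, so it lands in that container's stdout file rather than the driver's. A common alternative is to write through a named logger instead of System.out, so the message ends up in the executor's log files and survives YARN aggregation. Below is a minimal, Spark-free sketch of that pattern using java.util.logging as a stand-in for Spark's log4j setup; `processRecord` is a hypothetical substitute for the body of the `call` method above, not code from the thread.]

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class ExecutorLoggingSketch {

    // In real Spark executor code this would typically be a log4j logger;
    // the idea is the same: messages go to the executor's log files (which
    // YARN aggregates) rather than relying on finding the stdout file.
    private static final Logger LOG =
            Logger.getLogger(ExecutorLoggingSketch.class.getName());

    // Hypothetical stand-in for the body of PairFunction.call(...)
    static String processRecord(String input) {
        LOG.log(Level.INFO, "processing record: {0}", input);
        return "1\tTesting data";
    }

    public static void main(String[] args) {
        // Prints the transformed record; the INFO line above goes to the
        // logger's handler (stderr by default for java.util.logging).
        System.out.println(processRecord("some line"));
    }
}
```

One caveat: any logger field referenced inside a Spark function must either be static or re-created on the executor, since the function object is serialized and shipped to a different JVM.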