Hi, Ayan,
Thanks for the suggestion. I did that and got the following weird message, even though I have enabled log aggregation:

[root@john1 conf]# yarn logs -applicationId application_1501197841826_0013
17/07/30 16:45:06 INFO client.RMProxy: Connecting to ResourceManager at john1.dg/192.168.6.90:8032
/tmp/logs/root/logs/application_1501197841826_0013does not exist.
Log aggregation has not completed or is not enabled.

Is there any other way to see my logs?

Thanks

John

________________________________
From: ayan guha <guha.a...@gmail.com>
Sent: Sunday, July 30, 2017 10:34 PM
To: John Zeng; Riccardo Ferrari
Cc: User
Subject: Re: Logging in RDD mapToPair of Java Spark application

Hi,

Since you are using YARN log aggregation, YARN moves all the logs to HDFS after the application completes. You can use the following command to get the logs:

yarn logs -applicationId <your application id>

On Mon, 31 Jul 2017 at 3:17 am, John Zeng <johnz...@hotmail.com> wrote:

Thanks, Riccardo, for the valuable info. Following your guidance, I looked at the Spark UI and figured out that the default log location for executors is 'yarn/container-logs'.
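[Editor's note: the aggregation behaviour discussed above, including the HDFS path that `yarn logs` searches, is governed by a few standard yarn-site.xml properties. The fragment below is a sketch with the stock Hadoop defaults for illustration; check the actual values on your cluster, since a different remote-app-log-dir or a disabled flag would explain the "does not exist" message.]

```xml
<!-- yarn-site.xml: properties controlling YARN log aggregation.
     Values shown are the usual Hadoop defaults (illustrative only). -->
<property>
  <!-- Must be true for 'yarn logs' to find anything; default is false. -->
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <!-- HDFS root for aggregated logs. -->
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/tmp/logs</value>
</property>
<property>
  <!-- Appended after the user name, giving
       /tmp/logs/<user>/logs/<application-id> as seen in the error above. -->
  <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
  <value>logs</value>
</property>
```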
I ran my Spark app again and I can see that a new folder was created for it:

[root@john2 application_1501197841826_0013]# ls -l
total 24
drwx--x--- 2 yarn yarn 4096 Jul 30 10:07 container_1501197841826_0013_01_000001
drwx--x--- 2 yarn yarn 4096 Jul 30 10:08 container_1501197841826_0013_01_000002
drwx--x--- 2 yarn yarn 4096 Jul 30 10:08 container_1501197841826_0013_01_000003
drwx--x--- 2 yarn yarn 4096 Jul 30 10:08 container_1501197841826_0013_02_000001
drwx--x--- 2 yarn yarn 4096 Jul 30 10:08 container_1501197841826_0013_02_000002
drwx--x--- 2 yarn yarn 4096 Jul 30 10:08 container_1501197841826_0013_02_000003

But when I tried to look into their contents, they were gone and there were no files left in the same place:

[root@john2 application_1501197841826_0013]# vi container_1501197841826_0013_*
[root@john2 application_1501197841826_0013]# ls -l
total 0
[root@john2 application_1501197841826_0013]# pwd
/yarn/container-logs/application_1501197841826_0013

I believe Spark moves these logs to a different place. But where are they?

Thanks

John

________________________________
From: Riccardo Ferrari <ferra...@gmail.com>
Sent: Saturday, July 29, 2017 8:18 PM
To: johnzengspark
Cc: User
Subject: Re: Logging in RDD mapToPair of Java Spark application

Hi John,

The reason you don't see the second sysout line is that it is executed in a different JVM (i.e. driver vs. executor). The second sysout line should be available through the executor logs; check the Executors tab. There are alternative approaches to managing log centralization; however, it really depends on what your requirements are.

Hope it helps,

On Sat, Jul 29, 2017 at 8:09 PM, johnzengspark <johnz...@hotmail.com> wrote:

Hi, All,

Although there are lots of discussions related to logging in this newsgroup, I did not find an answer to my specific question, so I am posting mine in the hope that it will not be a duplicate.
Here is my simplified Java testing Spark app:

public class SparkJobEntry {
    public static void main(String[] args) {
        // Following line is in stdout from JobTracker UI
        System.out.println("argc=" + args.length);

        SparkConf conf = new SparkConf().setAppName("TestSparkApp");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<String> fileRDD = sc.textFile(args[0]);

        fileRDD.mapToPair(new PairFunction<String, String, String>() {
            private static final long serialVersionUID = 1L;

            @Override
            public Tuple2<String, String> call(String input) throws Exception {
                // Following line is not in stdout from JobTracker UI
                System.out.println("This line should be printed in stdout");
                // Other code removed from here to make things simple
                return new Tuple2<String, String>("1", "Testing data");
            }
        }).saveAsTextFile(args[0] + ".results");
    }
}

What I expected from the JobTracker UI was to see both stdout lines: the first line is "argc=2" and the second line is "This line should be printed in stdout". But I only see the first line, which is outside the 'mapToPair'. I have actually verified that my 'mapToPair' is called and that the statements after the second logging line were executed. The only issue for me is why the second logging line is not in the JobTracker UI.

Appreciate your help.

Thanks

John

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Logging-in-RDD-mapToPair-of-Java-Spark-application-tp29007.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org

--
Best Regards,
Ayan Guha
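[Editor's note: as Riccardo points out, the second sysout runs in the executor JVM, so it lands in that container's stdout file rather than the driver's. A common alternative is to write through a named logger instead of System.out, so the message ends up in the executor's log files and survives YARN aggregation. Below is a minimal, Spark-free sketch of that pattern using java.util.logging as a stand-in for Spark's log4j setup; `processRecord` is a hypothetical substitute for the body of the `call` method above, not code from the thread.]

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class ExecutorLoggingSketch {

    // In real Spark executor code this would typically be a log4j logger;
    // the idea is the same: messages go to the executor's log files (which
    // YARN aggregates) rather than relying on finding the stdout file.
    private static final Logger LOG =
            Logger.getLogger(ExecutorLoggingSketch.class.getName());

    // Hypothetical stand-in for the body of PairFunction.call(...)
    static String processRecord(String input) {
        LOG.log(Level.INFO, "processing record: {0}", input);
        return "1\tTesting data";
    }

    public static void main(String[] args) {
        // Prints the transformed record; the INFO line above goes to the
        // logger's handler (stderr by default for java.util.logging).
        System.out.println(processRecord("some line"));
    }
}
```

One caveat: any logger field referenced inside a Spark function must either be static or re-created on the executor, since the function object is serialized and shipped to a different JVM.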