Hi

As you are using YARN log aggregation, YARN moves all the container logs to
HDFS after the application completes.

You can use the following command to retrieve the logs:
yarn logs -applicationId <your application id>
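
For example, with the application id from your listing below (assuming log
aggregation has finished for that application and your user is allowed to
read the logs), you can pull everything and grep for your print statement:

  yarn logs -applicationId application_1501197841826_0013 | grep "This line should be printed"

The aggregated files themselves sit on HDFS under the directory configured
by yarn.nodemanager.remote-app-log-dir (default /tmp/logs), typically laid
out as <remote-app-log-dir>/<user>/logs/<application id>, so you can also
browse them with hdfs dfs -ls.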



On Mon, 31 Jul 2017 at 3:17 am, John Zeng <johnz...@hotmail.com> wrote:

> Thanks Riccardo for the valuable info.
>
>
> Following your guidance, I looked at the Spark UI and figured out that the
> default log location for executors is 'yarn/container-logs'.  I ran my
> Spark app again and saw that a new folder was created for it:
>
>
> [root@john2 application_1501197841826_0013]# ls -l
> total 24
> drwx--x--- 2 yarn yarn 4096 Jul 30 10:07
> container_1501197841826_0013_01_000001
> drwx--x--- 2 yarn yarn 4096 Jul 30 10:08
> container_1501197841826_0013_01_000002
> drwx--x--- 2 yarn yarn 4096 Jul 30 10:08
> container_1501197841826_0013_01_000003
> drwx--x--- 2 yarn yarn 4096 Jul 30 10:08
> container_1501197841826_0013_02_000001
> drwx--x--- 2 yarn yarn 4096 Jul 30 10:08
> container_1501197841826_0013_02_000002
> drwx--x--- 2 yarn yarn 4096 Jul 30 10:08
> container_1501197841826_0013_02_000003
>
> But when I tried to look at each one's contents, they were gone and there
> were no files at all in the same place:
>
> [root@john2 application_1501197841826_0013]# vi
> container_1501197841826_0013_*
> [root@john2 application_1501197841826_0013]# ls -l
> total 0
> [root@john2 application_1501197841826_0013]# pwd
> /yarn/container-logs/application_1501197841826_0013
>
> I believe Spark moves these logs to a different place.  But where are they?
>
> Thanks
>
> John
>
>
>
>
> ------------------------------
> *From:* Riccardo Ferrari <ferra...@gmail.com>
> *Sent:* Saturday, July 29, 2017 8:18 PM
> *To:* johnzengspark
> *Cc:* User
> *Subject:* Re: Logging in RDD mapToPair of Java Spark application
>
> Hi John,
>
> The reason you don't see the second sysout line is that it is executed on
> a different JVM (i.e. driver vs. executor). The second sysout line should
> be available through the executor logs. Check the Executors tab.
>
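> If you want those messages in the executor logs in a more controlled way,
> a minimal sketch (untested) is to call the log4j Logger that ships with
> Spark from inside the function; the logger name "SparkJobEntry.map" is
> just an example. With import org.apache.log4j.Logger; the mapToPair call
> in your program could look like:
>
>     fileRDD.mapToPair(new PairFunction<String, String, String>() {
>         @Override
>         public Tuple2<String, String> call(String input) throws Exception {
>             // Runs on the executor JVM, so the message lands in the
>             // executor's log (visible under the Executors tab), not in
>             // the driver's stdout.
>             Logger.getLogger("SparkJobEntry.map").info("Processing: " + input);
>             return new Tuple2<String, String>("1", "Testing data");
>         }
>     }).saveAsTextFile(args[0] + ".results");
>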
> There are alternative approaches to centralizing logs, but it really
> depends on what your requirements are.
>
> Hope it helps,
>
> On Sat, Jul 29, 2017 at 8:09 PM, johnzengspark <johnz...@hotmail.com>
> wrote:
>
>> Hi, All,
>>
>> Although there are lots of discussions related to logging in this news
>> group, I did not find an answer to my specific question, so I am posting
>> mine in the hope that it is not a duplicate.
>>
>> Here is my simplified Java Spark test app:
>>
>> import org.apache.spark.SparkConf;
>> import org.apache.spark.api.java.JavaRDD;
>> import org.apache.spark.api.java.JavaSparkContext;
>> import org.apache.spark.api.java.function.PairFunction;
>>
>> import scala.Tuple2;
>>
>> public class SparkJobEntry {
>>     public static void main(String[] args) {
>>         // Following line is in stdout from JobTracker UI
>>         System.out.println("argc=" + args.length);
>>
>>         SparkConf conf = new SparkConf().setAppName("TestSparkApp");
>>         JavaSparkContext sc = new JavaSparkContext(conf);
>>         JavaRDD<String> fileRDD = sc.textFile(args[0]);
>>
>>         fileRDD.mapToPair(new PairFunction<String, String, String>() {
>>
>>             private static final long serialVersionUID = 1L;
>>
>>             @Override
>>             public Tuple2<String, String> call(String input) throws Exception {
>>                 // Following line is not in stdout from JobTracker UI
>>                 System.out.println("This line should be printed in stdout");
>>                 // Other code removed from here to make things simple
>>                 return new Tuple2<String, String>("1", "Testing data");
>>             }
>>         }).saveAsTextFile(args[0] + ".results");
>>     }
>> }
>>
>> What I expected from the JobTracker UI was to see both stdout lines: the
>> first, "argc=2", and the second, "This line should be printed in stdout".
>> But I only see the first line, which is outside of 'mapToPair'.  I have
>> actually verified that 'mapToPair' is called and that the statements after
>> the second logging line were executed.  The only question for me is why
>> the second logging line is not in the JobTracker UI.
>>
>> Appreciate your help.
>>
>> Thanks
>>
>> John
>>
>>
>>
>>
>>
>>
> --
Best Regards,
Ayan Guha
