Not that I can think of. If you have the Spark History Server running, then it
may be another place to look.
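For context, the History Server mentioned above only has something to show if event logging was enabled when the application ran. A minimal sketch of the relevant settings (the property names are Spark's real ones; the HDFS path is illustrative):

```
# spark-defaults.conf (illustrative path; both properties should
# point at the same directory)
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs:///user/spark/applicationHistory
spark.history.fs.logDirectory    hdfs:///user/spark/applicationHistory
```

The server itself is started with sbin/start-history-server.sh and listens on port 18080 by default.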

On Mon, Jul 31, 2017 at 9:48 AM, John Zeng <johnz...@hotmail.com> wrote:

> Hi, Ayan,
>
>
> Thanks for the suggestion.  I did that and got the following weird message
> even though I enabled log aggregation:
>
>
> [root@john1 conf]# yarn logs -applicationId application_1501197841826_0013
> 17/07/30 16:45:06 INFO client.RMProxy: Connecting to ResourceManager at
> john1.dg/192.168.6.90:8032
> /tmp/logs/root/logs/application_1501197841826_0013 does not exist.
> Log aggregation has not completed or is not enabled.
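The "Log aggregation has not completed or is not enabled" message above usually means yarn.log-aggregation-enable is false on the cluster (or aggregation is still in progress). The property name is the real YARN one; the yarn-site.xml location varies by distribution. To keep this sketch self-contained it greps a sample snippet written to /tmp; on a real cluster, point grep at /etc/hadoop/conf/yarn-site.xml instead:

```shell
# Write a sample of the relevant yarn-site.xml stanza (illustrative only).
cat > /tmp/yarn-site-sample.xml <<'EOF'
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
EOF

# Show the property and the line after it (its value).
grep -A1 'yarn.log-aggregation-enable' /tmp/yarn-site-sample.xml
```

If the value is false or the property is absent, `yarn logs` has nothing aggregated to read, and the per-node container-log directories are the only place to look.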
>
> Any other way to see my logs?
>
> Thanks
>
> John
>
>
>
>
> ------------------------------
> *From:* ayan guha <guha.a...@gmail.com>
> *Sent:* Sunday, July 30, 2017 10:34 PM
> *To:* John Zeng; Riccardo Ferrari
>
> *Cc:* User
> *Subject:* Re: Logging in RDD mapToPair of Java Spark application
>
> Hi
>
> As you are using YARN log aggregation, YARN moves all the logs to HDFS
> after the application completes.
>
> You can use the following command to get the logs:
> yarn logs -applicationId <your application id>
>
>
>
> On Mon, 31 Jul 2017 at 3:17 am, John Zeng <johnz...@hotmail.com> wrote:
>
>> Thanks Riccardo for the valuable info.
>>
>>
>> Following your guidance, I looked at the Spark UI and figured out that the
>> default log location for executors is '/yarn/container-logs'.  I ran my
>> Spark app again and I can see a new folder was created for it:
>>
>>
>> [root@john2 application_1501197841826_0013]# ls -l
>> total 24
>> drwx--x--- 2 yarn yarn 4096 Jul 30 10:07 container_1501197841826_0013_01_000001
>> drwx--x--- 2 yarn yarn 4096 Jul 30 10:08 container_1501197841826_0013_01_000002
>> drwx--x--- 2 yarn yarn 4096 Jul 30 10:08 container_1501197841826_0013_01_000003
>> drwx--x--- 2 yarn yarn 4096 Jul 30 10:08 container_1501197841826_0013_02_000001
>> drwx--x--- 2 yarn yarn 4096 Jul 30 10:08 container_1501197841826_0013_02_000002
>> drwx--x--- 2 yarn yarn 4096 Jul 30 10:08 container_1501197841826_0013_02_000003
>>
>> But when I tried to look into each one's contents, they were gone and there
>> were no files at all in the same place:
>>
>> [root@john2 application_1501197841826_0013]# vi container_1501197841826_0013_*
>> [root@john2 application_1501197841826_0013]# ls -l
>> total 0
>> [root@john2 application_1501197841826_0013]# pwd
>> /yarn/container-logs/application_1501197841826_0013
>>
>> I believe Spark moves these logs to a different place.  But where are
>> they?
>>
>> Thanks
>>
>> John
>>
>>
>>
>>
>> ------------------------------
>> *From:* Riccardo Ferrari <ferra...@gmail.com>
>> *Sent:* Saturday, July 29, 2017 8:18 PM
>> *To:* johnzengspark
>> *Cc:* User
>> *Subject:* Re: Logging in RDD mapToPair of Java Spark application
>>
>> Hi John,
>>
>> The reason you don't see the second sysout line is that it is executed on a
>> different JVM (i.e. driver vs. executor). The second sysout line should be
>> available through the executor logs. Check the Executors tab.
>>
>> There are alternative approaches to managing log centralization, but it
>> really depends on what your requirements are.
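One common pattern for the driver-vs-executor split described above is to log through a logger that each executor JVM re-creates locally, rather than capturing anything from the driver. The sketch below is hypothetical: MyFunction stands in for the anonymous PairFunction in the thread, and it uses JDK logging so it runs standalone (a real Spark job would typically go through the log4j configuration Spark ships with). The key detail is that loggers are not serializable, so they are held in a static field that is re-initialized wherever the class loads:

```java
import java.io.Serializable;
import java.util.logging.Logger;

public class ExecutorLogging {
    // Hypothetical stand-in for the anonymous PairFunction in the thread.
    static class MyFunction implements Serializable {
        private static final long serialVersionUID = 1L;

        // static: never serialized with the closure; each executor JVM
        // re-creates it when the class is loaded there.
        private static final Logger LOG =
                Logger.getLogger(MyFunction.class.getName());

        public String call(String input) {
            // On a cluster, this message lands in the executor's own log
            // (e.g. the container's stderr under /yarn/container-logs),
            // not in the driver's stdout.
            LOG.info("processing: " + input);
            return input.toUpperCase(); // placeholder for real work
        }
    }

    public static void main(String[] args) {
        System.out.println(new MyFunction().call("testing data"));
    }
}
```

Run locally this prints "TESTING DATA" on stdout and the INFO line on stderr; on YARN, the INFO line would appear in the container logs (or in the aggregated logs once `yarn logs` works).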
>>
>> Hope it helps,
>>
>> On Sat, Jul 29, 2017 at 8:09 PM, johnzengspark <johnz...@hotmail.com>
>> wrote:
>>
>>> Hi, All,
>>>
>>> Although there are lots of discussions related to logging in this news
>>> group, I did not find an answer to my specific question, so I am posting
>>> mine in the hope that it is not a duplicate.
>>>
>>> Here is my simplified Java testing Spark app:
>>>
>>> public class SparkJobEntry {
>>>     public static void main(String[] args) {
>>>         // Following line is in stdout from JobTracker UI
>>>         System.out.println("argc=" + args.length);
>>>
>>>         SparkConf conf = new SparkConf().setAppName("TestSparkApp");
>>>         JavaSparkContext sc = new JavaSparkContext(conf);
>>>         JavaRDD<String> fileRDD = sc.textFile(args[0]);
>>>
>>>         fileRDD.mapToPair(new PairFunction<String, String, String>() {
>>>
>>>             private static final long serialVersionUID = 1L;
>>>
>>>             @Override
>>>             public Tuple2<String, String> call(String input) throws Exception {
>>>                 // Following line is not in stdout from JobTracker UI
>>>                 System.out.println("This line should be printed in stdout");
>>>                 // Other code removed from here to make things simple
>>>                 return new Tuple2<String, String>("1", "Testing data");
>>>             }
>>>         }).saveAsTextFile(args[0] + ".results");
>>>     }
>>> }
>>>
>>> What I expected from the JobTracker UI is to see both stdout lines: the
>>> first line is "argc=2" and the second line is "This line should be printed
>>> in stdout".  But I only see the first line, which is outside of the
>>> 'mapToPair'.  I have actually verified that my 'mapToPair' is called and
>>> that the statements after the second logging line were executed.  The only
>>> issue for me is why the second log line is not in the JobTracker UI.
>>>
>>> Appreciate your help.
>>>
>>> Thanks
>>>
>>> John
>>>
>>>
>>>
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>>
>>>
>> --
> Best Regards,
> Ayan Guha
>



-- 
Best Regards,
Ayan Guha
