What does the Web UI show? What do you see when you click on the "stderr"
and "stdout" links? These links should show the stdout and stderr of each
executor.
As for your custom logging in the executors, are you sure you checked
"${spark.yarn.app.container.log.dir}/spark-app.log"?
The actual location of this file on each executor is
${yarn.nodemanager.remote-app-log-dir}/{applicationId}/${spark.yarn.app.container.log.dir}/spark-app.log
(the yarn.nodemanager.remote-app-log-dir setting can be found in
yarn-site.xml in the Hadoop config folder).
For example, in the case above, when I click the "stdout" link for
hslave-13, I get the link
http://hslave-13:8042/node/containerlogs/container_1459219311185_2456_01_000004/tuannd/stdout?start=-4096,
which means the file on hslave-13 is located at
${yarn.nodemanager.remote-app-log-dir}/appId/container_1459219311185_2456_01_000004/spark-app.log
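
If log aggregation is enabled on your cluster, you can also pull these logs
with the yarn CLI instead of opening each node's UI. A rough sketch, assuming
a standard Hadoop setup (the application id below is just the one matching
your container id above; adjust the paths to your cluster):

    # where do aggregated logs go? (yarn-site.xml in the Hadoop config folder)
    grep -A1 yarn.nodemanager.remote-app-log-dir $HADOOP_CONF_DIR/yarn-site.xml

    # fetch all container logs (stdout, stderr, spark-app.log) after the app finishes
    yarn logs -applicationId application_1459219311185_2456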

I also see that you forgot to ship the file "log4j.properties" to the
executors in your spark-submit command. Each executor tries to find
log4j.properties in its working directory; if the file is not found there,
your logging settings are ignored.
You have to add the parameter --files /path/to/your/log4j.properties in
order to send this file to the executors, for example as sketched below.
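
A rough sketch of the full command with --files added, keeping the options
you already pass (the main class and jar name are placeholders):

    spark-submit \
      --master yarn --deploy-mode cluster \
      --files /path/to/your/log4j.properties \
      --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
      --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
      --class your.main.Class your-application.jar

With --files, log4j.properties is copied into each container's working
directory, so -Dlog4j.configuration=log4j.properties can then find it by its
plain file name.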

Finally, in order to debug what is happening inside the executors, you
should write directly to stdout or stderr. That is much easier to check
than logging into each executor node and hunting for your log file :)

2016-04-29 21:30 GMT+07:00 dev loper <spark...@gmail.com>:

> Hi Ted & Nguyen,
>
> @Ted, I was under the belief that the log4j.properties file would be
> taken from the application classpath if no file path is specified.
> Please correct me if I am wrong. I tried your approach as well, but I
> still couldn't find the logs.
>
> @nguyen I am running it on a YARN cluster, so the Spark UI redirects me
> to the YARN UI. I couldn't see the logs there either. I checked the logs
> on both the master and the worker; I am running a cluster with one master
> and one worker. I even tried yarn logs, but nothing turned up there
> either. Does yarn logs include executor logs as well?
>
>
> Requesting your help to identify the issue.
>
> On Fri, Apr 29, 2016 at 7:32 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Please use the following syntax:
>>
>> --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:///local/file/log4j.properties"
>>
>> FYI
>>
>> On Fri, Apr 29, 2016 at 6:03 AM, dev loper <spark...@gmail.com> wrote:
>>
>>> Hi Spark Team,
>>>
>>> I have asked the same question on Stack Overflow, but no luck yet.
>>>
>>>
>>> http://stackoverflow.com/questions/36923949/where-to-find-logs-within-spark-rdd-processing-function-yarn-cluster-mode?noredirect=1#comment61419406_36923949
>>>
>>> I am running my Spark application on a YARN cluster. No matter what I do,
>>> I am not able to get the logs inside the RDD function printed. Below you
>>> can find the sample snippet I have written for the RDD processing
>>> function; I have simplified the code to illustrate the syntax I used to
>>> write it. When I run it locally I can see the logs, but not in cluster
>>> mode. Neither System.err.println nor the logger seems to be working, yet
>>> I can see all my driver logs. I even tried to log using the root logger,
>>> but it was not working at all within the RDD processing function. I was
>>> desperate to see the log messages, so I finally found a guide on making
>>> the logger transient (https://www.mapr.com/blog/how-log-apache-spark),
>>> but even that didn't help.
>>>
>>> import java.io.IOException;
>>> import java.util.ArrayList;
>>> import java.util.List;
>>>
>>> import org.apache.log4j.LogManager;
>>> import org.apache.log4j.Logger;
>>> import org.apache.spark.api.java.function.PairFlatMapFunction;
>>>
>>> import scala.Tuple2;
>>>
>>> class SampleFlatMapFunction
>>>         implements PairFlatMapFunction<Tuple2<String, String>, String, String> {
>>>
>>>     private static final long serialVersionUID = 6565656322667L;
>>>
>>>     // transient so the logger is not serialized with the closure
>>>     transient Logger executorLogger = LogManager.getLogger("sparkExecutor");
>>>
>>>     // re-create the logger when the function is deserialized on an executor
>>>     private void readObject(java.io.ObjectInputStream in)
>>>             throws IOException, ClassNotFoundException {
>>>         in.defaultReadObject();
>>>         executorLogger = LogManager.getLogger("sparkExecutor");
>>>     }
>>>
>>>     @Override
>>>     public Iterable<Tuple2<String, String>> call(Tuple2<String, String> tuple)
>>>             throws Exception {
>>>
>>>         executorLogger.info(" log testing from  executorLogger ::");
>>>         System.err.println(" log testing from  executorLogger system error stream ");
>>>
>>>         List<Tuple2<String, String>> updates = new ArrayList<>();
>>>         // process the tuple, expand it and add the results to the list
>>>         return updates;
>>>     }
>>> }
>>>
>>> My Log4j Configuration is given below
>>>
>>>     log4j.appender.console=org.apache.log4j.ConsoleAppender
>>>     log4j.appender.console.target=System.err
>>>     log4j.appender.console.layout=org.apache.log4j.PatternLayout
>>>     log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
>>>
>>>     log4j.appender.stdout=org.apache.log4j.ConsoleAppender
>>>     log4j.appender.stdout.target=System.out
>>>     log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
>>>     log4j.appender.stdout.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
>>>
>>>     log4j.appender.RollingAppender=org.apache.log4j.DailyRollingFileAppender
>>>     log4j.appender.RollingAppender.File=/var/log/spark/spark.log
>>>     log4j.appender.RollingAppender.DatePattern='.'yyyy-MM-dd
>>>     log4j.appender.RollingAppender.layout=org.apache.log4j.PatternLayout
>>>     log4j.appender.RollingAppender.layout.ConversionPattern=[%p] %d %c %M - %m%n
>>>
>>>     log4j.appender.RollingAppenderU=org.apache.log4j.DailyRollingFileAppender
>>>     log4j.appender.RollingAppenderU.File=${spark.yarn.app.container.log.dir}/spark-app.log
>>>     log4j.appender.RollingAppenderU.DatePattern='.'yyyy-MM-dd
>>>     log4j.appender.RollingAppenderU.layout=org.apache.log4j.PatternLayout
>>>     log4j.appender.RollingAppenderU.layout.ConversionPattern=[%p] %d %c %M - %m%n
>>>
>>>     # By default, everything goes to console and file
>>>     log4j.rootLogger=INFO, RollingAppender, console
>>>
>>>     # My custom logging goes to another file
>>>     log4j.logger.sparkExecutor=INFO, stdout, RollingAppenderU
>>>
>>>
>>> I have tried yarn logs and the Spark UI logs, and nowhere could I see the
>>> log statements from the RDD processing functions. I tried the approaches
>>> below, but they didn't work:
>>>
>>> yarn logs -applicationId
>>>
>>> I checked even below HDFS path also
>>>
>>> /tmp/logs/
>>>
>>>
>>> I am running my spark-submit command with the arguments below; even
>>> then it's not working:
>>>
>>>   --master yarn --deploy-mode cluster \
>>>     --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
>>>     --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties"
>>>
>>> Can somebody guide me on logging within Spark RDD and map functions?
>>> What am I missing in the above steps?
>>>
>>> Thanks
>>>
>>> Dev
>>>
>>
>>
>
