Well, I have already tried that. You are talking about a command similar to this, right?

yarn logs -applicationId application_Number

This gives me the processing logs, which contain information about the tasks, RDD blocks, etc.
What I really need is the output log that gets generated as part of the Spark job. That is, the job itself generates some output that gets written to a file named in the job. This file currently resides within the appcache, so is there a way I can retrieve it once the job is over?
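For concreteness, here is a minimal sketch of the pattern I mean, using the inputDataFiles RDD from my original code quoted below; the file name "executor-output.log" and the write logic are placeholders, not the actual job:

    import java.io.FileWriter;
    import java.io.PrintWriter;

    import org.apache.spark.api.java.function.VoidFunction;

    import scala.Tuple2;

    // Illustrative only: each executor appends its output to a local file.
    // Under YARN that file lands in the container's working directory (the
    // appcache), which is cleaned up when the application finishes -- which
    // is how the output gets lost.
    inputDataFiles.foreach(new VoidFunction<Tuple2<String, String>>() {

        private static final long serialVersionUID = 1L;

        @Override
        public void call(Tuple2<String, String> v1) throws Exception {
            // "executor-output.log" is a placeholder name, not the real file.
            try (PrintWriter out = new PrintWriter(
                    new FileWriter("executor-output.log", true))) {
                out.println("processed: " + v1._1());
            }
        }
    });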
On Wed, Sep 21, 2016 at 4:00 PM, ayan guha <guha.a...@gmail.com> wrote:

> On YARN, logs are aggregated from each container to HDFS. You can use the
> YARN CLI or UI to view them. For Spark, you would have a history server
> which consolidates the logs.
>
> On 21 Sep 2016 19:03, "Nisha Menon" <nisha.meno...@gmail.com> wrote:
>
>> I looked at the driver logs; that reminded me that I needed to look at
>> the executor logs. There, the issue was that the Spark executors were not
>> getting a configuration file. I broadcast the file and now the processing
>> happens. Thanks for the suggestion.
>> Currently my issue is that the log file generated independently by the
>> executors goes to the respective containers' appcache, and then it gets
>> lost. Is there a recommended way to get the output files from the
>> individual executors?
>>
>> On Thu, Sep 8, 2016 at 12:32 PM, Sonal Goyal <sonalgoy...@gmail.com> wrote:
>>
>>> Are you looking at the worker logs or the driver?
>>>
>>> On Thursday, September 8, 2016, Nisha Menon <nisha.meno...@gmail.com> wrote:
>>>
>>>> I have an RDD created as follows:
>>>>
>>>> JavaPairRDD<String,String> inputDataFiles =
>>>>     sparkContext.wholeTextFiles("hdfs://ip:8020/user/cdhuser/inputFolder/");
>>>>
>>>> On this RDD I perform a map to process individual files, and then
>>>> invoke a foreach to trigger that map.
>>>>
>>>> JavaRDD<Object[]> output = inputDataFiles.map(
>>>>     new Function<Tuple2<String,String>, Object[]>() {
>>>>
>>>>       private static final long serialVersionUID = 1L;
>>>>
>>>>       @Override
>>>>       public Object[] call(Tuple2<String,String> v1) throws Exception {
>>>>         System.out.println("in map!");
>>>>         // do something with v1.
>>>>         return new Object[] {}; // placeholder: return the real result
>>>>       }
>>>>     });
>>>>
>>>> output.foreach(new VoidFunction<Object[]>() {
>>>>
>>>>   private static final long serialVersionUID = 1L;
>>>>
>>>>   @Override
>>>>   public void call(Object[] t) throws Exception {
>>>>     // do nothing!
>>>>     System.out.println("in foreach!");
>>>>   }
>>>> });
>>>>
>>>> This code works perfectly fine in a standalone setup on my local
>>>> laptop, accessing both local files and remote HDFS files.
>>>>
>>>> On the cluster, the same code produces no results. My intuition is that
>>>> the data has not reached the individual executors, and hence neither the
>>>> `map` nor the `foreach` works. This is only a guess, but I am not able to
>>>> figure out why this would not work on the cluster. I don't even see the
>>>> print statements in `map` and `foreach` getting printed in cluster mode.
>>>>
>>>> I notice a particular line in the standalone output that I do NOT see
>>>> in the cluster execution:
>>>>
>>>> 16/09/07 17:35:35 INFO WholeTextFileRDD: Input split: Paths:/user/cdhuser/inputFolder/data1.txt:0+657345,/user/cdhuser/inputFolder/data10.txt:0+657345,/user/cdhuser/inputFolder/data2.txt:0+657345,/user/cdhuser/inputFolder/data3.txt:0+657345,/user/cdhuser/inputFolder/data4.txt:0+657345,/user/cdhuser/inputFolder/data5.txt:0+657345,/user/cdhuser/inputFolder/data6.txt:0+657345,/user/cdhuser/inputFolder/data7.txt:0+657345,/user/cdhuser/inputFolder/data8.txt:0+657345,/user/cdhuser/inputFolder/data9.txt:0+657345
>>>>
>>>> I had similar code with textFile() that worked earlier for individual
>>>> files on the cluster. The issue is with wholeTextFiles() only.
>>>>
>>>> Please advise on the best way to get this working, or suggest
>>>> alternatives.
>>>>
>>>> My setup is the Cloudera 5.7 distribution with the Spark service. I
>>>> used `yarn-client` as the master.
>>>>
>>>> The action can be anything; it's just a dummy step to invoke the map. I
>>>> also tried System.out.println("Count is:" + output.count());, for which I
>>>> got the correct answer of `10`, since there were 10 files in the folder,
>>>> but still the map refuses to work.
>>>>
>>>> Thanks.
>>>
>>> --
>>> Thanks,
>>> Sonal
>>> Nube Technologies <http://www.nubetech.co>
>>> <http://in.linkedin.com/in/sonalgoyal>
>>
>> --
>> Nisha Menon
>> BTech (CS) Sahrdaya CET,
>> MTech (CS) IIIT Bangalore.

--
Nisha Menon
BTech (CS) Sahrdaya CET,
MTech (CS) IIIT Bangalore.
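(For reference: a minimal, self-contained sketch of the wholeTextFiles() pattern discussed above. Note that System.out.println inside map runs on the executors, so in cluster mode its output goes to the YARN container stdout logs, not the driver console; collecting the results makes them visible on the driver. The path is the one from the thread, and the processing is a placeholder.)

    import java.util.List;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.Function;

    import scala.Tuple2;

    public class WholeTextFilesSketch {
        public static void main(String[] args) {
            JavaSparkContext sc = new JavaSparkContext(
                    new SparkConf().setAppName("WholeTextFilesSketch"));

            // One (fileName, fileContent) pair per file in the folder.
            JavaPairRDD<String, String> inputDataFiles =
                    sc.wholeTextFiles("hdfs://ip:8020/user/cdhuser/inputFolder/");

            JavaRDD<String> output = inputDataFiles.map(
                    new Function<Tuple2<String, String>, String>() {
                        private static final long serialVersionUID = 1L;

                        @Override
                        public String call(Tuple2<String, String> v1) throws Exception {
                            // Runs on an executor: this print goes to the
                            // container's stdout log, not the driver console.
                            System.out.println("in map!");
                            return v1._1(); // placeholder: real processing goes here
                        }
                    });

            // collect() brings the results back to the driver, so printing
            // them here is visible on the submitting machine.
            List<String> results = output.collect();
            for (String r : results) {
                System.out.println("processed file: " + r);
            }

            sc.stop();
        }
    }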