Hello everybody, I have implemented a log analyzer program in Spark which reads the logs from an Apache log file and translates each line into a given object.
The format of the log lines is described by a Grok pattern, so I am using the Grok library to extract the desired fields. When I run the application locally it succeeds without any problem, but when I deploy it to YARN (with multiple nodes) I get an error saying the pattern file cannot be found:

```
file:/hadoop-disk1/yarn/local/usercache/hadoop/appcache/application_1454418114641_7429/container_1454418114641_7429_01_000002/./myFatJat-jar-with-dependencies.jar!/haproxy_pattern.txt
```

where `haproxy_pattern.txt` is the Grok pattern file.

I submit my jar as follows:

```
$ spark-submit --master yarn-client \
    --class com.vsct.dt.bigdata.cdn.app.MainRunner \
    --conf spark.driver.extraClassPath=conf/ \
    --conf spark.executor.extraClassPath=conf/ \
    myFatJat-jar-with-dependencies.jar
```

My `haproxy_pattern.txt` file exists in the `conf/` sub-directory. Why can the pattern file not be found when the application runs on YARN?

More details: the Grok API I am using is:

```xml
<dependency>
    <groupId>io.thekraken</groupId>
    <artifactId>grok</artifactId>
    <version>0.1.1</version>
</dependency>
```

My code looks like this. The map code:

```java
JavaRDD<String> rawLog = sc.textFile(configuration.getInput());

JavaRDD<LogEntry> logEntryRDD = rawLog.map(new Function<String, LogEntry>() {
    private static final long serialVersionUID = 1L;

    @Override
    public LogEntry call(String raw_line) throws Exception {
        grokReader = new SparkGrokReader(configuration);
        LogEntry logEntry = grokReader.read(raw_line);
        return logEntry;
    }
}).cache();
```

And the method that extracts the fields with Grok:

```java
public LogEntry read(String raw_line) {
    LogEntry logEntry = null;
    try {
        Match gm = grok.match(raw_line);
        gm.captures();
        logEntry = buildLogentry(gm.toJson());
    } catch (NullPointerException npe) {
        logger.warn("Line could not be parsed by GROK: {}", raw_line);
    }
    return logEntry;
}
```
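One thing I notice is that the path in the error ends in `...jar!/haproxy_pattern.txt`, so on YARN the pattern file is apparently being resolved as a resource inside the fat jar, and a resource inside a jar cannot be opened through an ordinary file path. The only workaround I can think of so far (not tested) is to ship the pattern file to the executors with `--files conf/haproxy_pattern.txt` on the spark-submit command line and resolve its local path on each worker with `SparkFiles`. Below is a minimal sketch of what I mean; the class name and the `%{HAPROXYHTTP}` expression are placeholders of mine, and I am assuming `Grok.create(path)` and `compile(pattern)` behave as described in the library README:

```java
import org.apache.spark.SparkFiles;

import io.thekraken.grok.api.Grok;

// Hypothetical helper (not part of my project): builds a Grok instance on an
// executor from a pattern file distributed with "spark-submit --files conf/haproxy_pattern.txt".
public class ExecutorGrokFactory {

    public static Grok fromDistributedPatternFile() throws Exception {
        // --files copies haproxy_pattern.txt into each container's working
        // directory; SparkFiles.get() returns its absolute local path there,
        // so Grok can read it as a regular file rather than as a jar resource.
        String patternPath = SparkFiles.get("haproxy_pattern.txt");

        Grok grok = Grok.create(patternPath);
        grok.compile("%{HAPROXYHTTP}"); // placeholder: my real Grok expression differs
        return grok;
    }
}
```

Still, I would prefer to understand why the `spark.executor.extraClassPath=conf/` approach works locally but not on YARN, so any explanation of the error above would be very welcome.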