Hello everybody,

I've implemented a Loganalyzer program in Spark, which takes the lines of
an Apache log file and translates each one into a given object.

The log format is described by a GROK pattern, so I'm using a GROK library
to extract the desired fields.

When I run the application locally, it succeeds without any problem, but
when I deploy it to YARN (with multiple nodes), the pattern file cannot be
found.


Here, haproxy_pattern.txt is the GROK pattern file.

I submit my jar as follows:

$ spark-submit --master yarn-client \
    --class com.vsct.dt.bigdata.cdn.app.MainRunner \
    --conf spark.driver.extraClassPath=conf/ \
    --conf spark.executor.extraClassPath=conf/ \
    myFatJat-jar-with-dependencies.jar

My haproxy_pattern.txt file exists in the conf/ sub-directory.
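
As an illustration only, the sketch below shows one way a pattern file like this could be looked up: first on the classpath, then relative to the process's working directory. PatternFileLocator is a hypothetical helper, not part of Spark or of the GROK library, and it is not the lookup my application actually performs. It assumes the file is plain UTF-8 text.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spark or of any GROK library.
public class PatternFileLocator {

    /** Opens the file from the classpath if present, otherwise from the filesystem. */
    public static InputStream open(String name) throws IOException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        InputStream in = (loader != null) ? loader.getResourceAsStream(name) : null;
        if (in != null) {
            return in;
        }
        // A relative name is resolved against the JVM's working directory.
        Path path = Paths.get(name);
        if (Files.isRegularFile(path) && Files.isReadable(path)) {
            return Files.newInputStream(path);
        }
        throw new FileNotFoundException(
                name + " not found on the classpath or at " + path.toAbsolutePath());
    }

    /** Reads every line of the file located by open(). */
    public static List<String> readLines(String name) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(open(name), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "haproxy_pattern.txt";
        try {
            System.out.println(name + ": " + readLines(name).size() + " lines");
        } catch (IOException e) {
            // The message includes the absolute path that was tried.
            System.err.println("Lookup failed: " + e.getMessage());
        }
    }
}
```

Running this on each node prints the absolute path that was tried when the lookup fails, which shows the directory a relative entry such as conf/ is resolved against. As far as I understand, on YARN that directory is the container's working directory, and spark-submit's --files option places local files there for each executor.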

More details:

The GROK API I'm using is:

My code looks like this.
The map code:

        JavaRDD<String> rawLog = sc.textFile(configuration.getInput());
        JavaRDD<LogEntry> logEntryRDD = rawLog.map(new Function<String, LogEntry>() {

            private static final long serialVersionUID = 1L;

            public LogEntry call(String raw_line) throws Exception {
                // A new reader is built for every input line.
                grokReader = new SparkGrokReader(configuration);
                LogEntry logEntry = grokReader.read(raw_line);
                return logEntry;
            }
        });

The method that extracts the fields using GROK:

    public LogEntry read(String raw_line) {
        LogEntry logEntry = null;
        try {
            Match gm = grok.match(raw_line);
            logEntry = buildLogentry(gm.toJson());
        } catch (NullPointerException npe) {
            // The line is logged and null is returned for it.
            logger.warn("Line could not be parsed by GROK: {}", raw_line);
        }
        return logEntry;
    }
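
To make the intent of the read step concrete without the GROK dependency, here is a minimal stand-alone sketch of the same idea: match a raw line against a pattern, collect the named fields, and signal an unparseable line explicitly. It swaps GROK for java.util.regex named groups. The class name, the pattern, and the field names (client, method, path, status) are all hypothetical and do not come from my haproxy_pattern.txt.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in sketch: java.util.regex replaces GROK, and the pattern is invented.
public class LineParserSketch {

    // Hypothetical line layout: "<client> <METHOD> <path> <status>".
    private static final Pattern LINE = Pattern.compile(
            "^(?<client>\\S+) (?<method>[A-Z]+) (?<path>\\S+) (?<status>\\d{3})$");

    /** Returns the extracted fields, or an empty Optional if the line does not match. */
    public static Optional<Map<String, String>> parse(String rawLine) {
        if (rawLine == null) {
            return Optional.empty();
        }
        Matcher m = LINE.matcher(rawLine);
        if (!m.matches()) {
            return Optional.empty();
        }
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("client", m.group("client"));
        fields.put("method", m.group("method"));
        fields.put("path", m.group("path"));
        fields.put("status", m.group("status"));
        return Optional.of(fields);
    }

    public static void main(String[] args) {
        System.out.println(parse("10.0.0.1 GET /index.html 200"));
        System.out.println(parse("not a log line"));
    }
}
```

Returning an empty Optional for a non-matching line is a design choice in this sketch: the caller has to handle the unparseable case explicitly, instead of relying on a caught NullPointerException and a null return value.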
