Hi Chris,
Which Spark version were you using? And could you provide some sample
log lines you saw? Parquet uses java.util.logging internally, so it
can't be controlled via log4j.properties. The most recent master branch
should have muted most Parquet logs. However, it's a known issue that
if you explicitly turn off the Parquet data source (by setting
spark.sql.parquet.useDataSourceApi to false) and write to a Hive
Parquet table via CTAS statements, some Parquet logs produced by the
old Parquet version bundled with the Hive dependencies still show up.
This is because we just upgraded Parquet to 1.7.0, whose package name
changed from "parquet" to "org.apache.parquet".
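As a stopgap until the muting lands in a release, you can silence those
java.util.logging loggers programmatically before any Parquet code runs.
This is just a sketch, not something tested against your exact setup;
the class name MuteParquetLogs is only for illustration, and the two
logger names cover the new ("org.apache.parquet") and old ("parquet")
package roots:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class MuteParquetLogs {
    // Keep strong references: java.util.logging holds loggers weakly,
    // so without these the configured loggers could be garbage-collected
    // and the level change silently lost.
    private static final Logger NEW_PARQUET = Logger.getLogger("org.apache.parquet");
    private static final Logger OLD_PARQUET = Logger.getLogger("parquet");

    public static void muteParquet() {
        // SEVERE suppresses everything below error level, matching the
        // ERROR threshold used in log4j.properties.
        NEW_PARQUET.setLevel(Level.SEVERE);
        OLD_PARQUET.setLevel(Level.SEVERE);
    }

    public static void main(String[] args) {
        muteParquet();
        System.out.println(NEW_PARQUET.getLevel()); // SEVERE
    }
}
```

Call muteParquet() once at startup, before the first read or write of a
Parquet file, so the loggers are configured before Parquet emits anything.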
Cheng
On 6/13/15 9:29 AM, Chris Freeman wrote:
Hey everyone,
I’m trying to figure out how to silence all of the logging info that
gets printed to the console when dealing with Parquet files. I’ve seen
that there have been several PRs addressing this issue, but I can’t
seem to figure out how to actually change the logging config. I’ve
already messed with the log4j.properties in /conf, like so:
log4j.rootCategory=ERROR, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark-project.jetty=ERROR
log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=ERROR
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=ERROR
This does, in fact, silence the logging for everything else, but the
Parquet config seems totally unchanged. Does anyone know how to do this?
Thanks!
-Chris Freeman