Which Spark version are you using, and could you share some of the log lines you're seeing? Parquet uses java.util.logging internally, so it can't be controlled through log4j.properties. The most recent master branch should have muted most Parquet logs. However, there is a known case where some still show up: if you explicitly turn off the Parquet data source (by setting spark.sql.parquet.useDataSourceApi to false) and write to a Hive Parquet table via CTAS statements. Those logs come from the old version of Parquet bundled with the Hive dependencies. We just upgraded Parquet to 1.7.0, whose package name changed from "parquet" to "org.apache.parquet", so the old classes still log under the old "parquet" name.
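Since the output goes through java.util.logging, one thing you could try is raising the level on the Parquet loggers directly from your own code. Below is a minimal sketch, not something Spark ships: the class name ParquetLogQuieter and the method quiet are made up for illustration, and I'm assuming the loggers sit under the two package names mentioned above ("parquet" and "org.apache.parquet"). I haven't verified it against every Parquet version, and Parquet may still attach its own handler to these loggers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper (not part of Spark or Parquet).
public class ParquetLogQuieter {
    // java.util.logging only holds weak references to loggers, so keep
    // strong references here or the level change can be lost after a GC.
    private static final List<Logger> PINNED = new ArrayList<>();

    // Assumed logger names: the old and the new Parquet package names.
    private static final String[] LOGGER_NAMES = {"parquet", "org.apache.parquet"};

    // Sets the given level on both Parquet package loggers.
    // Child loggers without their own level inherit it.
    public static synchronized void quiet(Level level) {
        for (String name : LOGGER_NAMES) {
            Logger logger = Logger.getLogger(name);
            logger.setLevel(level);
            PINNED.add(logger);
        }
    }
}
```

You would call ParquetLogQuieter.quiet(Level.SEVERE) (or Level.OFF) before the first Parquet read or write. Note that this only affects the JVM it runs in; executors are separate JVMs, so the same call would have to run there too.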


I’m trying to figure out how to silence all of the logging info that gets printed to the console when dealing with Parquet files. I’ve seen that there have been several PRs addressing this issue, but I can’t figure out how to actually change the logging config. I’ve already edited log4j.properties in /conf, like so:

log4j.rootCategory=ERROR, console
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Settings to quiet third party logs that are too verbose

This does, in fact, silence the logging for everything else, but the Parquet logging seems totally unaffected. Does anyone know how to do this?


