Hi Cheng,

I'm using SparkR from the official 1.4 release, loading Parquet files with read.df, like so:
txnsRaw <- read.df(sqlCtx, "Customer_Transactions.parquet")

And here's a sample of the logging info that gets printed to the console:

Jun 14, 2015 9:49:52 AM INFO: parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
Jun 14, 2015 9:49:57 AM WARNING: parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Jun 14, 2015 9:49:58 AM INFO: parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 195289 records.
Jun 14, 2015 9:49:58 AM INFO: parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Jun 14, 2015 9:49:58 AM INFO: parquet.hadoop.InternalParquetRecordReader: block read in memory in 82 ms. row count = 195289
Jun 14, 2015 9:50:06 AM WARNING: parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Jun 14, 2015 9:50:06 AM INFO: parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 194348 records.
Jun 14, 2015 9:50:06 AM INFO: parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Jun 14, 2015 9:50:06 AM WARNING: parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Jun 14, 2015 9:50:06 AM INFO: parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 195289 records.
Jun 14, 2015 9:50:06 AM INFO: parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Jun 14, 2015 9:50:06 AM INFO: parquet.hadoop.InternalParquetRecordReader: block read in memory in 2 ms. row count = 195289
Jun 14, 2015 9:50:06 AM INFO: parquet.hadoop.InternalParquetRecordReader: block read in memory in 17 ms. row count = 194348
Jun 14, 2015 9:50:15 AM WARNING: parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Jun 14, 2015 9:50:15 AM WARNING: parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl

I'm not currently doing anything with Hive tables or even regular HDFS. I'm just running Spark in local mode and reading a local Parquet file from my hard drive.

-Chris

________________________________
From: Cheng Lian [lian.cs....@gmail.com]
Sent: Saturday, June 13, 2015 6:56 PM
To: Chris Freeman; user@spark.apache.org
Subject: Re: How to silence Parquet logging?

Hi Chris,

Which Spark version were you using? And could you provide some sample log lines you saw?

Parquet uses java.util.logging internally, which can't be controlled by log4j.properties. The most recent master branch should have muted most Parquet logs. However, it's known that if you explicitly turn off the Parquet data source (by setting spark.sql.parquet.useDataSourceApi to false) and write to a Hive Parquet table via CTAS statements, some Parquet logs produced by the old version of Parquet bundled with the Hive dependencies still show up, because we just upgraded Parquet to 1.7.0, whose package name was changed from "parquet" to "org.apache.parquet".

Cheng

On 6/13/15 9:29 AM, Chris Freeman wrote:

Hey everyone,

I’m trying to figure out how to silence all of the logging info that gets printed to the console when dealing with Parquet files. I’ve seen that there have been several PRs addressing this issue, but I can’t seem to figure out how to actually change the logging config.
I’ve already messed with log4j.properties in /conf, like so:

log4j.rootCategory=ERROR, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark-project.jetty=ERROR
log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=ERROR
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=ERROR

This does, in fact, silence the logging for everything else, but the Parquet logging seems totally unchanged. Does anyone know how to do this?

Thanks!

-Chris Freeman
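Since Parquet logs through java.util.logging rather than log4j (as Cheng points out earlier in the thread), one workaround is to mute those loggers programmatically on the driver before the first Parquet read. The sketch below is illustrative, not part of Spark or Parquet: the class and method names are made up, and it assumes the noisy loggers live under the "parquet" name hierarchy seen in the log lines above (for Parquet 1.7.0 and later the package, and hence the logger prefix, is "org.apache.parquet").

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class SilenceParquet {
    // java.util.logging only keeps weak references to loggers, so hold
    // strong references here; otherwise the level settings below could be
    // garbage-collected away before Parquet starts logging.
    private static final Logger OLD_PARQUET = Logger.getLogger("parquet");
    private static final Logger NEW_PARQUET = Logger.getLogger("org.apache.parquet");

    public static void silence() {
        for (Logger logger : new Logger[] { OLD_PARQUET, NEW_PARQUET }) {
            logger.setLevel(Level.OFF);          // discard all records at the source
            logger.setUseParentHandlers(false);  // don't forward to the root console handler
        }
    }

    public static void main(String[] args) {
        silence();
        // Child loggers inherit the OFF level from their "parquet" ancestor,
        // so this prints nothing.
        Logger.getLogger("parquet.hadoop.InternalParquetRecordReader")
              .info("this should not be printed");
    }
}
```

Calling something like silence() once before the read should suppress INFO/WARNING lines like those quoted above. The same per-logger levels can also be set declaratively by pointing -Djava.util.logging.config.file at a properties file containing `parquet.level = OFF` and `org.apache.parquet.level = OFF`.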