Hi Cheng,

I'm using SparkR from the official 1.4 release, loading Parquet files with read.df, like so:
txnsRaw <- read.df(sqlCtx, "Customer_Transactions.parquet")

And here's a sample of the logging info that gets printed to the console:

Jun 14, 2015 9:49:52 AM INFO: parquet.hadoop.ParquetFileReader: Initiating action with parallelism: 5
Jun 14, 2015 9:49:57 AM WARNING: parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Jun 14, 2015 9:49:58 AM INFO: parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 195289 records.
Jun 14, 2015 9:49:58 AM INFO: parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Jun 14, 2015 9:49:58 AM INFO: parquet.hadoop.InternalParquetRecordReader: block read in memory in 82 ms. row count = 195289
Jun 14, 2015 9:50:06 AM WARNING: parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Jun 14, 2015 9:50:06 AM INFO: parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 194348 records.
Jun 14, 2015 9:50:06 AM INFO: parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Jun 14, 2015 9:50:06 AM WARNING: parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Jun 14, 2015 9:50:06 AM INFO: parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 195289 records.
Jun 14, 2015 9:50:06 AM INFO: parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block
Jun 14, 2015 9:50:06 AM INFO: parquet.hadoop.InternalParquetRecordReader: block read in memory in 2 ms. row count = 195289
Jun 14, 2015 9:50:06 AM INFO: parquet.hadoop.InternalParquetRecordReader: block read in memory in 17 ms. row count = 194348
Jun 14, 2015 9:50:15 AM WARNING: parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
Jun 14, 2015 9:50:15 AM WARNING: parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl

I'm not currently doing anything with Hive tables or even regular HDFS. I'm just running Spark in local mode and reading a local Parquet file from my hard drive.

-Chris

________________________________
From: Cheng Lian [lian.cs....@gmail.com]
Sent: Saturday, June 13, 2015 6:56 PM
To: Chris Freeman; user@spark.apache.org
Subject: Re: How to silence Parquet logging?

Hi Chris,

Which Spark version were you using? And could you provide some sample log lines you saw?

Parquet uses java.util.logging internally, which can't be controlled by log4j.properties. The most recent master branch should have muted most Parquet logs. However, it's known that if you explicitly turn off the Parquet data source (by setting spark.sql.parquet.useDataSourceApi to false) and write to a Hive Parquet table via CTAS statements, some Parquet logs produced by the old version of Parquet bundled with the Hive dependencies still show up, because we just upgraded Parquet to 1.7.0, whose package name was changed from "parquet" to "org.apache.parquet".

Cheng

On 6/13/15 9:29 AM, Chris Freeman wrote:

Hey everyone,

I’m trying to figure out how to silence all of the logging info that gets printed to the console when dealing with Parquet files. I’ve seen that there have been several PRs addressing this issue, but I can’t seem to figure out how to actually change the logging config.
I’ve already messed with log4j.properties in /conf, like so:

log4j.rootCategory=ERROR, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark-project.jetty=ERROR
log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=ERROR
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=ERROR

This does, in fact, silence the logging for everything else, but the Parquet logging seems totally unchanged. Does anyone know how to do this?

Thanks!

-Chris Freeman
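Since Parquet logs through java.util.logging rather than log4j (as Cheng points out earlier in the thread), one workaround is to mute those loggers programmatically on the driver before the first Parquet read. The sketch below is illustrative, not part of Spark or Parquet: the class and method names are made up, and it assumes the noisy loggers live under the "parquet" name hierarchy seen in the log lines above (for Parquet 1.7.0 and later the package, and hence the logger prefix, is "org.apache.parquet").

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class SilenceParquet {
    // java.util.logging only keeps weak references to loggers, so hold
    // strong references here; otherwise the level settings below could be
    // garbage-collected away before Parquet starts logging.
    private static final Logger OLD_PARQUET = Logger.getLogger("parquet");
    private static final Logger NEW_PARQUET = Logger.getLogger("org.apache.parquet");

    public static void silence() {
        for (Logger logger : new Logger[] { OLD_PARQUET, NEW_PARQUET }) {
            logger.setLevel(Level.OFF);          // discard all records at the source
            logger.setUseParentHandlers(false);  // don't forward to the root console handler
        }
    }

    public static void main(String[] args) {
        silence();
        // Child loggers inherit the OFF level from their "parquet" ancestor,
        // so this prints nothing.
        Logger.getLogger("parquet.hadoop.InternalParquetRecordReader")
              .info("this should not be printed");
    }
}
```

Calling something like silence() once before the read should suppress INFO/WARNING lines like those quoted above. The same per-logger levels can also be set declaratively by pointing -Djava.util.logging.config.file at a properties file containing `parquet.level = OFF` and `org.apache.parquet.level = OFF`.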