Re: pyspark DataFrameWriter ignores customized settings?

2018-03-16 Thread chhsiao1981
Hi all,

Found the answer in the following link:
https://forums.databricks.com/questions/918/how-to-set-size-of-parquet-output-files.html

I can successfully set the parquet block size with spark.hadoop.parquet.block.size. The following is the sample code:

# init
block_size = 512 * 1024
conf =
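The archived message is truncated after "conf ="; below is a minimal sketch of what the rest likely looks like, based only on the spark.hadoop.parquet.block.size setting named above. The app name, toy DataFrame, and output path are illustrative, not from the original message.

from pyspark import SparkConf
from pyspark.sql import SparkSession

# init
block_size = 512 * 1024  # 512 KB parquet block (row group) size

# Pass parquet.block.size through the spark.hadoop.* prefix so it is copied
# into the Hadoop configuration used by the parquet writer.
conf = (
    SparkConf()
    .setAppName("parquet-block-size")                          # illustrative
    .set("spark.hadoop.parquet.block.size", str(block_size))
)

spark = SparkSession.builder.config(conf=conf).getOrCreate()

df = spark.range(0, 1000000)                                   # toy data
df.write.mode("overwrite").parquet("/tmp/parquet_512k")        # hypothetical path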

Re: pyspark DataFrameWriter ignores customized settings?

2018-03-16 Thread chhsiao1981
Hi all,

Looks like it's a parquet-specific issue. I can successfully write with a 512k block size if I use df.write.csv() or df.write.text(). (I can successfully do the csv write when I put hadoop-lzo-0.4.15-cdh5.13.0.jar into the jars dir.)

sample code:

block_size = 512 * 1024
conf = SparkConf().s
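Since the snippet is cut off at "SparkConf().s", here is a hedged sketch of the configuration being described. The HDFS block size property used below (dfs.blocksize, passed via the spark.hadoop.* prefix) is an assumption standing in for whatever the original SparkConf actually set, and the DataFrame and output paths are illustrative.

from pyspark import SparkConf
from pyspark.sql import SparkSession

block_size = 512 * 1024  # 512 KB

# Assumption: the truncated SparkConf().s... line set the HDFS block size;
# dfs.blocksize is used here as a stand-in for that property.
conf = (
    SparkConf()
    .setAppName("blocksize-test")                          # illustrative
    .set("spark.hadoop.dfs.blocksize", str(block_size))
)

spark = SparkSession.builder.config(conf=conf).getOrCreate()

# df.write.text() needs a single string column, hence the cast.
df = spark.range(0, 1000000).selectExpr("cast(id as string) as value")

# csv/text output honours the configured 512k block size ...
df.write.mode("overwrite").csv("/tmp/csv_512k")
df.write.mode("overwrite").text("/tmp/text_512k")

# ... whereas parquet output additionally needs parquet.block.size
# (see the spark.hadoop.parquet.block.size fix in the other message
# of this thread).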