Hi all,
Found the answer at the following link:
https://forums.databricks.com/questions/918/how-to-set-size-of-parquet-output-files.html
I can successfully set the Parquet block size with spark.hadoop.parquet.block.size.
The following is the sample code:
# init
from pyspark import SparkConf

block_size = 512 * 1024  # 512k
# pass the Parquet block size to the Hadoop conf via the spark.hadoop. prefix
conf = SparkConf().set("spark.hadoop.parquet.block.size", str(block_size))
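For reference, here is a self-contained sketch with the actual write step included; the app name, toy data, and output path are just for illustration:

from pyspark import SparkConf
from pyspark.sql import SparkSession

block_size = 512 * 1024  # 512k Parquet block (row group) size

conf = SparkConf().set("spark.hadoop.parquet.block.size", str(block_size))
spark = (SparkSession.builder
         .appName("parquet-block-size-demo")  # illustrative name
         .config(conf=conf)
         .getOrCreate())

# toy data just to produce a Parquet output file
df = spark.range(1000000).selectExpr("id", "id * 2 as doubled")
df.write.mode("overwrite").parquet("/tmp/parquet_block_size_demo")  # illustrative path

(parquet.block.size controls the Parquet row group size, which is separate from the HDFS block size used by the csv/text writers below.)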
Hi all,
Looks like it's a Parquet-specific issue.
I can successfully write with a 512k block size
if I use df.write.csv() or df.write.text().
(I can successfully do the csv write once I put hadoop-lzo-0.4.15-cdh5.13.0.jar
into the jars dir.)
Sample code:
from pyspark import SparkConf

block_size = 512 * 1024  # 512k
# assumed property: dfs.blocksize, the HDFS block size that the csv/text
# writers honor
conf = SparkConf().set("spark.hadoop.dfs.blocksize", str(block_size))
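A runnable sketch of that csv/text write, again assuming the 512k block size goes through dfs.blocksize (the sample data and output paths are illustrative):

from pyspark import SparkConf
from pyspark.sql import SparkSession

block_size = 512 * 1024  # 512k

conf = SparkConf().set("spark.hadoop.dfs.blocksize", str(block_size))  # assumed property
spark = SparkSession.builder.config(conf=conf).getOrCreate()

# toy data; df.write.text() requires a single string column
df = spark.range(100000).selectExpr("cast(id as string) as value")
df.write.mode("overwrite").csv("/tmp/csv_512k_demo")    # illustrative path
df.write.mode("overwrite").text("/tmp/text_512k_demo")  # illustrative path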