We are currently using Camus for our Kafka-to-HDFS pipeline, storing the
data as SequenceFiles, but I understand Spark Streaming can be used to
save the output as Parquet instead. From what I've read about Parquet,
its layout is optimized for queries against large files. Are there any
options in Spark to specify the block size to help with this, or is it
dependent on the specified time window?
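
For context, here is a rough sketch of what I have in mind, assuming
Spark 1.x with the spark-streaming-kafka and Spark SQL modules; the
ZooKeeper quorum, consumer group, topic map, output path, and the Event
case class are all placeholders. My understanding is that Parquet's
row-group size is controlled by the parquet.block.size Hadoop setting
rather than by the batch interval, but please correct me if that's wrong:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

// Placeholder record type; the real schema would match the Kafka payload.
case class Event(key: String, value: String)

object KafkaToParquet {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaToParquet")
    // The batch interval controls how often a file is written,
    // not the Parquet layout inside each file.
    val ssc = new StreamingContext(conf, Seconds(60))

    // Parquet's row-group ("block") size is a writer setting read from
    // the Hadoop configuration, independent of the streaming window.
    ssc.sparkContext.hadoopConfiguration
      .setInt("parquet.block.size", 256 * 1024 * 1024)

    val sqlContext = new SQLContext(ssc.sparkContext)
    import sqlContext.createSchemaRDD // implicit RDD[Event] -> SchemaRDD

    // ZooKeeper quorum, group id, and topic map are placeholders.
    val stream =
      KafkaUtils.createStream(ssc, "zk:2181", "parquet-writer", Map("events" -> 1))

    stream.foreachRDD { rdd =>
      if (rdd.take(1).nonEmpty) {
        val events = rdd.map { case (k, v) => Event(k, v) }
        // One Parquet directory per batch; path is illustrative only.
        events.saveAsParquetFile(
          s"hdfs:///data/events/batch-${System.currentTimeMillis}")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}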

Thanks!


