Hi Jorn,
Thanks for the help. I switched to using Apache Parquet 1.8.3 and now Spark
successfully loads the parquet file.
Do you have any hints about the other part of my question? What is the correct
way to reproduce this schema:
message Document {
  required int64 DocId;
  optional group Links {
    repeated int64 Backward;
    repeated int64 Forward;
  }
  repeated group Name {
    repeated group Language {
      required string Code;
      optional string Country;
    }
    optional string Url;
  }
}
Hi Jorn,
I am using Apache Spark 2.3.1.
To create the Parquet file I used Apache Parquet (parquet-mr) 1.10.
This does not match the Parquet version used by Apache Spark 2.3.1, and if
you think this could be the problem, I can try Apache Parquet version 1.8.3
instead.
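If it helps to confirm the mismatch, the parquet-mr version a Spark installation actually bundles can usually be read off the jar names in its jars directory (a sketch; the SPARK_HOME fallback path is an assumption):

```python
# Sketch: list the parquet-mr jars bundled with a Spark installation.
# Assumes SPARK_HOME points at the Spark install directory; the jar
# names encode the parquet-mr version, e.g. parquet-hadoop-1.8.3.jar.
import glob
import os

spark_home = os.environ.get("SPARK_HOME", "/opt/spark")  # assumed default path
parquet_jars = sorted(glob.glob(os.path.join(spark_home, "jars", "parquet-*.jar")))
for jar in parquet_jars:
    print(os.path.basename(jar))
```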
I created a p
Hi Gourav,
the question, in fact, is whether there are any limitations in Apache Spark's
support for the Parquet file format.
The example schema from the Dremel paper is supported in Apache Parquet
(using the Apache Parquet Java API).
Now I am trying to implement the same schema using Apache Spark
Hi,
I'm trying to reproduce the example from the Dremel paper
(https://research.google.com/pubs/archive/36632.pdf) in Apache Spark using
pyspark, and I wonder whether it is possible at all.
Trying to follow the paper's example as closely as possible, I created this
document type:
from pyspark.sql.types import