Hi Jorn,
Thanks for the help. I switched to using Apache Parquet 1.8.3 and now Spark
successfully loads the parquet file.
Do you have any hints about the other part of my question? What is the
correct way to reproduce this schema:
message Document {
  required int64 DocId;
  optional group Links {
    repeated int64 Backward;
    repeated int64 Forward;
  }
  repeated group Name {
    repeated group Language {
      required string Code;
      optional string Country;
    }
    optional string Url;
  }
}
I would try with the same version as Spark uses first. I don’t have the
Parquet changelog in my head (but you can find it on the Internet), but a
version mismatch could be the cause of your issues.
> On 31.10.2018 at 12:26, lchorbadjiev wrote:
Hi Jorn,
I am using Apache Spark 2.3.1.
For creating the parquet file I have used Apache Parquet (parquet-mr) 1.10.
This does not match the version of Parquet used in Apache Spark 2.3.1, and
if you think this could be the problem, I could try Apache Parquet version
1.8.3.
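One quick way to confirm which parquet-mr version a Spark installation actually bundles is to look at the jar names it ships. This is only a sketch; the SPARK_HOME fallback path below is an assumption about a typical install:

```python
import glob
import os

# List the Parquet jars bundled with a Spark installation; the version is
# encoded in the jar file names (e.g. parquet-column-<version>.jar).
spark_home = os.environ.get("SPARK_HOME", "/opt/spark")
parquet_jars = sorted(glob.glob(os.path.join(spark_home, "jars", "parquet-*.jar")))
for jar in parquet_jars:
    print(os.path.basename(jar))
```

Matching the parquet-mr version you write with against whatever shows up here avoids reader/writer incompatibilities.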
I created a p
Super,
Now it makes sense. I am copying Holden on this email.
Regards,
Gourav
On Tue, 30 Oct 2018, 06:34 lchorbadjiev wrote:
Are you using the same parquet version as Spark uses? Are you using a recent
version of Spark? Why don’t you create the file in Spark?
> On 30.10.2018 at 07:34, lchorbadjiev wrote:
Hi Gourav,
the question, in fact, is: are there any limitations in Apache Spark's
support for the Parquet file format?
The example schema from the Dremel paper is supported in Apache Parquet
(using the Apache Parquet Java API).
Now I am trying to implement the same schema using Apache Spark.
The open source implementation of Dremel is Parquet!
On Mon, Oct 29, 2018, 8:42 AM Gourav Sengupta wrote:
Hi,
why not just use dremel?
Regards,
Gourav Sengupta
On Mon, Oct 29, 2018 at 1:35 PM lchorbadjiev wrote:
Hi,
I'm trying to reproduce the example from the Dremel paper
(https://research.google.com/pubs/archive/36632.pdf) in Apache Spark using
pyspark, and I wonder if it is possible at all.
Trying to follow the paper example as closely as possible, I created this
document type:
from pyspark.sql.types import