One gotcha with JSON vs. Parquet is a longstanding bug that causes errors
when reading Parquet files that contain 0 rows.
For cases where we're converting from datasets that might be empty, we use
JSON, and for everything else, Parquet.
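A minimal sketch of that rule in Python, applied at write time. It assumes the row count is known before the output format is chosen; the function name `choose_storage_format` is hypothetical and not part of Drill.

```python
def choose_storage_format(row_count: int) -> str:
    """Pick an output format following the rule described above.

    Hypothetical helper: empty datasets go to JSON to avoid the
    zero-row Parquet read errors, everything else goes to Parquet.
    """
    if row_count < 0:
        raise ValueError("row_count must be non-negative")
    return "json" if row_count == 0 else "parquet"


if __name__ == "__main__":
    for n in (0, 1, 5000):
        print(n, "->", choose_storage_format(n))
```

If the row count is not known up front, the more conservative reading of the rule is to use JSON for any dataset that might be empty.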
-Original Message-
From: Divya Gehlot [mailto:divya.htco...@gmail.com]
Sent: Tuesday, June 12, 2018 5:25 AM
To: user@drill.apache.org
Subject: Re: Which perform better JSON or convert JSON to parquet format ?
Hi David,
How do I create the schema first using the Parquet library?
Can you please
json which always ends in index out of bound (server crashing) errors when
trying to query parquet.
-Original Message-
From: Ted Dunning [mailto:ted.dunn...@gmail.com]
Sent: Monday, June 11, 2018 4:47 AM
To: user
Subject: Re: Which perform better JSON or convert JSON to parquet format ?
Yes. Drill is good at JSON.
But Parquet will be faster during a scan.
Faster may be better. Or other things may be more important.
You have to decide what is important to you. The great virtue of drill is
that you have the choice.
On Mon, Jun 11, 2018 at 11:06 AM Divya Gehlot
wrote:
Thanks to all for your opinions!
Drill has been popularised as a complex JSON reader compared to other
tools in this space, so I was wondering whether Drill works better with
JSON than with Parquet.
I am going to play the contrarian here.
Parquet is not *always* faster than JSON.
The (almost unique) case where it is better to leave data as JSON (or
whatever) is when the average number of times that a file is read is equal
to or less than roughly 1.
The point is that to convert read the file
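The break-even argument can be sketched as a toy cost model in Python. All names and cost units here are assumptions made for illustration: leaving the data as JSON costs one JSON scan per read, while converting costs one JSON scan plus one Parquet write up front, then one Parquet scan per read.

```python
def total_cost_json(reads: int, json_scan: float) -> float:
    """Cost of answering `reads` queries directly against JSON."""
    return reads * json_scan


def total_cost_parquet(reads: int, json_scan: float,
                       parquet_write: float, parquet_scan: float) -> float:
    """Cost of converting once, then answering `reads` queries on Parquet."""
    conversion = json_scan + parquet_write  # read JSON once, write Parquet once
    return conversion + reads * parquet_scan


def break_even_reads(json_scan: float, parquet_write: float,
                     parquet_scan: float) -> float:
    """Number of reads at which both strategies cost the same."""
    if parquet_scan >= json_scan:
        return float("inf")  # conversion never pays off in this model
    return (json_scan + parquet_write) / (json_scan - parquet_scan)


if __name__ == "__main__":
    # Hypothetical costs in arbitrary units.
    print(break_even_reads(json_scan=10.0, parquet_write=2.0, parquet_scan=2.0))
```

In this model the break-even point approaches 1 read as the Parquet write and scan costs become small relative to the JSON scan, which matches the "roughly 1" figure above.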
Yes, Parquet is always better, for multiple reasons. With JSON, we have to
read the whole file from a single reader thread and have to parse it to read
individual columns.
Parquet compresses and encodes data on disk, so we read much less data from
disk.
Drill can read individual columns with in eac
I would suggest converting the JSON files to parquet for better
performance. JSON supports a more free form data model, so that's a
trade-off you need to consider, in my opinion.
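One way to do the conversion inside Drill itself is a CTAS (`CREATE TABLE AS`) statement with the `store.format` session option set to `parquet`. The Python sketch below only builds the SQL text; the helper name, the source path, the table name, and the choice of the `dfs.tmp` workspace are all assumptions for illustration. With free-form JSON, a plain `SELECT *` may need explicit casts or column lists to cope with schema changes.

```python
def drill_json_to_parquet_sql(source_path, table_name, workspace="dfs.tmp"):
    """Build the Drill statements for a JSON-to-Parquet CTAS.

    Hypothetical helper: returns SQL strings only and does not connect
    to Drill. `workspace` must be a writable workspace in your setup.
    """
    return [
        # Tell Drill to write CTAS output as Parquet.
        "ALTER SESSION SET `store.format` = 'parquet'",
        # Copy the JSON data into a new Parquet-backed table.
        f"CREATE TABLE {workspace}.`{table_name}` AS "
        f"SELECT * FROM dfs.`{source_path}`",
    ]


if __name__ == "__main__":
    for stmt in drill_json_to_parquet_sql("/data/events.json", "events_parquet"):
        print(stmt + ";")
```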
On Sun, Jun 10, 2018 at 8:08 PM Divya Gehlot
wrote:
Hi,
I am looking for advice on which of the following performs better:
1. Keep the JSON as is
2. Convert the JSON files to Parquet files
My JSON data is not in a fixed format, and file sizes vary from 10 KB
to 1 MB.
I would appreciate the community's advice on the above!
Thanks,
Divya