Hi Stefán,
Yes, I'm considering this option now (since there are no better options).
I ran into a limitation though: you cannot query a directory when the
schema differs between files.
Error: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support
schema changes
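(For what it's worth, one possible workaround for this kind of schema-change error is to project explicit columns with casts instead of SELECT *, so every file yields the same column types before the aggregate runs. This is only a sketch; the column names below are hypothetical, and it won't cover every schema mismatch:

```sql
-- Casting forces a consistent per-file schema ahead of aggregation
SELECT CAST(user_id AS INT)    AS user_id,   -- hypothetical column
       CAST(amount  AS DOUBLE) AS amount     -- hypothetical column
FROM dfs.`/data/mixed_json`;
```
)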
On Fri, Dec 9, 2016 at
Hi,
Have you considered batching them up into a nicely defined directory
structure and using directory pruning as part of your queries?
I ask because our batch processes do that. Data is arranged into Hour,
Day, Month, Quarter, and Year structures (which we then roll up in different
ways, based on v
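A directory-pruning query along those lines might look like the following sketch (the table path and the meaning of each path segment are assumptions; Drill exposes the path segments of a nested layout as the implicit columns dir0, dir1, dir2, ...):

```sql
-- Assuming a layout like /data/events/<year>/<month>/<day>/*.parquet
SELECT dir0 AS `year`, dir1 AS `month`, COUNT(*) AS cnt
FROM dfs.`/data/events`
WHERE dir0 = '2016' AND dir1 = '12'   -- prunes the scan to Dec 2016 only
GROUP BY dir0, dir1;
```

With the WHERE clause on dir0/dir1, Drill only scans the matching subdirectories, so query cost stays proportional to the pruned slice rather than the whole dataset.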
Sure... I believe you could CTAS from your JSON directory into a temporary
Parquet directory and then move the resultant files into the final Parquet
directory,
i.e.
Drill Query: Create table `.mytempparq` as select * from `.mytempjson`
Filesystem command: mv ./mytempparq/* ./myfinalparq
It would be gr
By the way, is it possible to append data to parquet data source?
I'm looking for a way to append new rows to existing data so that every
query execution sees the new rows.
Surely it's possible with plain JSON, but I want a more efficient binary
format which will give quicker reads (
Hi John,
Thanks, I tried with a directory containing several Parquet sub-directories.
It works and appears in Drill as one Parquet data source.
Not exactly what I want, but it's a good workaround. Thanks again.
On Wed, Dec 7, 2016 at 4:39 PM, John Omernik wrote:
> Alexander -
>
> When I have someth
Hi Stefán,
Yes, thanks, I know about the CTAS option and it works fine. And it's much
faster than reading the JSON directly.
I'm looking for a way to load batch data from other sources, for
example from a Kafka Connect sink module.
On Wed, Dec 7, 2016 at 4:33 PM, Stefán Baxter wrote:
> Hi Alexander,
>
>
Alexander -
When I have something like this, especially when the output will be
extremely large, I use CTAS into Parquet files. That said, I think you are
more looking at the ETL process for JSON. So, ignoring the CTAS to Parquet
for now, if you have a bunch of JSON files that will be loaded
incr
Hi Alexander,
Drill allows you to both a) query the data directly in JSON format and b)
convert it to Parquet (have a look at the CTAS function)
Hope that helps,
-Stefán
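A minimal CTAS sketch for the JSON-to-Parquet conversion (the paths and the writable `dfs.tmp` workspace are placeholder assumptions; adjust to your storage plugin configuration):

```sql
-- Make Parquet the output format for this session
ALTER SESSION SET `store.format` = 'parquet';

-- Convert the JSON directory into a Parquet table
CREATE TABLE dfs.tmp.`events_parquet` AS
SELECT * FROM dfs.`/data/events_json`;
```

Subsequent queries then hit the Parquet copy instead of re-parsing the JSON on every read.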
On Wed, Dec 7, 2016 at 1:08 PM, Alexander Reshetov <
alexander.v.reshe...@gmail.com> wrote:
> Hello,
>
> I want to load batch
Hello,
I want to load batches of unstructured data into Drill, mostly JSON data.
Is there a batch API or any other option for doing so?
Thanks.