RE: append data to already existing table saved in parquet format

2017-07-27 Thread Dan Holmes
Inc. Direct: 770.859.1255 www.revenueanalytics.com -----Original Message----- From: Divya Gehlot [mailto:divya.htco...@gmail.com] Sent: Thursday, July 27, 2017 1:56 AM To: user@drill.apache.org Subject: Re: append data to already existing table saved in parquet format Hi Paul, Let me try your app

Re: append data to already existing table saved in parquet format

2017-07-26 Thread Divya Gehlot
Hi Paul, Let me try your approach of using CTAS and saving to a partition directory structure. Thanks for the suggestion. Thanks, Divya On 27 July 2017 at 11:57, Paul Rogers wrote: > Hi All, > > Saurabh, you are right. But, since Parquet does not allow appending to > existing files, we have to do the

Re: append data to already existing table saved in parquet format

2017-07-26 Thread Paul Rogers
Hi All, Saurabh, you are right. But, since Parquet does not allow appending to existing files, we have to do the logical equivalent which is to create a new Parquet file. For it to be part of the same “table” it must be part of an existing partition structure as Divya described. The trick here

Re: append data to already existing table saved in parquet format

2017-07-26 Thread Saurabh Mahapatra
But append-only means you are adding event records to a table (forget the layout for a while). That means you have to write to the end of a table. If there are too many writes, you have to batch them and then convert them into a columnar format. This to me sounds like a Kafka workflow where you keep

Re: append data to already existing table saved in parquet format

2017-07-26 Thread Divya Gehlot
Yes Paul, I am looking for the insert into partition feature. That way we just have to create the file for that particular partition when new data comes in, or when an update is required. Otherwise, every time data comes in we have to run the view and recreate the Parquet files for the whole data set whi

Re: append data to already existing table saved in parquet format

2017-07-26 Thread Paul Rogers
Hi Divya, Seems that you are asking for an “INSERT INTO” feature (DRILL-3534). The idea would be to create new Parquet files into an existing partition structure. That feature has not yet been started. So, the workarounds provided might help you for now. - Paul > On Jul 26, 2017, at 8:46 AM,

Re: append data to already existing table saved in parquet format

2017-07-26 Thread Saurabh Mahapatra
Does Drill provide that kind of functionality? Theoretically yes. CTAS should work. But your cluster has to be sized. But I would never put something in such a pipeline without adequate testing. And I would always consider a lambda architecture to ensure that if this path were to fail (with Drill o

Re: append data to already existing table saved in parquet format

2017-07-26 Thread Divya Gehlot
The data size is not big for each hour, but it will grow over time. Say I have data for 2 years and data is coming in on an hourly basis: recreating the Parquet table every time is not a feasible solution. Likewise for Hive, create the partition and insert the data into partition accordin

Re: append data to already existing table saved in parquet format

2017-07-26 Thread Saurabh Mahapatra
I always recommend against using CTAS as a shortcut for a large ETL-type workload. You will need to size your Drill cluster accordingly. Consider using Hive or Spark instead. What are the source file formats? For every hour, what is the size and the number of rows for that data? Are you doing any

Re: append data to already existing table saved in parquet format

2017-07-25 Thread rahul challapalli
I am not aware of any clean way to do this. However, if your data is partitioned based on directories, then you can use the hack below, which leverages temporary tables [1]. Essentially, you back up your partition to a temp table, then overwrite it by taking the union of new partition data and existing
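
The backup-then-union hack described above might look roughly like the following. This is only a sketch that builds the SQL text: the helper name, the backup table name, the example paths, and the choice of UNION ALL are assumptions for illustration, not taken from the original message, and the statements have not been run against a Drill cluster.

```python
from typing import List


def rewrite_partition_statements(
    partition: str, new_data: str, backup: str = "partition_backup"
) -> List[str]:
    """Return Drill SQL statements that rebuild one partition directory.

    `partition` and `new_data` are fully qualified Drill table references.
    All names here are hypothetical placeholders.
    """
    return [
        # 1. Copy the existing partition into a session-scoped temporary table.
        f"CREATE TEMPORARY TABLE {backup} AS SELECT * FROM {partition}",
        # 2. Remove the old partition directory so CTAS can recreate it.
        f"DROP TABLE {partition}",
        # 3. Recreate the partition from the backup plus the new rows.
        f"CREATE TABLE {partition} AS "
        f"SELECT * FROM {backup} UNION ALL SELECT * FROM {new_data}",
    ]


# Example paths are assumptions, not from the thread.
statements = rewrite_partition_statements(
    "dfs.tmp.`events/2017/07/26`", "dfs.`/staging/new_hour.json`"
)
for stmt in statements:
    print(stmt + ";")
```

Only the affected partition directory is rewritten, but that partition is unavailable between the DROP and the final CTAS, so this would need testing before use in a pipeline.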

Re: append data to already existing table saved in parquet format

2017-07-25 Thread Abhishek Girish
Drill doesn't have support for an INSERT INTO command. You could try using the CTAS command to write to a specific partition directory, maybe? Also look at CTAS auto partitioning [1] [1] https://drill.apache.org/docs/partition-by-clause/ On Tue, Jul 25, 2017 at 10:52 PM, Divya Gehlot wrote: >
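
As a rough illustration of the "CTAS into a specific partition directory" idea, the sketch below builds one CTAS statement per hourly batch. The helper name, the `table/yyyy/mm/dd/hh` directory layout, the `dfs.tmp` workspace, and the staging file path are all assumptions made for the example; none of them come from the thread.

```python
from datetime import datetime


def ctas_for_hour(workspace: str, table: str, ts: datetime, source: str) -> str:
    """Build a Drill CTAS statement that writes one hour of data into its own
    directory under an existing table root (hypothetical layout)."""
    # Assumed layout: <table>/<year>/<month>/<day>/<hour>
    target = f"{table}/{ts:%Y/%m/%d/%H}"
    return f"CREATE TABLE {workspace}.`{target}` AS\nSELECT * FROM {source}"


# Example values are placeholders for illustration only.
stmt = ctas_for_hour(
    "dfs.tmp",
    "events",
    datetime(2017, 7, 26, 8),
    "dfs.`/staging/events_2017072608.json`",
)
print(stmt)
```

Each run targets a new directory, so existing Parquet files are never rewritten; a query against the table root should then see all hourly directories together.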

append data to already existing table saved in parquet format

2017-07-25 Thread Divya Gehlot
Hi, I am new to Apache Drill. I have data coming in every hour, and when I searched I couldn't find an insert into partition command in Apache Drill. How can we insert data into a particular partition without rewriting the whole data set? Appreciate the help. Thanks, Divya