Re: append data to already existing table saved in parquet format

2017-07-26 Thread Divya Gehlot
Hi Paul, Let me try your approach of CTAS and saving to a partitioned directory structure. Thanks for the suggestion. Thanks, Divya On 27 July 2017 at 11:57, Paul Rogers wrote: > Hi All, > > Saurabh, you are right. But, since Parquet does not allow appending to > existing files,

Re: regex replace in string

2017-07-26 Thread Divya Gehlot
Hi, I have already set the plugin configuration to extractHeader: true, and I followed the link below: https://drill.apache.org/docs/lesson-2-run-queries-with-ansi-sql/ SELECT REGEXP_REPLACE(CAST(`Column1` AS VARCHAR(100)), '[,".]', '') AS `Col1` FROM

Re: regex replace in string

2017-07-26 Thread Paul Rogers
Hi Divya, I presume that “sample_data.csv” is your file? The default CSV configuration reads files without headers and puts all columns into a single array called “columns”. Do a SELECT * and you’ll see it. You’ll see an array that contains your data: [“Fred”, “Flintstone”] So, the correct
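Paul's point about the headerless-CSV default can be sketched as follows. This is an illustrative query only; the file path is hypothetical and assumes the default text-format configuration (no `extractHeader`):

```sql
-- With headers disabled, Drill exposes each CSV row as a single array
-- column named `columns`; individual fields are addressed by index.
SELECT columns[0] AS first_field,
       columns[1] AS second_field
FROM dfs.`/path/to/sample_data.csv`;
```

A plain `SELECT *` on the same file returns one `columns` array per row, e.g. `["Fred", "Flintstone"]`, which is what Paul describes above.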

Re: append data to already existing table saved in parquet format

2017-07-26 Thread Paul Rogers
Hi All, Saurabh, you are right. But, since Parquet does not allow appending to existing files, we have to do the logical equivalent which is to create a new Parquet file. For it to be part of the same “table” it must be part of an existing partition structure as Divya described. The trick
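The trick Paul describes can be sketched in Drill SQL. This is a minimal illustration, assuming a `dfs.tmp` workspace and a year/month/day directory layout; all table and file names are hypothetical, not from the original thread:

```sql
-- Write a new Parquet file directly into one partition subdirectory of
-- the existing "table" (a directory tree of Parquet files).
CREATE TABLE dfs.tmp.`sales/2017/07/26` AS
SELECT * FROM dfs.tmp.`staging_sales_20170726.csv`;
```

Queries against `dfs.tmp.`sales`` then see the pre-existing partitions plus the newly written one, which is the logical equivalent of an append.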

Re: regex replace in string

2017-07-26 Thread Divya Gehlot
Another thing I observed: when I run the query SELECT REGEXP_REPLACE('"This, col7 data yes."', '[,".]', '') FROM (VALUES(1)) it returns "This col7 data yes". But when I run the same against the CSV file it gives me an empty result set: SELECT REGEXP_REPLACE(CAST(`Column1` AS VARCHAR(100)), '[,".]', '')
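The regex pattern itself behaves as expected, which suggests the problem lies in how the CSV column is read rather than in REGEXP_REPLACE. A quick check of the same character class in Python (an illustration, not part of the original thread):

```python
import re

def strip_punct(s):
    # Same character class as in the Drill query: remove commas,
    # double quotes, and periods.
    return re.sub(r'[,".]', '', s)

print(strip_punct('"This, col7 data yes."'))  # This col7 data yes
```

The literal-string test succeeds in both Python and Drill; only the CSV-sourced column misbehaves.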

Re: append data to already existing table saved in parquet format

2017-07-26 Thread Saurabh Mahapatra
But append-only means you are adding event records to a table (forget the layout for a while). That means you have to write to the end of the table. If the writes are too many, you have to batch them and then convert them into a columnar format. This to me sounds like a Kafka workflow where you

Re: append data to already existing table saved in parquet format

2017-07-26 Thread Divya Gehlot
Yes Paul, I am looking for the insert-into-partition feature. That way we only have to create the file for that particular partition when new data comes in, or update it if required. Otherwise, every time data comes in we have to run the view and recreate the Parquet files for the whole data set

Re: regex replace in string

2017-07-26 Thread Divya Gehlot
Hi, Please find attached the sample_data.csv file. Pasting the content of the CSV file below, in case the attachment doesn't reach: > Column1,Column2,Column3,Column4,Column5 > colonedata1,coltwodata1,-35.924476,138.5987123, > colonedata2,coltwodata2,-27.4372536,153.0304583,137 >

Re: append data to already existing table saved in parquet format

2017-07-26 Thread Paul Rogers
Hi Divya, Seems that you are asking for an “INSERT INTO” feature (DRILL-3534). The idea would be to create new Parquet files into an existing partition structure. That feature has not yet been started. So, the workarounds provided might help you for now. - Paul > On Jul 26, 2017, at 8:46 AM,

Re: 1.11.0 RC question

2017-07-26 Thread Bob Rudis
Oh I'm an idiot. I'll add the pcap format after dinner and try again. Thx for the quick and helpful response! -boB On Wed, Jul 26, 2017 at 18:03 Parth Chandra wrote: > You might have to add the pcap format in the dfs storage plugin config [1] > > Something like this : > >

Re: 1.11.0 RC question

2017-07-26 Thread Parth Chandra
You might have to add the pcap format in the dfs storage plugin config [1] Something like this : "formats": { "csv": { "type": "text", "extensions": [ "csv" ], "delimiter": "," }, "parquet": { "type": "parquet" }, "json": { "type":
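The pcap entry Parth refers to would slot into the `"formats"` section of the dfs storage plugin alongside the csv, parquet, and json entries shown above. A minimal sketch of that entry (assuming the format name and type are both `pcap`, as in Drill's defaults):

```json
"formats": {
  "pcap": {
    "type": "pcap"
  }
}
```

After saving the plugin config, files with the matching extension can be queried directly through the dfs plugin.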

Re: 1.11.0 RC question

2017-07-26 Thread Jinfeng Ni
Hi Bob, Is DRILL-5432 the one you are talking about? I saw it's merged and should have been put in the release candidate. What type of error did you see when you tried to query a PCAP? Also, it may help to provide the commit id of your build, by running the following query: SELECT * from
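The build-identification query Jinfeng mentions runs against Drill's `sys.version` system table. A sketch (column names assumed from the `sys.version` schema):

```sql
-- Identify the exact build of a running Drill instance.
SELECT commit_id, commit_message, build_time
FROM sys.version;
```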

1.11.0 RC question

2017-07-26 Thread Bob Rudis
I wasn't sure if this belonged on the dev list or not, but I was peeking around the JIRA for 1.11.0 RC and noticed that it _looked_ like PCAP support is/was going to be in 1.11.0. But when I did a quick download and test of the RC (early yesterday) and tried to query a PCAP, it did not work. I'm

Re: regex replace in string

2017-07-26 Thread Khurram Faraaz
The regexp_replace function works on that data on Drill 1.11.0, commit id 4220fb2. Data used was: [root@centos-01 community]# cat rgex_replce.csv "This is the column,one " "This is column , two" column3 column4 0: jdbc:drill:schema=dfs.tmp> select * from `rgex_replce.csv`;

RE: drill error connecting to Hbase

2017-07-26 Thread Kunal Khatua
The bundled projects (HBase, Hive) in CDH have their own versions. I'm wondering if that is the source of the difference. Drill has been tested with HBase 1.1.1 and Hive 1.2.1. For higher versions, as long as the APIs have not changed, things should be backward compatible. Also, the error message you

Hbase tables create are not displayed in apache drill

2017-07-26 Thread hardik lathigara
Hi, I have created an HBase setup in Apache Drill with the reference of https://drill.apache.org/docs/hbase-storage-plugin/ and was trying to query HBase tables (students and clicks) by following https://drill.apache.org/docs/querying-hbase/ but when I try to run commands like (SHOW TABLES, SELECT *

Re: regex replace in string

2017-07-26 Thread Paul Rogers
Hi Divya, We found a couple of issues in CSV files that would lead to the kind of errors you encountered. These issues will be fixed in the upcoming Drill 1.11 release. Sharing a sample CSV file will let us check the issue. Even better, voting is open for the 1.11 release. Please go ahead and

Re: regex replace in string

2017-07-26 Thread Khurram Faraaz
Can you please share your CSV file, the SQL query, and the version of Drill that you are on, so someone can take a look and try to reproduce the error that you are seeing. Thanks, Khurram From: Divya Gehlot Sent: Wednesday, July 26,

RE: drill error connecting to Hbase

2017-07-26 Thread Shai Shapira
It is CDH 5.8.2. I believe those are reliable versions, aren't they? Thanks, Shai -Original Message- From: Kunal Khatua [mailto:kkha...@mapr.com] Sent: Monday, July 24, 2017 8:50 AM To: user@drill.apache.org Subject: RE: drill error connecting to Hbase This means that the connectivity with ZK

Re: append data to already existing table saved in parquet format

2017-07-26 Thread Divya Gehlot
The data size is not big for any one hour, but it will grow with time: say I have data for 2 years and data is coming in on an hourly basis, then recreating the Parquet table every time is not a feasible solution. Likewise, for Hive you create the partition and insert the data into the partition
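The Hive behavior Divya is comparing against looks roughly like this (Hive syntax, not Drill; table and column names are illustrative):

```sql
-- In Hive, new data is written into one partition of an existing table
-- without rewriting the rest of the table.
INSERT INTO TABLE sales PARTITION (dt = '2017-07-26')
SELECT col1, col2 FROM staging_sales;
```

This is essentially the DRILL-3534 "INSERT INTO" feature discussed elsewhere in the thread.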

regex replace in string

2017-07-26 Thread Divya Gehlot
Hi, I have a CSV file where the column values are "This is the column,one " "This is column , two" column3 column4 When I try regexp_replace it throws an error: org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: > IllegalArgumentException: reallocation size must be non-negative

Re: append data to already existing table saved in parquet format

2017-07-26 Thread Saurabh Mahapatra
I always recommend against using CTAS as a shortcut for a large ETL-type workload. You will need to size your Drill cluster accordingly. Consider using Hive or Spark instead. What are the source file formats? For every hour, what is the size and the number of rows for that data? Are you doing any

Re: [HANGOUT] Topics for 7/25/17

2017-07-26 Thread yuliya Feldman
Sorry for the late chime-in. Just a note regarding S3: even after upgrading to Hadoop 2.8.x, you may need to separately update the AWS library versions, as the one provided with the upgrade does not support all the newly added regions. Thanks, Yuliya From: Arina Yelchiyeva

Re: append data to already existing table saved in parquet format

2017-07-26 Thread rahul challapalli
I am not aware of any clean way to do this. However, if your data is partitioned based on directories, then you can use the hack below, which leverages temporary tables [1]. Essentially, you back up your partition to a temp table, then overwrite it by taking the union of the new partition data and
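Rahul's temp-table workaround can be sketched in Drill SQL. This is an illustration only; the workspace, directory layout, and file names are assumptions, not from the original thread:

```sql
-- 1. Back up the existing partition to a temporary table.
CREATE TEMPORARY TABLE old_partition AS
SELECT * FROM dfs.tmp.`sales/2017/07/26`;

-- 2. Remove the old partition directory.
DROP TABLE dfs.tmp.`sales/2017/07/26`;

-- 3. Rewrite the partition as the union of the backed-up rows
--    and the newly arrived data.
CREATE TABLE dfs.tmp.`sales/2017/07/26` AS
SELECT * FROM old_partition
UNION ALL
SELECT * FROM dfs.tmp.`new_data_20170726.csv`;
```

Temporary tables live only for the session, so the backup is cleaned up automatically; the risk is losing the partition if the session dies between steps 2 and 3.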