Hi Paul,
Let me try your approach of CTAS and saving to a partition directory structure.
Thanks for the suggestion.
Thanks,
Divya
On 27 July 2017 at 11:57, Paul Rogers wrote:
> Hi All,
>
> Saurabh, you are right. But, since Parquet does not allow appending to
> existing files,
Hi,
I have already set the plugin configuration to extractHeader: true,
and I followed the below link:
https://drill.apache.org/docs/lesson-2-run-queries-with-ansi-sql/
SELECT REGEXP_REPLACE(CAST(`Column1` AS VARCHAR(100)), '[,".]', '') AS `Col1` FROM
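For reference, with extractHeader enabled, the text format entry in the dfs storage plugin configuration looks roughly like this (a sketch based on the Drill docs, not necessarily her exact config):

"csv": {
  "type": "text",
  "extensions": ["csv"],
  "delimiter": ",",
  "extractHeader": true
}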
Hi Divya,
I presume that “sample_data.csv” is your file? The default CSV configuration
reads files without headers and puts all columns into a single array called
“columns”. Do a SELECT * and you’ll see an array that contains your data:
[“Fred”, “Flintstone”]
So, the correct
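The message is cut off, but presumably the point is that the query must index into the columns array rather than reference named columns. A minimal sketch, assuming a hypothetical dfs path for the file:

SELECT columns[0] AS first_name, columns[1] AS last_name
FROM dfs.`/path/to/sample_data.csv`;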
Hi All,
Saurabh, you are right. But, since Parquet does not allow appending to existing
files, we have to do the logical equivalent which is to create a new Parquet
file. For it to be part of the same “table” it must be part of an existing
partition structure as Divya described.
The trick
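The message is truncated before the trick itself, but a minimal sketch of the general approach (hypothetical paths; assumes a writable workspace such as the default dfs.tmp): write each new batch as its own subdirectory under the table root, so queries against the root see all partitions together.

-- New batch lands as a fresh Parquet "partition" under the table root
CREATE TABLE dfs.tmp.`events/2017_07_27` AS
SELECT * FROM dfs.tmp.`staging/new_events.csv`;

-- Queries against the root pick up every partition
SELECT COUNT(*) FROM dfs.tmp.`events`;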
Another thing I observed is that when I run the below query
SELECT REGEXP_REPLACE('"This, col7 data yes."', '[,".]', '') FROM
(VALUES(1))
EXPR$0
This col7 data yes
But when I run the same against the CSV file, it gives me an empty result set:
SELECT REGEXP_REPLACE(CAST(`Column1` AS VARCHAR(100)), '[,".]', '')
But append-only means you are adding event records to a table (forget the layout
for a while). That means you have to write to the end of the table. If the writes
are too many, you have to batch them and then convert them into a columnar
format.
This to me sounds like a Kafka workflow where you
Yes Paul, I am looking for the insert-into-partition feature.
That way we just have to create the file for that particular partition
when new data comes in, or when any update is required.
Otherwise, every time new data comes in we have to run the view and recreate the
Parquet files for the whole data set.
Hi,
Please find attached the sample_data.csv file
Pasting the content of the CSV file below, in case the attachment doesn't
reach:
> Column1,Column2,Column3,Column4,Column5
> colonedata1,coltwodata1,-35.924476,138.5987123,
> colonedata2,coltwodata2,-27.4372536,153.0304583,137
>
Hi Divya,
Seems that you are asking for an “INSERT INTO” feature (DRILL-3534). The idea
would be to create new Parquet files into an existing partition structure. That
feature has not yet been started. So, the workarounds provided might help you
for now.
- Paul
> On Jul 26, 2017, at 8:46 AM,
Oh I'm an idiot. I'll add the pcap format after dinner and try again.
Thx for the quick and helpful response!
-boB
On Wed, Jul 26, 2017 at 18:03 Parth Chandra wrote:
> You might have to add the pcap format in the dfs storage plugin config [1]
>
> Something like this :
>
>
You might have to add the pcap format in the dfs storage plugin config [1]
Something like this :
"formats": {
"csv": {
"type": "text",
"extensions": [
"csv"
],
"delimiter": ","
},
"parquet": {
"type": "parquet"
},
"json": {
"type":
Hi Bob,
Is DRILL-5432 the one you are talking about? I saw it's merged and should
have been put in the release candidate.
What type of error did you see when you tried to query a PCAP? Also, it may
help to provide the commit id of your build by running the following query:
SELECT * from
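The query is cut off above. Drill exposes the commit id through the sys.version system table, so the intended query is presumably:

SELECT * FROM sys.version;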
I wasn't sure if this belonged on the dev list or not but I was
peeking around the JIRA for 1.11.0 RC and noticed that it _looked_
like PCAP support is/was going to be in 1.11.0 but when I did a quick
d/l and test of the RC (early yesterday) and tried to query a PCAP it
did not work.
I'm
The regexp_replace function works on that data on Drill 1.11.0, commit id: 4220fb2.
The data used was:
[root@centos-01 community]# cat rgex_replce.csv
"This is the column,one "
"This is column , two"
column3
column4
0: jdbc:drill:schema=dfs.tmp> select * from `rgex_replce.csv`;
The bundled projects (HBase, Hive) in CDH have their own versions. I'm
wondering if that is the difference.
Drill has been tested with HBase 1.1.1 and Hive 1.2.1. For higher versions, as
long as the APIs have not changed, things should be backward compatible.
Also, the error message you
Hi,
I have set up HBase with Apache Drill following
https://drill.apache.org/docs/hbase-storage-plugin/ and was trying to query
HBase tables (students and clicks) by following
https://drill.apache.org/docs/querying-hbase/, but when I try to run
commands like (SHOW TABLES, SELECT *
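For reference, the storage plugin configuration that page describes looks like the following (a sketch with a hypothetical single-node ZooKeeper quorum; the real quorum hosts depend on the cluster):

{
  "type": "hbase",
  "config": {
    "hbase.zookeeper.quorum": "localhost",
    "hbase.zookeeper.property.clientPort": "2181"
  },
  "enabled": true
}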
Hi Divya,
We found a couple of issues in CSV files that would lead to the kind of errors
you encountered. These issues will be fixed in the upcoming Drill 1.11 release.
Sharing a sample CSV file will let us check the issue. Even better, voting is
open for the 1.11 release. Please go ahead and
Can you please share your CSV file, the SQL query, and the version of Drill that
you are on, so someone can take a look and try to reproduce the error that you
are seeing?
Thanks,
Khurram
From: Divya Gehlot
Sent: Wednesday, July 26,
It is CDH 5.8.2.
I believe those are reliable versions, aren't they?
Thanks,
Shai
-----Original Message-----
From: Kunal Khatua [mailto:kkha...@mapr.com]
Sent: Monday, July 24, 2017 8:50 AM
To: user@drill.apache.org
Subject: RE: drill error connecting to Hbase
This means that the connectivity with ZK
The data size is not big for every hour, but it will grow with time: say I have
data for 2 years coming in on an hourly basis; recreating the Parquet table
every time is not a feasible solution.
Likewise, as in Hive, I want to create the partition and insert the data into
the partition (see the sketch below).
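For comparison, the Hive-style workflow being described is (illustrative table and partition names; this is Hive syntax, which Drill did not support at the time):

INSERT INTO TABLE events PARTITION (dt = '2017-07-27')
SELECT * FROM staging_events;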
Hi,
I have a CSV file where column values are
"This is the column,one "
"This is column , two"
column3
column4
When I try regexp_replace, it throws this error:
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR:
IllegalArgumentException: reallocation size must be non-negative
I always recommend against using CTAS as a shortcut for an ETL-type large
workload. You will need to size your Drill cluster accordingly. Consider
using Hive or Spark instead.
What are the source file formats? For every hour, what is the size and the
number of rows for that data? Are you doing any
Sorry for the late chime-in. Just a note regarding S3: even after upgrading to
Hadoop 2.8.x you may need to separately update the AWS SDK version, as the one
provided with the upgrade does not support all the newly added regions.
Thanks,
Yuliya
From: Arina Yelchiyeva
I am not aware of any clean way to do this. However, if your data is
partitioned based on directories, then you can use the below hack, which
leverages temporary tables [1] (sketched below). Essentially, you back up your
partition to a temp table, then override it by taking the union of the new partition data and
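A sketch of that hack with hypothetical paths, assuming a writable workspace (here a made-up dfs.data) and Drill 1.10+ for temporary tables; the final step is a guess at where the truncated sentence was headed:

-- 1. Back up the existing partition to a temporary table
CREATE TEMPORARY TABLE backup_2017_07 AS
SELECT * FROM dfs.data.`events/2017/07`;

-- 2. Drop the original partition directory
DROP TABLE dfs.data.`events/2017/07`;

-- 3. Recreate it as the union of the new data and the backup
CREATE TABLE dfs.data.`events/2017/07` AS
SELECT * FROM dfs.data.`staging_new_events`
UNION ALL
SELECT * FROM backup_2017_07;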