Hi,
I'm having a problem querying tab-delimited (tsv) files that contain quotes.
Drill doesn't seem to recognise quotes in tsv files, while it works fine for
csv files.
For example, given the following files
test.tsv
---
foobar bar
aa bc
---
test.csv
--
foobar,bar
aa,bc
--
I get
I think you might have a problem with your tsv file using spaces
instead of tabs.
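A quick way to check this outside Drill is to count which lines actually contain a tab character. The snippet below is only a minimal sketch: the function name `delimiter_report` and the sample strings are made up for illustration and are not part of Drill.

```python
def delimiter_report(text):
    """Return (lines containing a tab, lines containing spaces but no tab)."""
    with_tab = 0
    spaces_only = 0
    for line in text.splitlines():
        if "\t" in line:
            with_tab += 1
        elif " " in line:
            spaces_only += 1
    return with_tab, spaces_only


# Hypothetical sample data: one tab-separated, one space-separated.
sample_tabs = "hello\t1\t2\t3\nhello\t1\t2\t3\n"
sample_spaces = "hello 1 2 3\nhello 1 2 3\n"

print(delimiter_report(sample_tabs))
print(delimiter_report(sample_spaces))
```

For a real file, read its contents into a string first and pass that to the function. If the first number is zero, the file has no tabs at all, so a tab delimiter will never split it into columns.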
CSV file contents:
hello,1,2,3
hello,1,2,3
hello,1,2,3
TSV file contents (actual tab character, not spaces):
hello 1 2 3
hello 1 2 3
hello 1 2 3
0: jdbc:drill:zk=local> select * from
I can reproduce the issue but I also have a workaround for it:
*1. When the storage plugin configuration for tsv is the default:*
"tsv": {
  "type": "text",
  "extensions": [
    "tsv"
  ],
  "delimiter": "\t"
},
select columns[0],columns[1] from `test.tsv`;
+--------+--------+
| EXPR$0 | EXPR$1 |
There are some attributes that were introduced in Drill 1.0 that are partly
documented (sorry, no example):
- http://drill.apache.org/docs/plugin-configuration-basics/#list-of-attributes-and-definitions
  (see formats . . . quote)
-
This is a reasonable hack for some cases, but I'm pretty sure this is going
to break the most common purpose of having quotes at all. If you put the
delimiter (tab) between quotes you are going to have it splitting on those
characters where it shouldn't be. There is also the issue that the quotes