I can reproduce the issue but I also have a workaround for it:
*1. When the storage plugin format for "tsv" is the default:*
    "tsv": {
      "type": "text",
      "extensions": [
        "tsv"
      ],
      "delimiter": "\t"
    },
> select columns[0],columns[1] from `test.tsv`;
+----------+---------+
| EXPR$0   | EXPR$1  |
+----------+---------+
| foobar   | bar     |
| aa" "bc  | null    |
+----------+---------+
2 rows selected (0.114 seconds)
*2. If we add the "quote" property and set it to a single quote:*
    "tsv": {
      "type": "text",
      "extensions": [
        "tsv"
      ],
      "quote": "'",
      "delimiter": "\t"
    },
Then the columns are split correctly (the double quotes are kept as part of the values):
> select columns[0],columns[1] from `test.tsv`;
+---------+---------+
| EXPR$0  | EXPR$1  |
+---------+---------+
| foobar  | bar     |
| "aa"    | "bc"    |
+---------+---------+
2 rows selected (0.11 seconds)
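To illustrate why the double quotes stay in the values with this workaround, here is a minimal Python sketch. It uses Python's csv module purely as a stand-in: it is not Drill's text reader and it does not reproduce the broken split from case 1. It only shows how the choice of quote character decides whether the double quotes are consumed or treated as data. The names TSV_TEXT and parse_tsv are made up for this example.

```python
import csv
import io

# Same two rows as test.tsv, with a real tab character between the fields.
TSV_TEXT = 'foobar\tbar\n"aa"\t"bc"\n'


def parse_tsv(text, quotechar):
    """Split tab-delimited text into rows using the given quote character."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quotechar=quotechar)
    return [row for row in reader]


# Double quote as the quote character: the parser consumes the quotes.
rows_double = parse_tsv(TSV_TEXT, '"')

# Single quote as the quote character: double quotes are ordinary data,
# which mirrors the "aa" / "bc" values in the workaround output.
rows_single = parse_tsv(TSV_TEXT, "'")

# Optional clean-up of the leftover double quotes after parsing.
rows_cleaned = [[field.strip('"') for field in row] for row in rows_single]

print(rows_double)
print(rows_single)
print(rows_cleaned)
```

So with the single-quote setting the fields are split on the tab as expected, and any leftover double quotes would have to be stripped afterwards if they are not wanted.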
Thanks,
Hao
On Fri, Jun 26, 2015 at 11:03 AM, Kristine Hahn wrote:
> I think you might have a problem with your tsv file using spaces
> instead of tabs.
> CSV file contents:
> hello,1,2,3
> hello,1,2,3
> hello,1,2,3
>
> TSV file contents (actual tab character, not spaces):
> hello 1 2 3
> hello 1 2 3
> hello 1 2 3
>
> 0: jdbc:drill:zk=local> select * from
> `/Users/khahn/Downloads/csv_test.csv`;
> +------------------------+
> | columns                |
> +------------------------+
> | ["hello","1","2","3"]  |
> | ["hello","1","2","3"]  |
> | ["hello","1","2","3"]  |
> +------------------------+
> 3 rows selected (0.114 seconds)
>
> TSV using tabs
> 0: jdbc:drill:zk=local> select * from
> `/Users/khahn/Downloads/tsv_test.tsv`;
> +------------------------+
> | columns                |
> +------------------------+
> | ["hello","1","2","3"]  |
> | ["hello","1","2","3"]  |
> | ["hello","1","2","3"]  |
> +------------------------+
> 3 rows selected (0.122 seconds)
>
> TSV using spaces
>
> 0: jdbc:drill:zk=local> select * from
> `/Users/khahn/Downloads/tsv_test.tsv`;
> +------------------------+
> | columns                |
> +------------------------+
> | ["hello 1 2 3"]        |
> | ["hello 1 2 3"]        |
> | ["hello 1 2 3"]        |
> +------------------------+
> 3 rows selected (0.117 seconds)
> Kristine Hahn
> Sr. Technical Writer
> 415-497-8107 @krishahn
>
>
>
> On Fri, Jun 26, 2015 at 10:02 AM, Kristine Hahn wrote:
> > There are some attributes that were introduced in Drill 1.0 that are
> > partly documented (sorry no example):
> >
> > http://drill.apache.org/docs/plugin-configuration-basics/#list-of-attributes-and-definitions
> > (see "formats" . . . "quote")
> >
> > http://drill.apache.org/docs/plugin-configuration-basics/#using-the-formats
> >
> > Kristine Hahn
> > Sr. Technical Writer
> > 415-497-8107 @krishahn
> >
> >
> > On Fri, Jun 26, 2015 at 7:27 AM, Chi-Lang Ngo wrote:
> >>
> >> Hi,
> >>
> >> I'm having a problem querying tab-delimited (tsv) files that contain quotes.
> >>
> >> Drill doesn't seem to recognise quotes in tsv files, while it works fine for
> >> csv files.
> >> For example, given the following files:
> >>
> >> test.tsv
> >> -------
> >> foobar bar
> >> "aa" "bc"
> >> -------
> >>
> >> test.csv
> >> ----------
> >> foobar,bar
> >> "aa","bc"
> >> ----------
> >>
> >> I get these results
> >>
> >> 0: jdbc:drill:zk=local> select columns[0], columns[1] from
> >> dfs.`/test.csv`;
> >>
> >> +---------+---------+
> >> | EXPR$0  | EXPR$1  |
> >> +---------+---------+
> >> | foobar  | bar     |
> >> | aa      | bc      |
> >> +---------+---------+
> >> 2 rows selected (0.259 seconds)
> >>
> >> 0: jdbc:drill:zk=local> select columns[0], columns[1] from
> >> dfs.`/test.tsv`;
> >>
> >> +----------+---------+
> >> | EXPR$0   | EXPR$1  |
> >> +----------+---------+
> >> | foobar   | bar     |
> >> | aa" "bc  | null    |
> >> +----------+---------+
> >> 2 rows selected (0.122 seconds)
> >>
> >> Any ideas?
> >> CL
> >
> >
>