Hello,
This is my format setting:
~~~
"csv": {
  "type": "text",
  "extensions": [
    "csv"
  ],
  "extractHeader": true,
  "delimiter": ","
}
~~~
I was able to extract the header and get the expected results:
~~~
select * from mfs.tmp.`abcd.csv`;
+----+----+----+----+
| A  | B  | C  | D  |
+----+----+----+----+
| 1  | 2  | 3  | 4  |
| 2  | 3  | 4  | 5  |
| 3  | 4  | 5  | 6  |
+----+----+----+----+
3 rows selected (0.196 seconds)

select A from mfs.tmp.`abcd.csv`;
+----+
| A  |
+----+
| 1  |
| 2  |
| 3  |
+----+
3 rows selected (0.16 seconds)
~~~
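To take the storage-plugin "csv" entry out of the picture entirely, Drill can also accept text-format options directly in a query through a table function. This is only a sketch: the parameter names (`type`, `fieldDelimiter`, `extractHeader`) follow the Drill documentation for format table functions, and I have not confirmed that Drill 1.6.0 accepts exactly these names.

~~~sql
-- Override the text format options for this query only, instead of
-- relying on the "csv" entry in the storage plugin configuration.
-- Parameter names are assumed from the Drill docs; verify them on 1.6.0.
SELECT *
FROM table(mfs.tmp.`abcd.csv`(type => 'text', fieldDelimiter => ',', extractHeader => true));
~~~

If this returns named columns while the plain query does not, the problem is more likely in how the plugin configuration is being picked up than in the reader itself.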
I am using a MapR cluster with Drill 1.6.0. I had also enabled the new text reader.
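For reference, the option can be turned on and checked from sqlline roughly like this. The option name is the one used in this thread; everything else is standard Drill `ALTER SESSION` / `sys.options` usage:

~~~sql
-- Enable the new text reader for the current session only
-- (ALTER SYSTEM would change it for all sessions instead).
ALTER SESSION SET `exec.storage.enable_new_text_reader` = true;

-- Confirm which value is actually in effect.
SELECT * FROM sys.options WHERE name = 'exec.storage.enable_new_text_reader';
~~~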
Note: My initial query failed to extract the header, similar to what you reported. I had to set the "skipFirstLine" option to true for it to work.
Strangely, subsequent queries keep working even after I remove or disable the "skipFirstLine" option. This could be a bug, but I'm not able to reproduce it right now. I will file a JIRA once I have more clarity.
Regards,
Abhishek
On Fri, Apr 15, 2016 at 10:53 AM, Matt <bsg...@gmail.com> wrote:
With files in the local filesystem and an embedded Drillbit from the download on drill.apache.org, I can successfully query CSV data by column name with the extractHeader option on, as in SELECT customer_id FROM `file`;
But in a MapR cluster (v. 5.1.0.37549.GA) with the data in MapR-FS, the extractHeader option does not seem to take effect. A plain "SELECT *" returns the header as a data row instead of using it for the column list.
I have verified that exec.storage.enable_new_text_reader is true, and in both cases the csv storage format is defined as:
~~~
"csv": {
  "type": "text",
  "extensions": [
    "csv"
  ],
  "extractHeader": true,
  "delimiter": ","
}
~~~
Of course, with the csv reader not extracting the columns, an attempt to reference columns by name results in:

Error: DATA_READ ERROR: Selected column 'customer_id' must have name 'columns' or must be plain '*'.

In trying to diagnose the issue, I noticed that at times the file's header row was not part of the SELECT * results, but was also not being used to detect column names.
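When the header is not extracted, Drill's text reader exposes each row as a single array column named `columns`, which is why the error asks for that name. As a stopgap, fields can be addressed by position. In this sketch, `file` is the same placeholder as above, and the assumption that customer_id is the first field is hypothetical:

~~~sql
-- Without extractHeader, every field lives in the `columns` array.
-- Index 0 is assumed to hold customer_id; adjust to the real position.
-- Unless skipFirstLine is set, the header line also comes back as a data row.
SELECT columns[0] AS customer_id
FROM `file`;
~~~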
Both cases are Drill v1.6.0, but the MapR installed version has a
different commit than the standalone copy I am using:
MapR:
~~~
version         1.6.0
commit_id       2d532bd206d7ae9f3cb703ee7f51ae3764374d43
commit_message  MD-850: Treat the type of decimal literals as DOUBLE only when planner.enable_decimal_data_type is true
commit_time     31.03.2016 @ 04:47:25 UTC
build_email     Unknown
build_time      31.03.2016 @ 04:40:54 UTC
~~~
Local:
~~~
version         1.6.0
commit_id       d51f7fc14bd71d3e711ece0d02cdaa4d4c385eeb
commit_message  [maven-release-plugin] prepare release drill-1.6.0
commit_time     10.03.2016 @ 16:34:37 PST
build_email     par...@apache.org
build_time      10.03.2016 @ 17:45:29 PST
~~~
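Both listings come from Drill's `sys.version` system table, so a narrower query is enough to compare builds across installs. The column names are the ones shown in the output above:

~~~sql
-- Show just the fields needed to tell the two 1.6.0 builds apart.
SELECT version, commit_id, commit_time FROM sys.version;
~~~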