Re: Drill 1.6 on MapR cluster not using extractHeader ?

Matt Mon, 18 Apr 2016 09:18:12 -0700

I found that the dfs storage section for csv file types did not have the extractHeader setting in place on all nodes. Manually putting it in on all four of my nodes may have resolved the issue.

In my vanilla Hadoop 2.7.0 setup on the same servers, I don't recall having to set it on all nodes.


Did I perhaps miss something in the MapR cluster setup?


On 15 Apr 2016, at 14:16, Abhishek Girish wrote:

Hello,

This is my format setting:

    "csv": {
      "type": "text",
      "extensions": [
        "csv"
      ],
      "extractHeader": true,
      "delimiter": ","
    }

I was able to extract the header and get expected results:

select * from mfs.tmp.`abcd.csv`;

+----+----+----+----+
| A  | B  | C  | D  |
+----+----+----+----+
| 1  | 2  | 3  | 4  |
| 2  | 3  | 4  | 5  |
| 3  | 4  | 5  | 6  |
+----+----+----+----+
3 rows selected (0.196 seconds)

select A from mfs.tmp.`abcd.csv`;

+----+
| A  |
+----+
| 1  |
| 2  |
| 3  |
+----+
3 rows selected (0.16 seconds)

I am using a MapR cluster with Drill 1.6.0. I had also enabled the new text reader.

Note: My initial query failed to extract the header, similar to what you reported. I had to set the "skipFirstLine" option to true for it to work. Strangely, for subsequent queries, it works even after removing / disabling the "skipFirstLine" option. This could be a bug, but I'm not able to reproduce it right now. Will file a JIRA once I have more clarity.



Regards,
Abhishek

On Fri, Apr 15, 2016 at 10:53 AM, Matt <bsg...@gmail.com> wrote:

With files in the local filesystem, and an embedded drill bit from the download on drill.apache.org, I can successfully query csv data by column name with the extractHeader option on, as in SELECT customer_id FROM `file`;

But in a MapR cluster (v. 5.1.0.37549.GA) with the data in MapR-FS, the extractHeader option does not seem to be taking effect. A plain "SELECT *" returns rows with the header as a data row, not in the columns list.

I have verified that exec.storage.enable_new_text_reader is true, and in both cases csv storage is defined as:

~~~
    "csv": {
      "type": "text",
      "extensions": [
        "csv"
      ],
      "extractHeader": true,
      "delimiter": ","
    }
~~~
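
For anyone retracing this, the reader option named above can be checked, and set if needed, from any Drill SQL session. This is a minimal sketch: sys.options and ALTER SYSTEM are standard Drill features, but the exact columns returned by sys.options may differ between versions.

~~~sql
-- Show the current value of the new text reader option.
SELECT *
FROM sys.options
WHERE name = 'exec.storage.enable_new_text_reader';

-- Enable it system-wide if it is not already on.
ALTER SYSTEM SET `exec.storage.enable_new_text_reader` = true;
~~~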

Of course with the csv reader not extracting the columns, an attempt to reference columns by name results in:

Error: DATA_READ ERROR: Selected column 'customer_id' must have name 'columns' or must be plain '*'.

In trying to diagnose the issue, I noted that at times the file header row was not part of the SELECT * results, but was also not being used to detect column names.
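
Two query-level checks may help narrow this down. Without header extraction, the text reader exposes each row as a single `columns` array, so fields can be addressed by position. Drill also accepts format options inline through a table function (from 1.4 onward, as far as I know), which bypasses the stored plugin configuration. A sketch, assuming a hypothetical file `/data/customers.csv` reachable through the dfs plugin:

~~~sql
-- Address fields by position when no header has been extracted.
SELECT columns[0] AS first_field
FROM dfs.`/data/customers.csv`;

-- Pass the format options per query instead of relying on the plugin config.
-- The path is a placeholder; customer_id is the column from the error above.
SELECT customer_id
FROM table(dfs.`/data/customers.csv`(type => 'text', fieldDelimiter => ',', extractHeader => true));
~~~

If the second query returns named columns while a plain SELECT on the same file does not, that would point at the stored format configuration rather than at the reader itself.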

Both cases are Drill v1.6.0, but the MapR installed version has a
different commit than the standalone copy I am using:

MapR:

~~~

+----------+-------------------------------------------+---------------------------------------------------------------------------------------------------------+----------------------------+--------------+----------------------------+
| version  |                 commit_id                 |                                             commit_message                                              |        commit_time         | build_email  |         build_time         |
+----------+-------------------------------------------+---------------------------------------------------------------------------------------------------------+----------------------------+--------------+----------------------------+
| 1.6.0    | 2d532bd206d7ae9f3cb703ee7f51ae3764374d43  | MD-850:Treat the type of decimal literals as DOUBLE only when planner.enable_decimal_data_type is true  | 31.03.2016 @ 04:47:25 UTC  | Unknown      | 31.03.2016 @ 04:40:54 UTC  |
+----------+-------------------------------------------+---------------------------------------------------------------------------------------------------------+----------------------------+--------------+----------------------------+
~~~

Local:

~~~

+----------+-------------------------------------------+-----------------------------------------------------+----------------------------+--------------------+----------------------------+
| version  |                 commit_id                 |                   commit_message                    |        commit_time         |    build_email     |         build_time         |
+----------+-------------------------------------------+-----------------------------------------------------+----------------------------+--------------------+----------------------------+
| 1.6.0    | d51f7fc14bd71d3e711ece0d02cdaa4d4c385eeb  | [maven-release-plugin] prepare release drill-1.6.0  | 10.03.2016 @ 16:34:37 PST  | par...@apache.org  | 10.03.2016 @ 17:45:29 PST  |
+----------+-------------------------------------------+-----------------------------------------------------+----------------------------+--------------------+----------------------------+
~~~
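
Output in this shape is what Drill's sys.version table returns. Assuming that is how the two listings above were produced, the same query run against each installation is a quick way to compare builds:

~~~sql
-- Report the version, commit and build details of the connected Drillbit.
SELECT * FROM sys.version;
~~~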
