Perhaps I’m missing something here. Why not create a DFS plugin for HDFS and put the file in HDFS?
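Something like this partial plugin definition might do it — an untested sketch, with the namenode host and port as placeholders for your cluster:

~~~
{
  "type": "file",
  "enabled": true,
  "connection": "hdfs://namenode:8020/",
  "workspaces": {
    "root": {
      "location": "/stage",
      "writable": false,
      "defaultInputFormat": null
    },
. . .
~~~

Copy the file in once with `hdfs dfs -put customer_reviews_1998.csv /stage/` and every drillbit can see it.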
On May 26, 2015, at 4:54 PM, Matt <bsg...@gmail.com> wrote:

> New installation with Hadoop 2.7 and Drill 1.0 on 4 nodes; it appears text
> files need to be on all nodes in a cluster?
>
> Using the dfs config below, I am only able to query if a csv file is on all 4
> nodes. If the file is only on the local node and not the others, I get errors
> in the form of:
>
> ~~~
> 0: jdbc:drill:zk=es05:2181> select * from root.`customer_reviews_1998.csv`;
> Error: PARSE ERROR: From line 1, column 15 to line 1, column 18: Table
> 'root.customer_reviews_1998.csv' not found
> ~~~
>
> ~~~
> {
>   "type": "file",
>   "enabled": true,
>   "connection": "file:///",
>   "workspaces": {
>     "root": {
>       "location": "/localdata/hadoop/stage",
>       "writable": false,
>       "defaultInputFormat": null
>     },
> ~~~
>
> On 25 May 2015, at 20:39, Kristine Hahn wrote:
>
>> The storage plugin "location" needs to be the full path to the localdata
>> directory. This partial storage plugin definition works for the user named
>> mapr:
>>
>> {
>>   "type": "file",
>>   "enabled": true,
>>   "connection": "file:///",
>>   "workspaces": {
>>     "root": {
>>       "location": "/home/mapr/localdata",
>>       "writable": false,
>>       "defaultInputFormat": null
>>     },
>> . . .
>>
>> Here's a working query for the data in localdata:
>>
>> 0: jdbc:drill:> SELECT COLUMNS[0] AS Ngram,
>> . . . . . . . > COLUMNS[1] AS Publication_Date,
>> . . . . . . . > COLUMNS[2] AS Frequency
>> . . . . . . . > FROM dfs.root.`mydata.csv`
>> . . . . . . . > WHERE ((columns[0] = 'Zoological Journal of the Linnean')
>> . . . . . . . > AND (columns[2] > 250)) LIMIT 10;
>>
>> A complete example, not yet published on the Drill site, shows the steps
>> involved in detail:
>> http://tshiran.github.io/drill/docs/querying-plain-text-files/#example-of-querying-a-tsv-file
>>
>> Kristine Hahn
>> Sr. Technical Writer
>> 415-497-8107 @krishahn
>>
>> On Sun, May 24, 2015 at 1:56 PM, Matt <bsg...@gmail.com> wrote:
>>
>>> I have used a single node install (unzip and run) to query local text /
>>> csv files, but on a 3 node cluster (installed via MapR CE), a query with
>>> local files results in:
>>>
>>> ~~~
>>> sqlline version 1.1.6
>>> 0: jdbc:drill:> select * from dfs.`testdata.csv`;
>>> Query failed: PARSE ERROR: From line 1, column 15 to line 1, column 17:
>>> Table 'dfs./localdata/testdata.csv' not found
>>>
>>> 0: jdbc:drill:> select * from dfs.`/localdata/testdata.csv`;
>>> Query failed: PARSE ERROR: From line 1, column 15 to line 1, column 17:
>>> Table 'dfs./localdata/testdata.csv' not found
>>> ~~~
>>>
>>> Is there a special config for local file querying? An initial doc search
>>> did not point me to a solution, but I may simply not have found the
>>> relevant sections.
>>>
>>> I have tried modifying the default dfs config to no avail:
>>>
>>> ~~~
>>> "type": "file",
>>> "enabled": true,
>>> "connection": "file:///",
>>> "workspaces": {
>>>   "root": {
>>>     "location": "/localdata",
>>>     "writable": false,
>>>     "defaultInputFormat": null
>>>   }
>>> ~~~
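For completeness: the file:/// connection points each drillbit at its own local filesystem, which is why the file otherwise has to be present at the same path on every node in the cluster. With the file in HDFS and a plugin like the sketch above registered under the name hdfs (name assumed), the query no longer depends on local copies:

~~~
0: jdbc:drill:zk=es05:2181> SELECT * FROM hdfs.root.`customer_reviews_1998.csv` LIMIT 10;
~~~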