You can use the HDFS shell command hadoop fs -put to copy files from the local file system into HDFS.
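For example, a minimal sketch (the /data/stage target directory and the file name are placeholders; a working HDFS client config on the node is assumed) -- a matching storage plugin sketch pointing Drill at HDFS follows the quoted thread below:

~~~
# create a staging directory in HDFS (path is hypothetical)
hadoop fs -mkdir -p /data/stage

# copy a delimited file from the local file system into HDFS
hadoop fs -put /localdata/stage/customer_reviews_1998.csv /data/stage/

# confirm it landed
hadoop fs -ls /data/stage
~~~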
For more robust mechanisms from remote systems you can look at using NFS. MapR has a really robust NFS integration, and you can use it with the community edition.

On May 26, 2015, at 5:11 PM, Matt <bsg...@gmail.com> wrote:
>
> That might be the end goal, but currently I don't have an HDFS ingest
> mechanism.
>
> We are not currently a Hadoop shop - can you suggest simple approaches for
> bulk loading data from delimited files into HDFS?
>
>
>
>> On May 26, 2015, at 8:04 PM, Andries Engelbrecht <aengelbre...@maprtech.com>
>> wrote:
>>
>> Perhaps I’m missing something here.
>>
>> Why not create a DFS plug-in for HDFS and put the file in HDFS?
>>
>>
>>
>>> On May 26, 2015, at 4:54 PM, Matt <bsg...@gmail.com> wrote:
>>>
>>> New installation with Hadoop 2.7 and Drill 1.0 on 4 nodes; it appears text
>>> files need to be on all nodes in a cluster?
>>>
>>> Using the dfs config below, I am only able to query if a csv file is on all
>>> 4 nodes. If the file is only on the local node and not others, I get errors
>>> in the form of:
>>>
>>> ~~~
>>> 0: jdbc:drill:zk=es05:2181> select * from root.`customer_reviews_1998.csv`;
>>> Error: PARSE ERROR: From line 1, column 15 to line 1, column 18: Table
>>> 'root.customer_reviews_1998.csv' not found
>>> ~~~
>>>
>>> ~~~
>>> {
>>>   "type": "file",
>>>   "enabled": true,
>>>   "connection": "file:///",
>>>   "workspaces": {
>>>     "root": {
>>>       "location": "/localdata/hadoop/stage",
>>>       "writable": false,
>>>       "defaultInputFormat": null
>>>     },
>>> ~~~
>>>
>>>> On 25 May 2015, at 20:39, Kristine Hahn wrote:
>>>>
>>>> The storage plugin "location" needs to be the full path to the localdata
>>>> directory. This partial storage plugin definition works for the user named
>>>> mapr:
>>>>
>>>> {
>>>>   "type": "file",
>>>>   "enabled": true,
>>>>   "connection": "file:///",
>>>>   "workspaces": {
>>>>     "root": {
>>>>       "location": "/home/mapr/localdata",
>>>>       "writable": false,
>>>>       "defaultInputFormat": null
>>>>     },
>>>> . . .
>>>>
>>>> Here's a working query for the data in localdata:
>>>>
>>>> 0: jdbc:drill:> SELECT COLUMNS[0] AS Ngram,
>>>> . . . . . . . > COLUMNS[1] AS Publication_Date,
>>>> . . . . . . . > COLUMNS[2] AS Frequency
>>>> . . . . . . . > FROM dfs.root.`mydata.csv`
>>>> . . . . . . . > WHERE ((columns[0] = 'Zoological Journal of the Linnean')
>>>> . . . . . . . > AND (columns[2] > 250)) LIMIT 10;
>>>>
>>>> A complete example, not yet published on the Drill site, shows in detail
>>>> the steps involved:
>>>> http://tshiran.github.io/drill/docs/querying-plain-text-files/#example-of-querying-a-tsv-file
>>>>
>>>>
>>>> Kristine Hahn
>>>> Sr. Technical Writer
>>>> 415-497-8107 @krishahn
>>>>
>>>>
>>>>> On Sun, May 24, 2015 at 1:56 PM, Matt <bsg...@gmail.com> wrote:
>>>>>
>>>>> I have used a single node install (unzip and run) to query local text /
>>>>> csv files, but on a 3 node cluster (installed via MapR CE), a query with
>>>>> local files results in:
>>>>>
>>>>> ~~~
>>>>> sqlline version 1.1.6
>>>>> 0: jdbc:drill:> select * from dfs.`testdata.csv`;
>>>>> Query failed: PARSE ERROR: From line 1, column 15 to line 1, column 17:
>>>>> Table 'dfs./localdata/testdata.csv' not found
>>>>>
>>>>> 0: jdbc:drill:> select * from dfs.`/localdata/testdata.csv`;
>>>>> Query failed: PARSE ERROR: From line 1, column 15 to line 1, column 17:
>>>>> Table 'dfs./localdata/testdata.csv' not found
>>>>> ~~~
>>>>>
>>>>> Is there a special config for local file querying? An initial doc search
>>>>> did not point me to a solution, but I may simply not have found the
>>>>> relevant sections.
>>>>>
>>>>> I have tried modifying the default dfs config to no avail:
>>>>>
>>>>> ~~~
>>>>> "type": "file",
>>>>> "enabled": true,
>>>>> "connection": "file:///",
>>>>> "workspaces": {
>>>>>   "root": {
>>>>>     "location": "/localdata",
>>>>>     "writable": false,
>>>>>     "defaultInputFormat": null
>>>>>   }
>>>>> ~~~
>>
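For reference, here is a minimal sketch of the HDFS-backed dfs plugin suggested above, once the files have been copied into HDFS. The namenode host and port (hdfs://namenode:8020) and the /data/stage location are placeholders for your cluster's fs.defaultFS and staging directory:

~~~
{
  "type": "file",
  "enabled": true,
  "connection": "hdfs://namenode:8020/",
  "workspaces": {
    "root": {
      "location": "/data/stage",
      "writable": false,
      "defaultInputFormat": null
    }
  },
  "formats": {
    "csv": {
      "type": "text",
      "extensions": ["csv"],
      "delimiter": ","
    }
  }
}
~~~

With a plugin like this, the file lives in HDFS and is visible to every Drillbit, so a query such as select * from dfs.root.`customer_reviews_1998.csv` should work from any node rather than only from the node holding a local copy.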