That might be the end goal, but currently I don't have an HDFS ingest mechanism. We are not currently a Hadoop shop. Can you suggest simple approaches for bulk loading data from delimited files into HDFS?
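For example, would the stock hdfs dfs -put route be reasonable for delimited files? A rough sketch of what I imagine, assuming the standard Hadoop 2.7 client CLI (the /drill/stage target path is just a placeholder guess on my part):

~~~
# hypothetical staging of a local csv into HDFS with the stock Hadoop CLI
hdfs dfs -mkdir -p /drill/stage                                        # create a landing directory in HDFS
hdfs dfs -put /localdata/hadoop/stage/customer_reviews_1998.csv /drill/stage/   # copy the local file up
hdfs dfs -ls /drill/stage                                              # confirm the file is visible cluster-wide
~~~

Or is there a more appropriate bulk-load mechanism I should be looking at?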
> On May 26, 2015, at 8:04 PM, Andries Engelbrecht <aengelbre...@maprtech.com> wrote:
>
> Perhaps I’m missing something here.
>
> Why not create a DFS plug-in for HDFS and put the file in HDFS?
>
>
>> On May 26, 2015, at 4:54 PM, Matt <bsg...@gmail.com> wrote:
>>
>> New installation with Hadoop 2.7 and Drill 1.0 on 4 nodes; it appears text files need to be on all nodes in a cluster?
>>
>> Using the dfs config below, I am only able to query if a csv file is on all 4 nodes. If the file is only on the local node and not others, I get errors in the form of:
>>
>> ~~~
>> 0: jdbc:drill:zk=es05:2181> select * from root.`customer_reviews_1998.csv`;
>> Error: PARSE ERROR: From line 1, column 15 to line 1, column 18: Table 'root.customer_reviews_1998.csv' not found
>> ~~~
>>
>> ~~~
>> {
>>   "type": "file",
>>   "enabled": true,
>>   "connection": "file:///",
>>   "workspaces": {
>>     "root": {
>>       "location": "/localdata/hadoop/stage",
>>       "writable": false,
>>       "defaultInputFormat": null
>>     },
>> ~~~
>>
>>> On 25 May 2015, at 20:39, Kristine Hahn wrote:
>>>
>>> The storage plugin "location" needs to be the full path to the localdata directory. This partial storage plugin definition works for the user named mapr:
>>>
>>> {
>>>   "type": "file",
>>>   "enabled": true,
>>>   "connection": "file:///",
>>>   "workspaces": {
>>>     "root": {
>>>       "location": "/home/mapr/localdata",
>>>       "writable": false,
>>>       "defaultInputFormat": null
>>>     },
>>> . . .
>>>
>>> Here's a working query for the data in localdata:
>>>
>>> 0: jdbc:drill:> SELECT COLUMNS[0] AS Ngram,
>>> . . . . . . . > COLUMNS[1] AS Publication_Date,
>>> . . . . . . . > COLUMNS[2] AS Frequency
>>> . . . . . . . > FROM dfs.root.`mydata.csv`
>>> . . . . . . . > WHERE ((columns[0] = 'Zoological Journal of the Linnean')
>>> . . . . . . . > AND (columns[2] > 250)) LIMIT 10;
>>>
>>> A complete example, not yet published on the Drill site, shows the steps involved in detail:
>>> http://tshiran.github.io/drill/docs/querying-plain-text-files/#example-of-querying-a-tsv-file
>>>
>>>
>>> Kristine Hahn
>>> Sr. Technical Writer
>>> 415-497-8107 @krishahn
>>>
>>>
>>>> On Sun, May 24, 2015 at 1:56 PM, Matt <bsg...@gmail.com> wrote:
>>>>
>>>> I have used a single node install (unzip and run) to query local text / csv files, but on a 3 node cluster (installed via MapR CE), a query with local files results in:
>>>>
>>>> ~~~
>>>> sqlline version 1.1.6
>>>> 0: jdbc:drill:> select * from dfs.`testdata.csv`;
>>>> Query failed: PARSE ERROR: From line 1, column 15 to line 1, column 17: Table 'dfs./localdata/testdata.csv' not found
>>>>
>>>> 0: jdbc:drill:> select * from dfs.`/localdata/testdata.csv`;
>>>> Query failed: PARSE ERROR: From line 1, column 15 to line 1, column 17: Table 'dfs./localdata/testdata.csv' not found
>>>> ~~~
>>>>
>>>> Is there a special config for local file querying? An initial doc search did not point me to a solution, but I may simply not have found the relevant sections.
>>>>
>>>> I have tried modifying the default dfs config to no avail:
>>>>
>>>> ~~~
>>>> "type": "file",
>>>> "enabled": true,
>>>> "connection": "file:///",
>>>> "workspaces": {
>>>>   "root": {
>>>>     "location": "/localdata",
>>>>     "writable": false,
>>>>     "defaultInputFormat": null
>>>>   }
>>>> ~~~
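Following up on the suggestion above to create a DFS plug-in pointed at HDFS: am I right that it would mostly mean changing the connection from file:/// to the NameNode, roughly like the partial config below? (The es05:8020 host/port and the /drill/stage path are only placeholder guesses on my part.)

~~~
{
  "type": "file",
  "enabled": true,
  "connection": "hdfs://es05:8020",
  "workspaces": {
    "stage": {
      "location": "/drill/stage",
      "writable": true,
      "defaultInputFormat": null
    }
  }
}
~~~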