You can use the HDFS shell command hadoop fs -put to copy files from the local file system into HDFS.
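For example, a minimal sketch (the /data/stage target directory and the file name are placeholders; a working HDFS client config on the node is assumed) -- a matching storage plugin sketch pointing Drill at HDFS follows the quoted thread below:

~~~
# create a staging directory in HDFS (path is hypothetical)
hadoop fs -mkdir -p /data/stage

# copy a delimited file from the local file system into HDFS
hadoop fs -put /localdata/stage/customer_reviews_1998.csv /data/stage/

# confirm it landed
hadoop fs -ls /data/stage
~~~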
For more robust mechanisms from remote systems you can look at using NFS. MapR has a really robust NFS integration, and you can use it with the community edition.

On May 26, 2015, at 5:11 PM, Matt <bsg...@gmail.com> wrote:
>
> That might be the end goal, but currently I don't have an HDFS ingest
> mechanism.
>
> We are not currently a Hadoop shop - can you suggest simple approaches for
> bulk loading data from delimited files into HDFS?
>
>
>
>> On May 26, 2015, at 8:04 PM, Andries Engelbrecht <aengelbre...@maprtech.com>
>> wrote:
>>
>> Perhaps I’m missing something here.
>>
>> Why not create a DFS plug-in for HDFS and put the file in HDFS?
>>
>>
>>
>>> On May 26, 2015, at 4:54 PM, Matt <bsg...@gmail.com> wrote:
>>>
>>> New installation with Hadoop 2.7 and Drill 1.0 on 4 nodes; it appears text
>>> files need to be on all nodes in a cluster?
>>>
>>> Using the dfs config below, I am only able to query if a csv file is on all
>>> 4 nodes. If the file is only on the local node and not others, I get errors
>>> in the form of:
>>>
>>> ~~~
>>> 0: jdbc:drill:zk=es05:2181> select * from root.`customer_reviews_1998.csv`;
>>> Error: PARSE ERROR: From line 1, column 15 to line 1, column 18: Table
>>> 'root.customer_reviews_1998.csv' not found
>>> ~~~
>>>
>>> ~~~
>>> {
>>>   "type": "file",
>>>   "enabled": true,
>>>   "connection": "file:///",
>>>   "workspaces": {
>>>     "root": {
>>>       "location": "/localdata/hadoop/stage",
>>>       "writable": false,
>>>       "defaultInputFormat": null
>>>     },
>>> ~~~
>>>
>>>> On 25 May 2015, at 20:39, Kristine Hahn wrote:
>>>>
>>>> The storage plugin "location" needs to be the full path to the localdata
>>>> directory. This partial storage plugin definition works for the user named
>>>> mapr:
>>>>
>>>> {
>>>>   "type": "file",
>>>>   "enabled": true,
>>>>   "connection": "file:///",
>>>>   "workspaces": {
>>>>     "root": {
>>>>       "location": "/home/mapr/localdata",
>>>>       "writable": false,
>>>>       "defaultInputFormat": null
>>>>     },
>>>> . . .
>>>>
>>>> Here's a working query for the data in localdata:
>>>>
>>>> 0: jdbc:drill:> SELECT COLUMNS[0] AS Ngram,
>>>> . . . . . . . > COLUMNS[1] AS Publication_Date,
>>>> . . . . . . . > COLUMNS[2] AS Frequency
>>>> . . . . . . . > FROM dfs.root.`mydata.csv`
>>>> . . . . . . . > WHERE ((columns[0] = 'Zoological Journal of the Linnean')
>>>> . . . . . . . > AND (columns[2] > 250)) LIMIT 10;
>>>>
>>>> A complete example, not yet published on the Drill site, shows in detail
>>>> the steps involved:
>>>> http://tshiran.github.io/drill/docs/querying-plain-text-files/#example-of-querying-a-tsv-file
>>>>
>>>>
>>>> Kristine Hahn
>>>> Sr. Technical Writer
>>>> 415-497-8107 @krishahn
>>>>
>>>>
>>>>> On Sun, May 24, 2015 at 1:56 PM, Matt <bsg...@gmail.com> wrote:
>>>>>
>>>>> I have used a single node install (unzip and run) to query local text /
>>>>> csv files, but on a 3 node cluster (installed via MapR CE), a query with
>>>>> local files results in:
>>>>>
>>>>> ~~~
>>>>> sqlline version 1.1.6
>>>>> 0: jdbc:drill:> select * from dfs.`testdata.csv`;
>>>>> Query failed: PARSE ERROR: From line 1, column 15 to line 1, column 17:
>>>>> Table 'dfs./localdata/testdata.csv' not found
>>>>>
>>>>> 0: jdbc:drill:> select * from dfs.`/localdata/testdata.csv`;
>>>>> Query failed: PARSE ERROR: From line 1, column 15 to line 1, column 17:
>>>>> Table 'dfs./localdata/testdata.csv' not found
>>>>> ~~~
>>>>>
>>>>> Is there a special config for local file querying? An initial doc search
>>>>> did not point me to a solution, but I may simply not have found the
>>>>> relevant sections.
>>>>>
>>>>> I have tried modifying the default dfs config to no avail:
>>>>>
>>>>> ~~~
>>>>> "type": "file",
>>>>> "enabled": true,
>>>>> "connection": "file:///",
>>>>> "workspaces": {
>>>>>   "root": {
>>>>>     "location": "/localdata",
>>>>>     "writable": false,
>>>>>     "defaultInputFormat": null
>>>>>   }
>>>>> ~~~
>>
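For reference, here is a minimal sketch of the HDFS-backed dfs plugin suggested above, once the files have been copied into HDFS. The namenode host and port (hdfs://namenode:8020) and the /data/stage location are placeholders for your cluster's fs.defaultFS and staging directory:

~~~
{
  "type": "file",
  "enabled": true,
  "connection": "hdfs://namenode:8020/",
  "workspaces": {
    "root": {
      "location": "/data/stage",
      "writable": false,
      "defaultInputFormat": null
    }
  },
  "formats": {
    "csv": {
      "type": "text",
      "extensions": ["csv"],
      "delimiter": ","
    }
  }
}
~~~

With a plugin like this, the file lives in HDFS and is visible to every Drillbit, so a query such as select * from dfs.root.`customer_reviews_1998.csv` should work from any node rather than only from the node holding a local copy.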