Excluding HDFS .tmp file from multi-file query?

2016-09-21 Robin Moffatt
Hi, I have a stream of data from Flume landing in HDFS in files of a set size. I can query these files individually just fine, and across multiple ones too - except if the wildcard encompasses the *currently open HDFS file that Flume is writing to*. When this happens, Drill understandably barfs.

Re: Excluding HDFS .tmp file from multi-file query?

2016-09-21 Andries Engelbrecht
Add a . prefix to the Flume temp files. Drill will ignore the hidden files when you query the directory structure.
--Andries
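To make the suggestion concrete, here is a minimal Python sketch of the naming rule. It assumes the Flume HDFS sink is configured with `hdfs.inUsePrefix = .` alongside its `.tmp` in-use suffix, and that Drill skips any file whose name begins with a dot when it lists a directory. The helper functions and file names are hypothetical; this models the behaviour described in the thread, not Flume's or Drill's actual code.

```python
def in_use_name(final_name, in_use_prefix=".", in_use_suffix=".tmp"):
    """Hypothetical: the name a file carries while Flume is still writing it."""
    return f"{in_use_prefix}{final_name}{in_use_suffix}"


def is_hidden(name):
    """Assumed Drill rule: names starting with '.' are treated as hidden."""
    return name.startswith(".")


def visible_files(names):
    """Files a directory-level scan would pick up under the assumed rule."""
    return [n for n in names if not is_hidden(n)]


# Two closed (rolled) files plus one file Flume still has open.
listing = [
    "FlumeData.1474464000000",
    "FlumeData.1474464060000",
    in_use_name("FlumeData.1474464120000"),  # ".FlumeData.1474464120000.tmp"
]

print(visible_files(listing))
# → ['FlumeData.1474464000000', 'FlumeData.1474464060000']
```

In this model, once Flume closes a file it renames it without the in-use prefix and suffix, so the rolled file becomes visible to the next directory query.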

Re: Excluding HDFS .tmp file from multi-file query?

2016-09-22 Robin Moffatt
Hi,
It still tried to read it, even with a . prefix:

0: jdbc:drill:drillbit=localhost> SELECT count(*) FROM table(`hdfs`.`/user/flume/incoming/twitter/2016/09/22/*`(type => 'json'));
Error: DATA_READ ERROR: Failure reading JSON file - Cannot obtain block length for LocatedBlock{BP-478416316-192.1

Re: Excluding HDFS .tmp file from multi-file query?

2016-09-22 Andries Engelbrecht
I noticed that if you specifically use * for file matching, it will still read hidden files. However, if you only point Drill at a directory, it will read the directory and its substructure without reading any hidden files.

select * from `/dir1/*` - will read hidden files
select * from `/dir1` will not re
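The difference between the two queries can be modelled in a few lines of Python. This is a sketch under stated assumptions: `fnmatch` stands in for the wildcard expansion because, unlike a POSIX shell, its `*` also matches names that start with a dot, while the directory scan applies the hidden-file filter. The function names and file list are hypothetical and are not part of Drill's API.

```python
from fnmatch import fnmatch


def expand_wildcard(names, pattern):
    """Hypothetical model of `/dir1/*`: every name is matched against the pattern.
    fnmatch's '*' also matches a leading '.', so hidden files get through."""
    return [n for n in names if fnmatch(n, pattern)]


def scan_directory(names):
    """Hypothetical model of `/dir1`: list the directory, skipping '.'-prefixed files."""
    return [n for n in names if not n.startswith(".")]


dir1 = [
    "FlumeData.1474464000000",
    "FlumeData.1474464060000",
    ".FlumeData.1474464120000.tmp",  # the file Flume still has open
]

print(expand_wildcard(dir1, "*"))  # includes the open .tmp file
print(scan_directory(dir1))        # excludes it
```

Under these assumptions the wildcard form hands the open file to the reader, which matches the DATA_READ error above, while the bare directory form leaves it out.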

Re: Excluding HDFS .tmp file from multi-file query?

2016-09-22 Robin Moffatt
Thanks Andries. Good point about organisation of the data too!
Robin.