See DRILL-2424 and DRILL-1131. Incomplete records/files can cause issues; in Drill 1.2 they have added the ability to ignore data files with a . prefix.
Perhaps copy files in over NFS using a . prefix and then rename them once they are fully copied onto the DFS. I had the same issue with Flume data streaming in and incomplete records, but I have not been able to test this with Drill 1.2. However, if I copy an existing file to the same directory with a . prefix, I can see in the query plan that the hidden file is being ignored.

--Andries

> On Nov 3, 2015, at 11:07 AM, John Omernik <j...@omernik.com> wrote:
>
> I am doing some "active" loading of data into json files on MapRFS.
> Basically I have feeds pulling from a message queue and outputting the
> JSON messages.
>
> I have a query that is doing aggregations on all the data that seems to work
> 90% of the time.
>
> The other 10%, I get this error:
>
> Error: DATA_READ ERROR: Error parsing JSON - Unexpected end-of-input in
> VALUE_STRING
>
> File: /path/to/file
> Record: someint
> Column: someint
> Fragment someint:someint
>
> (I replaced the actual record, column, and fragment info obviously)
>
>
> When I get this error, I can run the same query again, and all is well.
>
> My questions are this:
>
> 1. My "gut" is telling me this is because I have files being written in
> real time with MapR FS using POSIX tools over NFS and when this is
> occurring, it's because the python fh.write() is "in mid stream" when drill
> tries to query the file, thus it's not perfectly formatted. Does this seem
> feasible?
>
> 2. Just waiting a bit fixes things, thus because of how Drill works, i.e.
> it has to read all the data on an aggregate query, if it was going to fail
> because there was corrupt data permanently written, it would fail every
> time. (I.e. I shouldn't be troubleshooting this because if it's working,
> the problem is resolved at least until the next time I try to read a half
> written json object.) Is this accurate?
>
> 3. This is always going to be the case with "realtime" data, or is there a
> way to address this?
>
> 4. Is there a way to address this type of issue by skipping that
> line/record? I know there was some talk about skipping records in other
> posts/JIRAs, but not sure if this would be taken into account there.
>
> 5. Am I completely off base and the actual problem is something else?
>
> John
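
For reference, the dot-prefix-then-rename suggestion from the reply at the top of this message could look roughly like the Python sketch below. It is only an illustration under some assumptions: the function name, the file name, and the newline-delimited JSON layout are hypothetical, and it assumes the rename happens within a single directory on the same filesystem. Whether the rename is atomic over an NFS mount depends on the NFS/MapR FS setup and has not been verified here.

```python
import json
import os
import tempfile


def write_json_records(directory, filename, records):
    """Write records under a hidden (dot-prefixed) name, then rename.

    Hypothetical helper: while the file is still being written it carries
    a "." prefix, which the reply above says Drill 1.2 ignores. Only the
    finished file gets its final, visible name.
    """
    hidden_path = os.path.join(directory, "." + filename)
    final_path = os.path.join(directory, filename)

    with open(hidden_path, "w") as fh:
        for record in records:
            # One complete JSON object per line.
            fh.write(json.dumps(record) + "\n")
        # Push buffered data to the OS and ask it to flush to storage
        # before the file becomes visible under its final name.
        fh.flush()
        os.fsync(fh.fileno())

    # Rename within the same directory once the file is complete.
    os.rename(hidden_path, final_path)
    return final_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as target_dir:
        write_json_records(target_dir, "feed-0001.json", [{"id": 1}, {"id": 2}])
        print(sorted(os.listdir(target_dir)))
```

The trade-off is that a query only sees a file after the writer has closed it, so for a continuous feed the writer would need to roll over to a new file periodically rather than appending to one visible file.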