Re: Line Parsing Errors and Skipping

Andries Engelbrecht Tue, 03 Nov 2015 11:54:02 -0800

See DRILL-2424 and DRILL-1131
Incomplete records/files can cause issues. In Drill 1.2 they have added the
ability to ignore data files with a . prefix.


Perhaps copy files in over NFS using a . prefix and then rename them once the
copy to the DFS is complete.
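The write-hidden-then-rename idea can be sketched in Python, the language John's loader uses. This is a minimal sketch, not a tested recipe. It assumes newline-delimited JSON, that Drill 1.2+ skips the dot-prefixed file while it is being written, and that the rename is atomic on the target filesystem (true for os.replace on POSIX within one filesystem; whether that holds over MapR NFS is an assumption). The function name is hypothetical.

```python
import json
import os
import tempfile


def write_json_hidden_then_rename(directory, filename, records):
    """Write records to '.<filename>' (hidden from Drill), then rename to '<filename>'.

    Assumption: the query engine ignores dot-prefixed files, so readers never
    see the partially written file. os.replace is atomic on POSIX when both
    paths are on the same filesystem.
    """
    hidden_path = os.path.join(directory, "." + filename)
    final_path = os.path.join(directory, filename)
    with open(hidden_path, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")  # one JSON object per line
        fh.flush()
        os.fsync(fh.fileno())  # make sure the bytes are on disk before publishing
    os.replace(hidden_path, final_path)  # publish the complete file in one step
    return final_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = write_json_hidden_then_rename(d, "batch-0001.json", [{"id": 1}])
        print(sorted(os.listdir(d)))  # only the final, non-hidden file remains
```

The design choice is to never let a reader observe a half-written file, rather than to make readers tolerate one.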

I had the same issue with Flume data streaming in and incomplete records, but I
have not been able to test with Drill 1.2 yet. However, if I copy an existing
file to the same directory with a . prefix, I can see in the query plan that
the hidden file is being ignored.

—Andries
 
> On Nov 3, 2015, at 11:07 AM, John Omernik <j...@omernik.com> wrote:
> 
> I am doing some "active" loading of data into json files on MapRFS.
> Basically I have feeds pulling from a message queue and outputting the
> JSON messages.
> 
> I have a query that is doing aggregations on all the data that seems to work
> 90% of the time.
> 
> The other 10%, I get this error:
> 
> Error: DATA_READ ERROR: Error parsing JSON - Unexpected end-of-input in
> VALUE_STRING
> 
> File: /path/to/file
> Record: someint
> Column: someint
> Fragment someint:someint
> 
> (I replaced the actual record, column, and fragment info obviously)
> 
> 
> When I get this error, I can run the same query again, and all is well.
> 
> My questions are these:
> 
> 1. My "gut" is telling me this is because I have files being written in
> real time to MapR FS using POSIX tools over NFS, and when this is
> occurring, it's because Python's fh.write() is "in mid stream" when Drill
> tries to query the file, so it's not perfectly formatted. Does this seem
> feasible?
> 
> 2. Just waiting a bit fixes things. Because of how Drill works (it has to
> read all the data on an aggregate query), if it were failing because
> corrupt data had been permanently written, it would fail every time.
> (I.e., I shouldn't be troubleshooting this, because if it's working, the
> problem is resolved, at least until the next time I try to read a
> half-written JSON object.) Is this accurate?
> 
> 3. Is this always going to be the case with "realtime" data, or is there a
> way to address this?
> 
> 4. Is there a way to address this type of issue by skipping that
> line/record? I know there was some talk about skipping records in other
> posts/JIRAs, but I'm not sure whether this case would be covered there.
> 
> 5. Am I completely off base and the actual problem is something else?
> 
> John
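On question 4 (skipping the bad line/record): this does not describe any Drill setting. It is a hypothetical pre-processing step outside Drill that separates parseable lines from broken ones instead of failing the whole read. A minimal Python sketch, assuming one JSON object per line; the function name is made up for illustration.

```python
import json


def split_json_lines(lines):
    """Parse newline-delimited JSON, collecting malformed lines instead of failing.

    Returns (records, bad_line_numbers). A record cut off mid-write (e.g. a
    string with no closing quote) raises json.JSONDecodeError and is reported
    by its 1-based line number rather than aborting the whole read.
    """
    records = []
    bad_line_numbers = []
    for lineno, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # ignore blank lines
        try:
            records.append(json.loads(text))
        except json.JSONDecodeError:
            bad_line_numbers.append(lineno)
    return records, bad_line_numbers


if __name__ == "__main__":
    sample = ['{"id": 1, "msg": "ok"}\n', '{"id": 2, "msg": "cut of']
    records, bad = split_json_lines(sample)
    print(records)  # [{'id': 1, 'msg': 'ok'}]
    print(bad)      # [2]
```

Silently dropping a truncated last line is only safe if the writer will later append the complete record somewhere Drill will read; otherwise that record is lost.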
