Great feature, and this fixes my problem. All I do in my Python script is open the file with a . prefix; when I "close" it, I rename it without the . prefix. Easy fix. Thanks for the pointer, Andries!
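For the archives, here's a rough sketch of the pattern. The function name and arguments are just illustrative, not lifted from my actual feed script:

import os

def write_json_file(path, records):
    """Write records to a dot-prefixed name, then rename, so Drill 1.2+
    ignores the file until it is complete."""
    directory, name = os.path.split(path)
    hidden = os.path.join(directory, "." + name)  # Drill skips .-prefixed files
    with open(hidden, "w") as fh:
        for record in records:
            fh.write(record + "\n")
    # rename is atomic on the same filesystem, so a query sees either the
    # hidden (ignored) file or the finished file, never a half-written one
    os.rename(hidden, path)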
John

On Tue, Nov 3, 2015 at 1:52 PM, Andries Engelbrecht <
aengelbre...@maprtech.com> wrote:

> See DRILL-2424 and DRILL-1131.
> Incomplete records/files can cause issues; in Drill 1.2 they have added the
> ability to ignore data files with a . prefix.
>
> Perhaps copy files in over NFS using a . prefix and then rename once
> copied on the DFS.
>
> I had the same issue with Flume data streaming in and incomplete records,
> but have not been able to test with Drill 1.2. However, if I copy an existing
> file to the same directory with a . prefix, I can see in the query plan that
> the hidden file is being ignored.
>
> —Andries
>
>
> > On Nov 3, 2015, at 11:07 AM, John Omernik <j...@omernik.com> wrote:
> >
> > I am doing some "active" loading of data into JSON files on MapRFS.
> > Basically I have feeds pulling from a message queue and outputting the
> > JSON messages.
> >
> > I have a query that does aggregations on all the data, and it seems to
> > work 90% of the time.
> >
> > The other 10%, I get this error:
> >
> > Error: DATA_READ ERROR: Error parsing JSON - Unexpected end-of-input in
> > VALUE_STRING
> >
> > File: /path/to/file
> > Record: someint
> > Column: someint
> > Fragment someint:someint
> >
> > (I replaced the actual record, column, and fragment info, obviously.)
> >
> > When I get this error, I can run the same query again, and all is well.
> >
> > My questions are these:
> >
> > 1. My "gut" is telling me this happens because I have files being written
> > in real time to MapR FS using POSIX tools over NFS, and when the error
> > occurs, it's because the Python fh.write() is "in mid stream" when Drill
> > tries to query the file, so the file isn't perfectly formatted. Does this
> > seem feasible?
> >
> > 2. Just waiting a bit fixes things, because of how Drill works, i.e. it
> > has to read all the data on an aggregate query, so if it were going to
> > fail because corrupt data had been permanently written, it would fail
> > every time. (I.e. I shouldn't be troubleshooting this, because if it's
> > working, the problem is resolved, at least until the next time I try to
> > read a half-written JSON object.) Is this accurate?
> >
> > 3. Is this always going to be the case with "realtime" data, or is there
> > a way to address this?
> >
> > 4. Is there a way to address this type of issue by skipping that
> > line/record? I know there was some talk about skipping records in other
> > posts/JIRAs, but I'm not sure if this would be taken into account there.
> >
> > 5. Am I completely off base and the actual problem is something else?
> >
> > John