Well, I have one program writing data via Python to a MapRFS directory
that Drill is reading, so yes, I have two different programs reading and
writing the data. What I am looking for here is this: knowing I may have
a scenario where a read occurs before a write is complete, can I just
have Drill ignore that record?
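
One thing I'm considering on the writer side, purely as a rough sketch
(the directory names below are placeholders, and it assumes the staging
and query directories sit on the same MapR volume so the rename is
atomic):

    import json
    import os

    STAGING_DIR = "/mapr/cluster/data/feed_staging"  # not queried by Drill
    DRILL_DIR = "/mapr/cluster/data/feed"             # directory Drill scans

    def write_batch(records, name):
        # Write the whole batch to a staging file first...
        tmp_path = os.path.join(STAGING_DIR, name)
        with open(tmp_path, "w") as fh:
            for rec in records:
                fh.write(json.dumps(rec) + "\n")
            fh.flush()
            os.fsync(fh.fileno())
        # ...then rename into the queried directory, so Drill either sees
        # the complete file or does not see it at all.
        os.rename(tmp_path, os.path.join(DRILL_DIR, name))

That way the half-written-record case shouldn't come up in the first
place, though I'd still like to know whether Drill can skip such records.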

John

On Tue, Nov 3, 2015 at 1:18 PM, mark charts <mcha...@yahoo.com.invalid>
wrote:

> Hi.
> I read your dilemma. Would a trap in the program to handle this ERROR or
> Exception work for you in this case, addressing it by skipping around the
> trouble? My guess is you have a timing condition gone astray somewhere,
> and you need to ensure all states are timed correctly.
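>
> Purely as a sketch -- the query call below is a stand-in for whatever
> client you drive Drill with, not a real API -- something along these
> lines:
>
>     import time
>
>     def run_drill_query(sql):
>         # Placeholder: submit the query through your actual client
>         # (ODBC/JDBC, REST, a sqlline wrapper, ...) and raise on failure.
>         raise NotImplementedError
>
>     def query_with_retry(sql, attempts=3, delay=5):
>         for i in range(attempts):
>             try:
>                 return run_drill_query(sql)
>             except Exception as e:
>                 # Retry only the half-written-file case; re-raise the rest.
>                 if "DATA_READ ERROR" not in str(e) or i == attempts - 1:
>                     raise
>                 time.sleep(delay)
>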
> But what do I know. Good luck.
>
> Mark Charts
>
>
>      On Tuesday, November 3, 2015 2:08 PM, John Omernik <j...@omernik.com>
> wrote:
>
>
>  I am doing some "active" loading of data into JSON files on MapRFS.
> Basically, I have feeds pulling from a message queue and writing out the
> JSON messages.
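>
> The writer is roughly this shape (simplified, with a stand-in for the
> real message-queue client):
>
>     import json
>
>     def consume(messages, fh):
>         # One JSON object per line, flushed as messages arrive, so a
>         # reader can catch the file mid-record.
>         for msg in messages:  # stand-in for the queue consumer loop
>             fh.write(json.dumps(msg) + "\n")
>             fh.flush()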
>
> I have a query that does aggregations on all the data, and it seems to
> work 90% of the time.
>
> The other 10% of the time, I get this error:
>
> Error: DATA_READ ERROR: Error parsing JSON - Unexpected end-of-input in
> VALUE_STRING
>
> File: /path/to/file
> Record: someint
> Column: someint
> Fragment someint:someint
>
> (I replaced the actual record, column, and fragment info obviously)
>
>
> When I get this error, I can run the same query again, and all is well.
>
> My questions are these:
>
> 1. My "gut" is telling me this happens because I have files being
> written in real time to MapRFS using POSIX tools over NFS, and when the
> error occurs it's because the Python fh.write() is "mid-stream" when
> Drill tries to query the file, so the file isn't perfectly formatted.
> Does this seem feasible?
>
> 2. Just waiting a bit fixes things. Because of how Drill works, i.e. it
> has to read all the data on an aggregate query, if it were going to fail
> because corrupt data had been permanently written, it would fail every
> time. (In other words, I shouldn't keep troubleshooting this: if the
> query works, the problem is resolved, at least until the next time I try
> to read a half-written JSON object.) Is this accurate?
>
> 3. Is this always going to be the case with "real-time" data, or is
> there a way to address it?
>
> 4. Is there a way to address this type of issue by skipping that
> line/record? I know there was some talk about skipping records in other
> posts/JIRAs, but I'm not sure whether this case would be covered there.
> (A rough external workaround is sketched after these questions.)
>
> 5. Am I completely off base and the actual problem is something else?
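>
> For question 4, a crude workaround outside Drill -- purely a sketch,
> with made-up names -- would be to check whether a file's last record
> parses before querying it, and wait on (or set aside) any file whose
> tail is still mid-write:
>
>     import json
>
>     def incomplete_tail(path):
>         # True if the last non-empty line doesn't parse as JSON,
>         # i.e. the writer was probably mid-record.
>         with open(path) as fh:
>             lines = [line for line in fh if line.strip()]
>         if not lines:
>             return False
>         try:
>             json.loads(lines[-1])
>             return False
>         except ValueError:
>             return True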
>
> John
>
>
>
>
