Well, I have one program writing data via Python to MapRFS in a directory that Drill is reading, so yes, I have two different programs reading and writing data. What I am looking for here is this: knowing I may have a scenario where a read occurs before a write is complete, can I just have Drill ignore that record?
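One mitigation I have been considering on the writer side, assuming MapR's NFS gateway really does give POSIX rename semantics, is to have the Python feed build each batch under a temp name and rename it into place only once the file is complete, so Drill never sees a partial file. A rough sketch (the paths and the message source here are just illustrative):

    import json
    import os

    def publish_batch(messages, final_path):
        # Write to a name the Drill query won't match (e.g. if the
        # query only reads *.json files), then rename into place.
        tmp_path = final_path + ".tmp"
        with open(tmp_path, "w") as fh:
            for msg in messages:
                fh.write(json.dumps(msg) + "\n")
            fh.flush()
            os.fsync(fh.fileno())  # make sure the bytes hit the filesystem
        os.rename(tmp_path, final_path)  # atomic on a POSIX filesystem

Whether the rename is truly atomic through the NFS mount is something I would still have to verify.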
John

On Tue, Nov 3, 2015 at 1:18 PM, mark charts <mcha...@yahoo.com.invalid> wrote:

> Hi.
> I read your dilemma. Would a trap in the program to handle this error or
> exception work for you in this case, addressing it by skipping around the
> trouble? My guess is you have a timing condition gone astray somewhere and
> you need to ensure all states are timed correctly.
> But what do I know. Good luck.
>
> Mark Charts
>
>
> On Tuesday, November 3, 2015 2:08 PM, John Omernik <j...@omernik.com>
> wrote:
>
>
> I am doing some "active" loading of data into JSON files on MapRFS.
> Basically I have feeds pulling from a message queue and outputting the
> JSON messages.
>
> I have a query that does aggregations on all the data, and it seems to
> work 90% of the time.
>
> The other 10%, I get this error:
>
> Error: DATA_READ ERROR: Error parsing JSON - Unexpected end-of-input in
> VALUE_STRING
>
> File: /path/to/file
> Record: someint
> Column: someint
> Fragment someint:someint
>
> (I replaced the actual record, column, and fragment info, obviously.)
>
> When I get this error, I can run the same query again and all is well.
>
> My questions are these:
>
> 1. My gut is telling me this happens because I have files being written
> in real time to MapRFS using POSIX tools over NFS, and when the error
> occurs, it's because the Python fh.write() is "in mid stream" when Drill
> tries to query the file, so the file is not perfectly formatted. Does
> this seem feasible?
>
> 2. Just waiting a bit fixes things because of how Drill works, i.e. it
> has to read all the data on an aggregate query, so if it were going to
> fail because corrupt data had been permanently written, it would fail
> every time. (I.e., I shouldn't be troubleshooting this, because if it's
> working, the problem is resolved, at least until the next time I try to
> read a half-written JSON object.) Is this accurate?
>
> 3. Is this always going to be the case with "realtime" data, or is
> there a way to address it?
>
> 4. Is there a way to address this type of issue by skipping that
> line/record? I know there was some talk about skipping records in other
> posts/JIRAs, but I'm not sure whether this case would be taken into
> account there.
>
> 5. Am I completely off base, and the actual problem is something else?
>
> John
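Mark, on your trap idea: since rerunning the query always succeeds, the simplest trap on my side would probably be a retry wrapper around whatever client submits the query to Drill. Something like this, where run_drill_query is a placeholder for the actual client call (JDBC, ODBC, or REST):

    import time

    def query_with_retry(sql, run_drill_query, attempts=3, delay_s=5):
        # Retry only the transient half-written-file failure; any
        # other error should still surface immediately.
        for attempt in range(attempts):
            try:
                return run_drill_query(sql)
            except Exception as e:
                if "DATA_READ ERROR" not in str(e) or attempt == attempts - 1:
                    raise
                time.sleep(delay_s)  # give the in-flight write time to finish

That skips around the error rather than the record, but given that the files eventually become consistent, it may be good enough.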