Update: I tried surrounding the problematic code with try/catch, but that does not do the trick:

try {
  val sqlContext = new org.apache.spark.sql.SQLContext(sc)
  import sqlContext._
  val jsonFiles = sqlContext.jsonFile("/requests.loading")
} catch {
  case _: Throwable => // catching all exceptions and not doing anything with them
}
Any ideas?

Thanks,
Daniel

On Thu, Nov 20, 2014 at 10:20 AM, Daniel Haviv <danielru...@gmail.com> wrote:
> Hi,
> I'm loading a bunch of JSON files and there seem to be problems with
> specific files (either schema changes or incomplete files).
> I'd like to catch the inconsistent files, but I'm not sure how to do it.
>
> This is the exception I get:
> 14/11/20 00:13:49 INFO cluster.YarnClientClusterScheduler: Removed TaskSet
> 0.0, whose tasks have all completed, from pool
> org.apache.spark.SparkException: Job aborted due to stage failure: Task
> 3027 in stage 0.0 failed 4 times, most recent failure: Lost task 3027.3 in
> stage 0.0 (TID 3100, HDdata2):
> com.fasterxml.jackson.core.JsonParseException: Unexpected end-of-input: was
> expecting closing quote for a string value
> at [Source: java.io.StringReader@39a8eab6; line: 1, column: 1805]
> com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1524)
>
> and this is the code causing it:
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> import sqlContext._
>
> val jsonFiles = sqlContext.jsonFile("/requests.loading")
>
> How can I do it?
>
> Thanks,
> Daniel
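One possible way to isolate the corrupt records, offered only as a rough sketch rather than a tested fix: jsonFile expects one complete JSON object per line, so the files could be read as plain text first, each line validated with Jackson (which Spark SQL already uses for JSON parsing), and only the parseable lines handed to sqlContext.jsonRDD. The path and variable names below simply mirror the original snippet.

val sqlContext = new org.apache.spark.sql.SQLContext(sc)

// Read the files as plain text; each line is expected to be one JSON record.
val rawLines = sc.textFile("/requests.loading")

// Keep only the lines Jackson can parse; truncated or malformed records are
// dropped instead of failing the whole job.
val validLines = rawLines.mapPartitions { lines =>
  // One ObjectMapper per partition, created on the executors.
  val mapper = new com.fasterxml.jackson.databind.ObjectMapper()
  lines.filter { line =>
    try { mapper.readTree(line); true }
    catch { case _: com.fasterxml.jackson.core.JsonProcessingException => false }
  }
}

// Infer the schema from the surviving records only.
val jsonFiles = sqlContext.jsonRDD(validLines)

Flipping the filter condition would instead collect the malformed lines, which could be logged or written out if the goal is to inspect the inconsistent files rather than just skip them.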