So, apparently `wholeTextFiles` runs the job again, passing null as the argument list, which in turn blows up my argument-parsing logic. I never thought I'd have to check for null again in a pure Scala environment ;)
On 26.10.2014, at 11:57, Marius Soutier <mps....@gmail.com> wrote:

> I tried that already, same exception. I also tried using an accumulator to
> collect all filenames. The filename is not the problem.
>
> Even this crashes with the same exception:
>
> sc.parallelize(files.value).map { fileName =>
>   try {
>     println(s"Scanning $fileName")
>     sc.textFile(fileName).take(1)
>     s"Successfully scanned $fileName"
>   } catch {
>     case t: Throwable =>
>       s"Failed to process $fileName, reason ${t.getStackTrace.head}"
>   }
> }.saveAsTextFile(output)
>
> The output file contains "Failed to process…" for each file.
>
> On 26.10.2014, at 00:10, Buttler, David <buttl...@llnl.gov> wrote:
>
>> This sounds like expected behavior to me. The foreach call should be
>> distributed on the workers. Perhaps you want to use map instead, and then
>> collect the failed file names locally, or save the whole thing out to a file.
>> ________________________________________
>> From: Marius Soutier [mps....@gmail.com]
>> Sent: Friday, October 24, 2014 6:35 AM
>> To: user@spark.apache.org
>> Subject: scala.collection.mutable.ArrayOps$ofRef$.length$extension since Spark 1.1.0
>>
>> Hi,
>>
>> I'm running a job whose simple task is to find files that cannot be read
>> (sometimes our gz files are corrupted).
>>
>> With 1.0.x, this worked perfectly. Since 1.1.0 however, I'm getting an exception:
>> scala.collection.mutable.ArrayOps$ofRef$.length$extension(ArrayOps.scala:114)
>>
>> sc.wholeTextFiles(input)
>>   .foreach { case (fileName, _) =>
>>     try {
>>       println(s"Scanning $fileName")
>>       sc.textFile(fileName).take(1)
>>       println(s"Successfully scanned $fileName")
>>     } catch {
>>       case t: Throwable =>
>>         println(s"Failed to process $fileName, reason ${t.getStackTrace.head}")
>>     }
>>   }
>>
>> Also since 1.1.0, the printlns are no longer visible on the console, only in
>> the Spark UI worker output.
>>
>> Thanks for any help
>> - Marius
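For anyone hitting the same issue: the common thread here is that the SparkContext only exists on the driver. Calling sc.textFile inside a map or foreach closure runs on the executors, where sc is not usable, which is consistent with both the exception and the null arguments observed above. A minimal driver-side sketch of the same scan, assuming the files are reachable through the Hadoop FileSystem API; the input and output paths and the object name are placeholders, not anything from the thread:

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.{SparkConf, SparkContext}

object ScanCorruptFiles {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ScanCorruptFiles"))

    // Hypothetical placeholder paths -- substitute your own.
    val input  = "hdfs:///data/*.gz"
    val output = "hdfs:///data/scan-report"

    // List matching files on the driver via the Hadoop FileSystem API,
    // so no (possibly corrupt) file contents are read during listing.
    val fs        = FileSystem.get(sc.hadoopConfiguration)
    val fileNames = fs.globStatus(new Path(input)).map(_.getPath.toString)

    // Launch one small job per file *from the driver*, where sc is valid.
    val report = fileNames.map { fileName =>
      try {
        sc.textFile(fileName).take(1) // forces a read of the first line
        s"Successfully scanned $fileName"
      } catch {
        case t: Throwable =>
          s"Failed to process $fileName, reason ${t.getStackTrace.head}"
      }
    }

    sc.parallelize(report.toSeq).saveAsTextFile(output)
    sc.stop()
  }
}

Since every take(1) is its own small job, a corrupted gz file only fails that one job, and its name still ends up in the saved report.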