You may have seen this: http://search-hadoop.com/m/q3RTtdSyM52urAyI
> On Aug 23, 2015, at 1:01 AM, lostrain A <donotlikeworkingh...@gmail.com> wrote:
>
> Hi,
> I'm trying to save a simple dataframe to S3 in ORC format. The code is as follows:
>
>> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
>> import sqlContext.implicits._
>> val df = sc.parallelize(1 to 1000).toDF()
>> df.write.format("orc").save("s3://logs/dummy")
>
> I ran the above code in spark-shell and only the _SUCCESS file was saved under the directory.
> The last part of the spark-shell log said:
>
>> 15/08/23 07:38:23 task-result-getter-1 INFO TaskSetManager: Finished task 95.0 in stage 2.0 (TID 295) in 801 ms on ip-*-*-*-*.ec2.internal (100/100)
>
>> 15/08/23 07:38:23 dag-scheduler-event-loop INFO DAGScheduler: ResultStage 2 (save at <console>:29) finished in 0.834 s
>
>> 15/08/23 07:38:23 task-result-getter-1 INFO YarnScheduler: Removed TaskSet 2.0, whose tasks have all completed, from pool
>
>> 15/08/23 07:38:23 main INFO DAGScheduler: Job 2 finished: save at <console>:29, took 0.895912 s
>
>> 15/08/23 07:38:24 main INFO LocalDirAllocator$AllocatorPerContext$DirSelector: Returning directory: /media/ephemeral0/s3/output-
>
>> 15/08/23 07:38:24 main ERROR NativeS3FileSystem: md5Hash for dummy/_SUCCESS is [-44, 29, -128, -39, -113, 0, -78, 4, -23, -103, 9, -104, -20, -8, 66, 126]
>
>> 15/08/23 07:38:24 main INFO DefaultWriterContainer: Job job_****_**** committed.
>
> Has anyone experienced this before?
> Thanks!
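To confirm what the write actually left under the output directory (the report says only `_SUCCESS` appeared), a quick check from the same spark-shell session could look like the following. This is a rough sketch, not something taken from the linked thread: it assumes the `sc` and `sqlContext` from the quoted code, the Hadoop `FileSystem` API on the classpath (it is in spark-shell), and that `outputDir` is simply the path from the post.

```scala
// Sketch for spark-shell (Spark 1.4+/1.5-era APIs), assuming the quoted
// session's `sc` and `sqlContext` already exist.
import java.net.URI
import org.apache.hadoop.fs.{FileSystem, Path}

val outputDir = "s3://logs/dummy" // path from the original post

// List the output directory using the same Hadoop configuration Spark uses.
val fs = FileSystem.get(new URI(outputDir), sc.hadoopConfiguration)
val names = fs.listStatus(new Path(outputDir)).map(_.getPath.getName)

// Treat names starting with "_" or "." (e.g. _SUCCESS) as markers/metadata,
// and everything else as data (part) files.
val dataFiles = names.filterNot(n => n.startsWith("_") || n.startsWith("."))
println(s"all entries: ${names.mkString(", ")}")
println(s"data files:  ${dataFiles.mkString(", ")}")

// Only try reading back if there is something to read; a successful write
// of the quoted DataFrame should give back all of its rows.
if (dataFiles.nonEmpty) {
  val readBack = sqlContext.read.format("orc").load(outputDir)
  println(s"rows read back: ${readBack.count()}")
}
```

An empty `dataFiles` here would match the symptom described in the report (only `_SUCCESS` under the directory).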