Sharing/reusing RDDs is useful in many use cases. Is this possible by persisting an RDD on Tachyon?
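By way of background, the kind of off-heap persist I am referring to looks roughly like this (a sketch only; it assumes spark.tachyonStore.url points at the Tachyon master, and the input path is illustrative):

```scala
// Sketch: assumes a SparkContext `sc` configured with
//   spark.tachyonStore.url = tachyon://test01.zala:19998
// The HDFS input path below is illustrative, not my real data.
import org.apache.spark.storage.StorageLevel

val rdd = sc.textFile("hdfs://test01.zala:8020/user/hive/warehouse/some_input")
rdd.persist(StorageLevel.OFF_HEAP) // blocks go to Tachyon under a spark-xxx-xxx-xxx dir
rdd.count()                        // materialize the off-heap blocks
```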
What I would like, for example, is to off-heap persist a named RDD into a given path (instead of /tmp_spark_tachyon/spark-xxx-xxx-xxx), or to saveAsParquetFile on Tachyon.

I tried to save a SchemaRDD on Tachyon:

  val parquetFile = sqlContext.parquetFile("hdfs://test01.zala:8020/user/hive/warehouse/parquet_tables.db/some_table/")
  parquetFile.saveAsParquetFile("tachyon://test01.zala:19998/parquet_1")

but it always fails. The first error message is:

14/08/11 16:19:28 INFO storage.BlockManagerInfo: Added broadcast_6_piece0 in memory on test03.zala:37377 (size: 18.7 KB, free: 16.6 GB)
14/08/11 16:20:06 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 3.0 (TID 35, test04.zala): java.io.IOException: FailedToCheckpointException(message:Failed to rename hdfs://test01.zala:8020/tmp/tachyon/workers/1407760000003/31806/730 to hdfs://test01.zala:8020/tmp/tachyon/data/730)
        tachyon.worker.WorkerClient.addCheckpoint(WorkerClient.java:112)
        tachyon.client.TachyonFS.addCheckpoint(TachyonFS.java:168)
        tachyon.client.FileOutStream.close(FileOutStream.java:104)
        org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:70)
        org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:103)
        parquet.hadoop.ParquetFileWriter.end(ParquetFileWriter.java:321)
        parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:111)
        parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:73)
        org.apache.spark.sql.parquet.InsertIntoParquetTable.org$apache$spark$sql$parquet$InsertIntoParquetTable$$writeShard$1(ParquetTableOperations.scala:259)
        org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:272)
        org.apache.spark.sql.parquet.InsertIntoParquetTable$$anonfun$saveAsHadoopFile$1.apply(ParquetTableOperations.scala:272)
        org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
        org.apache.spark.scheduler.Task.run(Task.scala:54)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        java.lang.Thread.run(Thread.java:722)

hdfs://test01.zala:8020/tmp/tachyon/ is already chmod'd to 777, and both its owner and group are the same as the Spark/Tachyon startup user. Off-heap persist and saving as a normal text file on Tachyon both work fine.

Environment: CDH 5.1.0, Spark 1.1.0 snapshot, Tachyon 0.6 snapshot

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/share-reuse-off-heap-persisted-tachyon-RDD-in-SparkContext-or-saveAsParquetFile-on-tachyon-in-SQLCont-tp11897.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.