I'm trying to save a DataFrame to S3 as a Parquet file, but I'm getting "Wrong FS" errors:

>>> df.saveAsParquetFile(parquetFile)
15/03/25 18:56:10 INFO storage.MemoryStore: ensureFreeSpace(46645) called with curMem=82744, maxMem=278302556
15/03/25 18:56:10 INFO storage.MemoryStore: Block broadcast_5 stored as values in memory (estimated size 45.6 KB, free 265.3 MB)
15/03/25 18:56:10 INFO storage.MemoryStore: ensureFreeSpace(7078) called with curMem=129389, maxMem=278302556
15/03/25 18:56:10 INFO storage.MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 6.9 KB, free 265.3 MB)
15/03/25 18:56:10 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on ip-172-31-1-219.ec2.internal:58280 (size: 6.9 KB, free: 265.4 MB)
15/03/25 18:56:10 INFO storage.BlockManagerMaster: Updated info of block broadcast_5_piece0
15/03/25 18:56:10 INFO spark.SparkContext: Created broadcast 5 from textFile at JSONRelation.scala:98
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/spark/python/pyspark/sql/dataframe.py", line 121, in saveAsParquetFile
    self._jdf.saveAsParquetFile(path)
  File "/root/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/root/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o22.saveAsParquetFile.
: java.lang.IllegalArgumentException: Wrong FS: s3n://com.my.bucket/spark-testing/, expected: hdfs://ec2-52-0-159-113.compute-1.amazonaws.com:9000
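
For context, the path is just a fully qualified s3n URI. A minimal sketch of the setup, assuming the credentials are supplied through the fs.s3n.* Hadoop settings (those key names are my assumption about the relevant config, not necessarily how the cluster is set up):

>>> # sketch of the setup -- fs.s3n.* credential key names assumed
>>> hadoopConf = sc._jsc.hadoopConfiguration()
>>> hadoopConf.set("fs.s3n.awsAccessKeyId", "<access key>")
>>> hadoopConf.set("fs.s3n.awsSecretAccessKey", "<secret key>")
>>> parquetFile = "s3n://com.my.bucket/spark-testing/"
>>> df.saveAsParquetFile(parquetFile)  # fails with the Wrong FS error above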


Is it possible to save a DataFrame as Parquet directly to S3, or does the output path have to be on the cluster's default (HDFS) filesystem?
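
If direct writes aren't supported, a fallback I'm considering (an untested sketch, using the same placeholder bucket) is writing the Parquet files to HDFS first and then copying them over:

>>> # untested fallback: write to the cluster's HDFS, then copy to S3
>>> df.saveAsParquetFile("hdfs:///tmp/spark-testing/")

and then, from a shell on the master:

$ hadoop distcp hdfs:///tmp/spark-testing/ s3n://com.my.bucket/spark-testing/

But I'd much rather write to S3 directly if that's supported.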

-- 
Stuart Layton
