Yeah, I do manually delete the files, but it still fails with this error.

> On Feb 19, 2015, at 8:16 PM, Ganelin, Ilya <[email protected]> 
> wrote:
> 
> When writing to HDFS, Spark will not overwrite existing files or directories. 
> You must either delete these manually or use Hadoop's Java FileSystem class 
> to remove them.
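
For reference, the FileSystem route mentioned above might look roughly like the sketch below. The object and method names (OutputCleanup, deleteIfExists) are hypothetical and not part of Spark or Hadoop. It assumes the Hadoop client libraries are on the driver classpath, which is normally the case for a Spark job.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Hypothetical helper, not part of the original job.
object OutputCleanup {
  /** Recursively deletes `dir` if it exists; returns true if something was deleted. */
  def deleteIfExists(dir: String, hadoopConf: Configuration): Boolean = {
    val path = new Path(dir)
    // Resolve the file system from the path itself, so both a bare path
    // (default file system) and a fully qualified hdfs:// URI work.
    val fs = path.getFileSystem(hadoopConf)
    if (fs.exists(path)) fs.delete(path, true) else false
  }
}
```

In the driver this would be called right before the save, for example OutputCleanup.deleteIfExists("/tmp/pavel/CassandraPipeTest.txt", sc.hadoopConfiguration). If the "already exists" error still appears after that, the directory is probably being created during the run itself rather than left over from an earlier one.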
> 
> 
> -----Original Message-----
> From: Pavel Velikhov [[email protected]]
> Sent: Thursday, February 19, 2015 11:32 AM Eastern Standard Time
> To: [email protected]
> Subject: Spark job fails on cluster but works fine on a single machine
> 
> I have a simple Spark job that goes out to Cassandra, runs a pipe and stores 
> results:
> 
> val sc = new SparkContext(conf)
> val rdd = sc.cassandraTable("keyspace", "table")
>       .map(r => r.getInt("column") + "\t" + write(get_lemmas(r.getString("tags"))))
>       .pipe("python3 /tmp/scripts_and_models/scripts/run.py")
>       .map(r => convertStr(r))
>       .coalesce(1, true)
>       .saveAsTextFile("/tmp/pavel/CassandraPipeTest.txt")
>       //.saveToCassandra("keyspace", "table", SomeColumns("id", "data"))
> 
> When run on a single machine, everything is fine whether I save to an HDFS 
> file or to Cassandra.
> When run on a cluster, neither works:
> 
>  - When saving to file, I get an exception: User class threw exception: 
> Output directory hdfs://hadoop01:54310/tmp/pavel/CassandraPipeTest.txt 
> already exists
>  - When saving to Cassandra, only 4 rows are updated with empty data (I test 
> on a 4-machine Spark cluster)
> 
> Any hints on how to debug this and where the problem could be?
> 
> - I delete the hdfs file before running
> - Would really like the output to hdfs to work, so I can debug
> - Then it would be nice to save to Cassandra
> 
