Yeah, I do manually delete the files, but it still fails with this error.
> On Feb 19, 2015, at 8:16 PM, Ganelin, Ilya <[email protected]> wrote:
>
> When writing to HDFS, Spark will not overwrite existing files or directories.
> You must either delete them manually or remove them with the Java Hadoop
> FileSystem class (org.apache.hadoop.fs.FileSystem).
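A minimal Scala sketch of the programmatic option, assuming hadoop-common is on the classpath (any Spark application already has it). The object name `OutputCleanup` and the method `deleteIfExists` are hypothetical and not part of Spark or Hadoop; `Path.getFileSystem`, `FileSystem.exists` and `FileSystem.delete(path, recursive)` are the real Hadoop calls. The delete is recursive because `saveAsTextFile` writes a directory of part files, not a single file.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper, not a Spark or Hadoop API.
object OutputCleanup {
  /** Recursively deletes `dir` if it exists, on whichever filesystem the path
    * resolves to under `hadoopConf` (HDFS when fs.defaultFS points there,
    * the local filesystem otherwise). Returns true if something was deleted. */
  def deleteIfExists(dir: String, hadoopConf: Configuration): Boolean = {
    val path = new Path(dir)
    // Resolve the filesystem from the path plus configuration, so a
    // scheme-less path like "/tmp/..." uses the configured default FS.
    val fs: FileSystem = path.getFileSystem(hadoopConf)
    // recursive = true: the output "file" is really a directory of part-* files.
    fs.exists(path) && fs.delete(path, true)
  }
}
```

In the job quoted below it would run on the driver just before saving, e.g. `OutputCleanup.deleteIfExists("/tmp/pavel/CassandraPipeTest.txt", sc.hadoopConfiguration)`. Passing `sc.hadoopConfiguration` is the point of the design: it is the configuration `saveAsTextFile` itself uses, so the path resolves to the same filesystem as the one named in the exception (hdfs://hadoop01:54310), which rules out deleting a same-named path on a different filesystem.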
>
> -----Original Message-----
> From: Pavel Velikhov [[email protected]]
> Sent: Thursday, February 19, 2015 11:32 AM Eastern Standard Time
> To: [email protected]
> Subject: Spark job fails on cluster but works fine on a single machine
>
> I have a simple Spark job that reads a table from Cassandra, pipes the rows
> through a Python script, and stores the results:
>
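> import org.apache.spark.SparkContext
> import com.datastax.spark.connector._ // provides cassandraTable / saveToCassandra
>
> // write, get_lemmas and convertStr are helpers defined elsewhere (not shown)
> val sc = new SparkContext(conf)
> sc.cassandraTable("keyspace", "table")
>   .map(r => r.getInt("column") + "\t" +
>     write(get_lemmas(r.getString("tags"))))
>   .pipe("python3 /tmp/scripts_and_models/scripts/run.py")
>   .map(r => convertStr(r))
>   .coalesce(1, true)
>   .saveAsTextFile("/tmp/pavel/CassandraPipeTest.txt")
>   //.saveToCassandra("keyspace", "table", SomeColumns("id", "data"))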
>
> When run on a single machine, everything works, whether I save to an HDFS
> file or to Cassandra.
> When run on the cluster, neither works:
>
> - When saving to a file, I get an exception: "User class threw exception:
> Output directory hdfs://hadoop01:54310/tmp/pavel/CassandraPipeTest.txt
> already exists"
> - When saving to Cassandra, only 4 rows are updated, with empty data (I am
> testing on a 4-machine Spark cluster)
>
> Any hints on how to debug this and where the problem could be?
>
> - I delete the HDFS file before running
> - I would really like the output to HDFS to work, so that I can debug
> - Then it would be nice to save to Cassandra
>