Hi Jan,

Is the error because a past run of the job has already written to the location?
In that case you can add more granularity with a 'time' column along with year and month. That should give you a distinct path for every run. Let us know if it helps or if I missed anything. Good luck!

Thanks, via mobile, excuse brevity.

On Dec 22, 2015 2:31 PM, "Jan Holmberg" <jan.holmb...@perigeum.fi> wrote:

> Hi,
> I'm stuck with writing partitioned data to HDFS. The example below ends up
> with an 'already exists' error.
>
> I'm wondering how to handle the streaming use case.
>
> What is the intended way to write streaming data to HDFS? What am I
> missing?
>
> cheers,
> -jan
>
> import com.databricks.spark.avro._
> import org.apache.spark.sql.SQLContext
>
> val sqlContext = new SQLContext(sc)
> import sqlContext.implicits._
>
> val df = Seq(
>   (2012, 8, "Batman", 9.8),
>   (2012, 8, "Hero", 8.7),
>   (2012, 7, "Robot", 5.5),
>   (2011, 7, "Git", 2.0)).toDF("year", "month", "title", "rating")
>
> df.write.partitionBy("year", "month").avro("/tmp/data")
>
> val df2 = Seq(
>   (2012, 10, "Batman", 9.8),
>   (2012, 10, "Hero", 8.7),
>   (2012, 9, "Robot", 5.5),
>   (2011, 9, "Git", 2.0)).toDF("year", "month", "title", "rating")
>
> df2.write.partitionBy("year", "month").avro("/tmp/data")
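To make the suggestion concrete, a rough sketch of adding a per-run 'time' partition column (untested; assumes the spark-avro package is on the classpath and a `sqlContext` already exists as in your example; the column name `time` and the use of epoch seconds are just illustrative choices):

```scala
import com.databricks.spark.avro._
import org.apache.spark.sql.functions.lit
import sqlContext.implicits._

// Tag each run with a timestamp so every run writes to a distinct
// partition directory, e.g. /tmp/data/year=2012/month=10/time=1450787460
val runTime = System.currentTimeMillis / 1000

val df2 = Seq(
  (2012, 10, "Batman", 9.8),
  (2012, 10, "Hero", 8.7),
  (2012, 9, "Robot", 5.5),
  (2011, 9, "Git", 2.0)).toDF("year", "month", "title", "rating")

df2.withColumn("time", lit(runTime))
  .write.partitionBy("year", "month", "time")
  .avro("/tmp/data")
```

Depending on your Spark/spark-avro versions, `df2.write.mode("append")` might also be worth trying so repeated runs add files under the existing path instead of failing, but I haven't verified append behaviour with partitioned Avro output myself.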