Re: is it ok to make I/O calls in a UDF? In other words, is it standard practice?

2018-04-24 Thread Jungtaek Lim
Another thing you may want to be aware of: if the UDF is not idempotent, your query result is also not idempotent. For fault tolerance there's a chance for a record (row) to be replayed (recomputed). -Jungtaek Lim (HeartSaVioR) On Tue, Apr 24, 2018 at 2:07 PM, Jörn Franke wrote: > What is your use cas
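The replay concern above can be illustrated with a short sketch. This is not from the thread: the external call, the deduplication key, and all names here are hypothetical, and the point is only that a side-effecting UDF must tolerate being invoked more than once per row.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object UdfIoSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("udf-io-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical external side effect. A blind "append" here would NOT be
    // idempotent: if Spark replays a task after a failure, the same row is
    // recorded twice. Keying the write by a unique row id lets the external
    // system deduplicate, so replays become harmless.
    def recordEvent(id: Long): Boolean = {
      // e.g. an idempotent upsert keyed by `id` against some external store
      true
    }

    val recordUdf = udf((id: Long) => recordEvent(id))

    val df = Seq(1L, 2L, 3L).toDF("id")
    // Each row may be evaluated more than once across retries/replays.
    df.withColumn("recorded", recordUdf($"id")).show()

    spark.stop()
  }
}
```

The design point: either make the I/O itself idempotent (upsert by key), or move the side effect out of the UDF into a sink that handles exactly-once semantics.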

Problem in persisting file in S3 using Spark: xxx file does not exist Exception

2018-04-24 Thread Marco Mistroni
Hi all, I am using the following code for persisting data into S3 (AWS keys are already stored in the environment variables): dataFrame.coalesce(1).write.format("com.databricks.spark.csv").save(fileName) However, I keep receiving an exception that the file does not exist. Here's what comes fro
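One thing worth noting (an observation, not from the truncated message): on Spark 2.x+ the CSV writer is built in, so the external com.databricks.spark.csv package is unnecessary, and S3 paths are usually addressed through the Hadoop s3a connector. A minimal sketch, with a hypothetical bucket name:

```scala
// Sketch assuming Spark 2.x+ (built-in CSV support) and the Hadoop s3a
// filesystem connector configured with the AWS credentials.
dataFrame
  .coalesce(1)                      // produce a single part file
  .write
  .option("header", "true")
  .mode("overwrite")
  .csv("s3a://my-bucket/output/")   // hypothetical bucket/path
```

"File does not exist" errors against S3 in this era were commonly tied to S3's then eventually consistent listing combined with the rename-based default output committer; writing to a fresh path or using an S3-aware committer is the usual mitigation.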

Re: schema change for structured spark streaming using jsonl files

2018-04-24 Thread Lian Jiang
Thanks for any help! On Mon, Apr 23, 2018 at 11:46 AM, Lian Jiang wrote: > Hi, > > I am using structured Spark streaming, which reads jsonl files and writes > into parquet files. I am wondering what the process is if the jsonl files' schema > changes. > > Suppose jsonl files are generated in \jsonl fold
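One common way to handle this (an assumption on my part, not an answer from the thread) is to pin an explicit schema that is a superset of the old and new fields; older files simply read the added fields as null. File sources in Structured Streaming require an explicit schema anyway. Paths and field names below are hypothetical:

```scala
import org.apache.spark.sql.types._

// Superset schema: `newField` only appears in newer jsonl files and
// reads as null for older ones.
val schema = StructType(Seq(
  StructField("id", LongType),
  StructField("name", StringType),
  StructField("newField", StringType)  // added by the schema change
))

val stream = spark.readStream
  .schema(schema)                      // required for streaming file sources
  .json("/jsonl")                      // hypothetical input folder

stream.writeStream
  .format("parquet")
  .option("checkpointLocation", "/checkpoints/jsonl-to-parquet")
  .option("path", "/parquet")          // hypothetical output folder
  .start()
```

When later reading the accumulated parquet files in a batch job, `spark.read.option("mergeSchema", "true").parquet(...)` can reconcile part files written under different schema versions.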

Re: [Structured Streaming] Restarting streaming query on exception/termination

2018-04-24 Thread Arun Mahadevan
I guess you can wait for termination, catch the exception, and then restart the query in a loop. Something like… while (true) { try { val query = df.writeStream(). … .start() query.awaitTermination() } catch { case e: Streaming
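The quoted snippet is cut off by the archive; a hedged completion of the same idea follows. The sink, checkpoint path, and logging are my assumptions, and the loop catches only StreamingQueryException, letting anything else propagate:

```scala
import org.apache.spark.sql.streaming.StreamingQueryException

// Restart the query whenever it dies with a streaming failure.
// With a checkpoint location, the restarted query resumes from
// where the failed one left off.
while (true) {
  try {
    val query = df.writeStream
      .format("console")                                  // sink is an assumption
      .option("checkpointLocation", "/checkpoints/demo")  // hypothetical path
      .start()
    query.awaitTermination()  // blocks until the query stops or fails
  } catch {
    case e: StreamingQueryException =>
      // log and loop to restart; consider a backoff/retry limit in practice
      println(s"Query failed, restarting: ${e.getMessage}")
  }
}
```

In practice you would add a bounded retry count or backoff so a persistently failing query (e.g. a bad input file) does not restart forever.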