Re: Joining streaming data with static table data.

2017-12-11 Thread Rishi Mishra
You can join a streaming dataset with a static dataset. I would prefer your first approach, but the problem with it is performance: unless you cache the dataset, every join query you fire will fetch the latest records from the table. Regards, Rishitesh Mishra,
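The trade-off described in the reply above can be sketched as follows. This is a minimal sketch, assuming PySpark DataFrames; the helper name `join_stream_with_cached_static` and its parameters are hypothetical and do not come from the thread. It relies only on `DataFrame.cache()` and `DataFrame.join()`.

```python
def join_stream_with_cached_static(stream_df, static_df, key):
    """Join a streaming DataFrame to a static one, caching the static side.

    Assumption: both arguments are PySpark DataFrames and `key` is a column
    name present in both. The function name is illustrative only.
    """
    # cache() marks the static table for reuse, so it is not re-read from
    # the source for every join query.
    cached_static = static_df.cache()
    # Inner stream-static join on the shared key column.
    return stream_df.join(cached_static, on=key, how="inner")
```

The cost of caching is freshness: once the static side is cached, the join no longer picks up the latest records from the table, which is the behaviour the uncached join gives you.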

Json to csv

2017-12-11 Thread Prabha K
Any help on converting JSON to CSV or flattening the JSON file? The JSON file has one struct and multiple arrays. Thanks, Pk
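One possible way to approach the flattening asked about above is sketched here in plain Python, without Spark. The field names in the sample record, the dotted column naming, and the choice to give each array element its own indexed column are all assumptions made for illustration; the thread does not describe the actual file layout.

```python
import csv
import io
import json


def flatten(value, prefix=""):
    """Flatten nested dicts and lists into one dict with dotted keys."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}.{key}" if prefix else key))
    elif isinstance(value, list):
        # Each array element becomes its own column: tags.0, tags.1, ...
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}.{index}" if prefix else str(index)))
    else:
        flat[prefix] = value
    return flat


def json_records_to_csv(records):
    """Render a list of parsed JSON records as CSV text."""
    rows = [flatten(record) for record in records]
    # Collect the union of column names in first-seen order.
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical record with one struct ("user") and one array ("tags").
    sample = json.loads('{"id": 1, "user": {"name": "a"}, "tags": ["x", "y"]}')
    print(json_records_to_csv([sample]))
```

Indexed columns keep one output row per input record. If one row per array element is wanted instead, the arrays would need to be exploded rather than indexed, which this sketch does not do.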

Re: OutputMetrics empty for DF writes - any hints?

2017-12-11 Thread Jason White
Yes, the fix has been merged and should make it into the 2.3 release. On Mon, Dec 11, 2017, 5:50 PM Ryan Blue wrote: > Is anyone currently working on this? I just fixed it in our Spark build > and can contribute the fix if there isn't already a PR for it. > > On Mon, Nov 27,

[kinesis][streaming] Could I request a review on this PR

2017-12-11 Thread Yash Sharma
Hi All, Could I request a review of this patch for Spark-Kinesis streaming? It has been sitting there for a few months looking for some love. Please help. The patch proposes resuming Kinesis data from a specified timestamp, similar to Kafka, and improves Kinesis crash recovery avoiding scanning ok

Joining streaming data with static table data.

2017-12-11 Thread satyajit vegesna
Hi All, I am working on a real-time reporting project and I have a question about a structured streaming job that is going to stream a particular table's records and would have to join to an existing table. Stream > query/join to another DF/DS ---> update the Stream data record. Now I have a

Re: OutputMetrics empty for DF writes - any hints?

2017-12-11 Thread Ryan Blue
Is anyone currently working on this? I just fixed it in our Spark build and can contribute the fix if there isn't already a PR for it. On Mon, Nov 27, 2017 at 12:59 PM, Jason White wrote: > It doesn't look like the insert command has any metrics in it. I don't see > any

Re: GenerateExec, CodegenSupport and supportCodegen flag off?!

2017-12-11 Thread Jacek Laskowski
Hi, After another day trying to get my head around WholeStageCodegenExec, InputAdapter, and the CollapseCodegenStages optimization rule, I came to the conclusion that it may have something to do with UnsafeRow vs GenericInternalRow/InternalRow, so when a physical operator wants to _somehow_ participate

Re: feedback about SPARK-22683 needed

2017-12-11 Thread Xuefu Zhang
At Uber we have observed the same resource efficiency issue with dynamic allocation. Our workload is migrated from Hive on MR to Hive on Spark. We saw significant performance improvement (>2X) with our workload. We also expected big resource savings from this migration because there will be one

feedback about SPARK-22683 needed

2017-12-11 Thread Julien Cuquemelle
Hi everyone, I'm currently porting a MapReduce application to Spark (on a YARN cluster), and I'd like to have your insight regarding the tuning of the number of executors. This application is in fact a template that users can use to launch a variety of jobs which range from tens to thousands