You can do a join between a streaming dataset and a static dataset. I would
prefer your first approach, but the problem with it is performance:
unless you cache the static dataset, every join query will fetch the
latest records from the table.
Regards,
Rishitesh Mishra
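A plain-Python sketch (not the Spark API; all names here are made up for illustration) of why caching the static side matters: without a cache, every micro-batch join re-fetches the static table.

```python
# Illustrative sketch, not Spark code: the cost of NOT caching the static
# side of a stream-static join. All names are invented for this example.

def load_static_table():
    # Stand-in for an expensive read of the static dataset (e.g. a table scan).
    return {1: "gold", 2: "silver", 3: "bronze"}

class StreamStaticJoin:
    def __init__(self, cache_static=True):
        self.cache_static = cache_static
        self._cached = None
        self.loads = 0  # how many times the static table was (re)fetched

    def _static(self):
        if self.cache_static:
            if self._cached is None:
                self._cached = load_static_table()
                self.loads += 1
            return self._cached
        self.loads += 1
        return load_static_table()  # re-fetched on every micro-batch

    def join_batch(self, batch):
        static = self._static()
        return [(uid, static.get(uid)) for uid in batch]

cached = StreamStaticJoin(cache_static=True)
uncached = StreamStaticJoin(cache_static=False)
for batch in [[1, 2], [3], [2, 2]]:  # three micro-batches
    cached.join_batch(batch)
    uncached.join_batch(batch)

print(cached.loads, uncached.loads)  # → 1 3
```

The cached variant pays the load cost once; the uncached one pays it per batch, which is the performance problem described above.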
Any help on converting JSON to CSV, or flattening the JSON file? The JSON
file has one struct and multiple arrays.
Thanks
Pk
Sent from my iPhone
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
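One way to picture the flattening asked about above: each array element becomes its own CSV row, with the struct's fields repeated on every row. A minimal standalone sketch in plain Python (field names like "user" and "events" are invented for the example; in Spark the same shape is typically handled with explode() plus nested-column selects):

```python
import csv
import io
import json

# Hypothetical input: one struct ("user") and one array ("events").
raw = json.loads("""
{"user": {"id": 7, "name": "pk"},
 "events": [{"type": "click", "ts": 1}, {"type": "view", "ts": 2}]}
""")

# Flatten: one output row per array element, struct fields repeated.
rows = [
    {"user_id": raw["user"]["id"],
     "user_name": raw["user"]["name"],
     "event_type": e["type"],
     "event_ts": e["ts"]}
    for e in raw["events"]
]

buf = io.StringIO()
writer = csv.DictWriter(
    buf, fieldnames=["user_id", "user_name", "event_type", "event_ts"])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

With multiple arrays, the same pattern applies per array, but note that exploding two arrays in one pass produces a cross product, which is usually not what you want.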
Yes, the fix has been merged and should make it into the 2.3 release.
On Mon, Dec 11, 2017, 5:50 PM Ryan Blue wrote:
> Is anyone currently working on this? I just fixed it in our Spark build
> and can contribute the fix if there isn't already a PR for it.
>
> On Mon, Nov 27,
Hi All,
Could I request a review of this patch on Spark-Kinesis streaming? It has
been sitting there for a few months looking for some love. Please help.
The patch proposes resuming Kinesis data from a specified timestamp,
similar to Kafka, and improves Kinesis crash recovery by avoiding scanning of
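The resume-from-timestamp idea can be sketched in plain Python (all names below are hypothetical, not the patch's actual API): instead of rescanning a shard from the beginning, skip directly to records at or after a checkpointed timestamp.

```python
# Conceptual sketch of "resume from a specified timestamp", as the patch
# proposes for Kinesis (mirroring Kafka's offset-for-time lookup).
# Record shape and function names are invented for illustration.
records = [
    {"seq": 1, "ts": 100, "data": "a"},
    {"seq": 2, "ts": 200, "data": "b"},
    {"seq": 3, "ts": 300, "data": "c"},
]

def resume_from(records, start_ts):
    # Skip everything strictly older than the requested start timestamp,
    # rather than replaying the whole shard from the trim horizon.
    return [r for r in records if r["ts"] >= start_ts]

resumed = resume_from(records, 200)
print([r["seq"] for r in resumed])  # → [2, 3]
```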
Hi All,
I'm working on a real-time reporting project and I have a question about a
structured streaming job that streams a particular table's records and
has to join them to an existing table.
Stream --> query/join to another DF/DS --> update the stream data record.
Now I have a
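The flow described above (stream in, join against an existing table, update each record) can be sketched in plain Python; in Spark this typically maps to a stream-static join, or a join inside foreachBatch. All names below are illustrative:

```python
# Plain-Python sketch of: stream --> join to an existing table --> update
# the stream record. Table contents and field names are invented.
existing_table = {101: {"region": "EU"}, 102: {"region": "US"}}

def micro_batches():
    # Stand-in for the streaming source, yielding one micro-batch at a time.
    yield [{"id": 101, "amount": 5}]
    yield [{"id": 102, "amount": 7}, {"id": 101, "amount": 2}]

enriched = []
for batch in micro_batches():
    for rec in batch:
        dim = existing_table.get(rec["id"], {})
        enriched.append({**rec, **dim})  # enrich/update the stream record

print(enriched[0])  # → {'id': 101, 'amount': 5, 'region': 'EU'}
```

One caveat carried over from the earlier reply in this thread: if the "existing table" changes over time and is not cached, each batch's join may see (and pay for fetching) the latest version.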
Is anyone currently working on this? I just fixed it in our Spark build and
can contribute the fix if there isn't already a PR for it.
On Mon, Nov 27, 2017 at 12:59 PM, Jason White wrote:
> It doesn't look like the insert command has any metrics in it. I don't see
> any
Hi,
After another day of trying to get my head around WholeStageCodegenExec,
InputAdapter, and the CollapseCodegenStages optimization rule, I came to
the conclusion that it may have something to do with UnsafeRow vs
GenericInternalRow/InternalRow, so that when a physical operator wants to
_somehow_ participate
At Uber we have observed the same resource efficiency issue with dynamic
allocation. Our workload was migrated from Hive on MR to Hive on Spark. We
saw a significant performance improvement (>2x) with our workload. We also
expected big resource savings from this migration, because there will be one
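For reference, dynamic allocation is driven by settings like the following (a spark-defaults.conf sketch; the bounds shown are placeholder values, not a recommendation, and the external shuffle service is required for executors to be removed safely):

```properties
spark.dynamicAllocation.enabled              true
spark.shuffle.service.enabled                true
spark.dynamicAllocation.minExecutors         2
spark.dynamicAllocation.maxExecutors         100
spark.dynamicAllocation.executorIdleTimeout  60s
```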
Hi everyone,
I'm currently porting a MapReduce application to Spark (on a YARN cluster), and
I'd like to have your insight regarding the tuning of the number of executors.
This application is in fact a template that users can use to launch a variety
of jobs, which range from tens to thousands
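The knobs in question are usually set per job rather than per template; as a sketch (spark-defaults.conf style, with placeholder values chosen only for illustration):

```properties
spark.executor.instances             10
spark.executor.cores                 4
spark.executor.memory                8g
spark.yarn.executor.memoryOverhead   1g
```

For a template launching jobs of widely varying size, dynamic allocation (spark.dynamicAllocation.enabled) is often preferred over a fixed spark.executor.instances, since a single fixed number will over-provision small jobs or starve large ones.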