On Wed, Jul 1, 2020 at 8:13 PM Burak Yavuz <brk...@gmail.com> wrote:
>
> I'm not sure having a built-in sink that allows you to DDoS servers is the
> best idea either. foreachWriter is typically used for such use cases, not
> foreachBatch. It's also pretty hard to guarantee exactly-once, rate
> limiting, etc.
If you control the machines and can run arbitrary code, you can DDoS
whatever you want. What's the difference between this proposal and writing
a UDF that opens 1,000 connections to a target machine?

> Best,
> Burak
>
> On Wed, Jul 1, 2020 at 5:54 PM Holden Karau <hol...@pigscanfly.ca> wrote:
>>
>> I think adding something like this (if it doesn't already exist) could
>> help make structured streaming easier to use; foreachBatch is not the
>> best API.
>>
>> On Wed, Jul 1, 2020 at 2:21 PM Jungtaek Lim
>> <kabhwan.opensou...@gmail.com> wrote:
>>>
>>> I guess the method, query parameters, headers, and payload would all be
>>> different for almost every use case - that makes it hard to generalize,
>>> and an implementation would have to be quite complicated to be flexible
>>> enough.
>>>
>>> I'm not aware of any custom sink implementing REST, so your best bet
>>> would be simply implementing your own with foreachBatch - but someone
>>> might jump in and provide a pointer if there is something in the Spark
>>> ecosystem.
>>>
>>> Thanks,
>>> Jungtaek Lim (HeartSaVioR)
>>>
>>> On Thu, Jul 2, 2020 at 3:21 AM Sam Elamin <hussam.ela...@gmail.com> wrote:
>>>>
>>>> Hi All,
>>>>
>>>> We ingest a lot of RESTful APIs into our lake, and I'm wondering if it
>>>> is at all possible to create a REST sink in structured streaming?
>>>>
>>>> For now I'm only focusing on RESTful services that have an incremental
>>>> ID, so my sink can just poll for new data and then ingest it.
>>>>
>>>> I can't seem to find a connector that does this, and my gut instinct
>>>> tells me it's probably because it isn't possible due to something
>>>> completely obvious that I am missing.
>>>>
>>>> I know some RESTful APIs obfuscate the IDs to a hash of strings, and
>>>> that could be a problem, but since I'm planning on focusing on just
>>>> numerical IDs that get incremented, I think I won't be facing that
>>>> issue.
>>>>
>>>> Can anyone let me know if this sounds like a daft idea?
>>>> Will I need something like Kafka or Kinesis as a buffer and for
>>>> redundancy, or am I overthinking this?
>>>>
>>>> I would love to bounce ideas around with people who run structured
>>>> streaming jobs in production.
>>>>
>>>> Kind regards
>>>> Sam
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
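[Editor's note: a minimal sketch of the foreachBatch approach Jungtaek
suggests, in PySpark. The endpoint URL, payload shape, and checkpoint path
are assumptions for illustration, not part of the thread. As Burak notes,
delivery here is at-least-once, not exactly-once: if a batch fails after a
partial POST, it will be retried in full on restart.]

```python
import json
import urllib.request

# Hypothetical target endpoint - replace with your own service.
ENDPOINT = "https://example.com/ingest"

def rows_to_payload(rows):
    """Serialize a micro-batch's rows (as dicts) into a JSON array body."""
    return json.dumps([dict(r) for r in rows]).encode("utf-8")

def post_batch(batch_df, batch_id):
    """foreachBatch callback: POST one micro-batch to the REST endpoint.

    collect() is fine for small batches; for large ones, post per
    partition with batch_df.foreachPartition instead.
    """
    body = rows_to_payload(r.asDict() for r in batch_df.collect())
    req = urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req, timeout=10)

# Wiring (requires a running SparkSession and a streaming DataFrame
# `events`); the checkpoint location lets Spark retry failed batches:
#
# query = (events.writeStream
#          .foreachBatch(post_batch)
#          .option("checkpointLocation", "/tmp/rest-sink-ckpt")
#          .start())
```

For the source side of the question (polling an API by incremental ID),
there is no built-in REST source either; a common workaround is a small
poller that writes into Kafka or Kinesis, which then serves as the buffer
and replay log the thread asks about.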