On Wed, Jul 1, 2020 at 6:13 PM Burak Yavuz <brk...@gmail.com> wrote:

> I'm not sure having a built-in sink that allows you to DDoS servers is the
> best idea either.

Do you think it would be used accidentally? If so, we could ship it with
default per-server rate limits that people would have to explicitly tune.
> foreachWriter is typically used for such use cases, not foreachBatch.
> It's also pretty hard to guarantee exactly-once, rate limiting, etc.
>
> Best,
> Burak
>
> On Wed, Jul 1, 2020 at 5:54 PM Holden Karau <hol...@pigscanfly.ca> wrote:
>
>> I think adding something like this (if it doesn't already exist) could
>> help make Structured Streaming easier to use; foreachBatch is not the
>> best API.
>>
>> On Wed, Jul 1, 2020 at 2:21 PM Jungtaek Lim <kabhwan.opensou...@gmail.com>
>> wrote:
>>
>>> I guess the method, query parameters, headers, and payload would all be
>>> different for almost every use case - that makes it hard to generalize,
>>> and the implementation would have to be quite complicated to be
>>> flexible enough.
>>>
>>> I'm not aware of any custom sink implementing REST, so your best bet
>>> would be simply implementing your own with foreachBatch, but someone
>>> might jump in and provide a pointer if there is something in the Spark
>>> ecosystem.
>>>
>>> Thanks,
>>> Jungtaek Lim (HeartSaVioR)
>>>
>>> On Thu, Jul 2, 2020 at 3:21 AM Sam Elamin <hussam.ela...@gmail.com>
>>> wrote:
>>>
>>>> Hi All,
>>>>
>>>> We ingest a lot of RESTful APIs into our lake, and I'm wondering if it
>>>> is at all possible to create a REST sink in Structured Streaming.
>>>>
>>>> For now I'm only focusing on RESTful services that have an incremental
>>>> ID, so my sink can just poll for new data and then ingest it.
>>>>
>>>> I can't seem to find a connector that does this, and my gut instinct
>>>> tells me it's probably because it isn't possible due to something
>>>> completely obvious that I am missing.
>>>>
>>>> I know some RESTful APIs obfuscate the IDs into a hash of strings, and
>>>> that could be a problem, but since I'm planning on focusing on just
>>>> numerical IDs that get incremented, I think I won't be facing that
>>>> issue.
>>>>
>>>> Can anyone let me know if this sounds like a daft idea?
>>>> Will I need something like Kafka or Kinesis as a buffer and for
>>>> redundancy, or am I overthinking this?
>>>>
>>>> I would love to bounce ideas with people who run Structured Streaming
>>>> jobs in production.
>>>>
>>>> Kind regards,
>>>> Sam
>>>>
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau

--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
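[Editor's note: the thread leaves the foreachBatch approach abstract. Below is a minimal, hypothetical sketch of the per-server rate limiting discussed above, written as plain Python so the throttling logic stands on its own. `TokenBucket` and `post_batch` are illustrative names, not part of any Spark API; a production sink would also need retries and idempotency keys, since, as Burak notes, exactly-once is hard to guarantee over HTTP.]

```python
import time


class TokenBucket:
    """Simple per-server token bucket: at most `rate` requests per second,
    with bursts up to `capacity`. This is the kind of default rate limit the
    thread suggests a built-in REST sink would need."""

    def __init__(self, rate, capacity=None):
        self.rate = rate
        self.capacity = capacity if capacity is not None else rate
        self.tokens = self.capacity
        self.last = time.monotonic()

    def acquire(self):
        # Refill tokens based on elapsed time, then block until one is free.
        while True:
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            time.sleep((1 - self.tokens) / self.rate)


def post_batch(rows, send, bucket):
    """POST each row via `send`, throttled by `bucket`.

    In Structured Streaming this could be wired up (hypothetically) as:

        df.writeStream.foreachBatch(
            lambda batch_df, batch_id:
                post_batch(batch_df.collect(), send, bucket))

    For large batches you would iterate partitions rather than collect().
    """
    for row in rows:
        bucket.acquire()  # blocks so the target server is never overloaded
        send(row)         # e.g. an HTTP POST of the serialized row
```

Note that the bucket lives per target server, which is what makes "default per-server rate limits that people would have to explicitly tune" expressible: each endpoint gets its own `TokenBucket`.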