On Wed, Jul 1, 2020 at 8:13 PM Burak Yavuz <brk...@gmail.com> wrote:
>
> I'm not sure having a built-in sink that allows you to DDoS servers is the
> best idea either. foreachWriter is typically used for such use cases, not
> foreachBatch. It's also pretty hard to guarantee exactly-once, rate
> limiting, etc.
If you control the machines and can run arbitrary code, you can DDoS
whatever you want. What's the difference between this proposal and writing
a UDF that opens 1,000 connections to a target machine?

> Best,
> Burak
>
> On Wed, Jul 1, 2020 at 5:54 PM Holden Karau <hol...@pigscanfly.ca> wrote:
>>
>> I think adding something like this (if it doesn't already exist) could
>> help make structured streaming easier to use; foreachBatch is not the
>> best API.
>>
>> On Wed, Jul 1, 2020 at 2:21 PM Jungtaek Lim
>> <kabhwan.opensou...@gmail.com> wrote:
>>>
>>> I guess the method, query parameters, headers, and payload would all be
>>> different for almost every use case - that makes it hard to generalize,
>>> and an implementation would have to be quite complicated to be flexible
>>> enough.
>>>
>>> I'm not aware of any custom sink implementing REST, so your best bet
>>> would be simply implementing your own with foreachBatch - but someone
>>> might jump in and provide a pointer if there is something in the Spark
>>> ecosystem.
>>>
>>> Thanks,
>>> Jungtaek Lim (HeartSaVioR)
>>>
>>> On Thu, Jul 2, 2020 at 3:21 AM Sam Elamin <hussam.ela...@gmail.com> wrote:
>>>>
>>>> Hi All,
>>>>
>>>> We ingest a lot of RESTful APIs into our lake, and I'm wondering if it
>>>> is at all possible to create a REST sink in structured streaming?
>>>>
>>>> For now I'm only focusing on RESTful services that have an incremental
>>>> ID, so my sink can just poll for new data and then ingest it.
>>>>
>>>> I can't seem to find a connector that does this, and my gut instinct
>>>> tells me it's probably because it isn't possible due to something
>>>> completely obvious that I am missing.
>>>>
>>>> I know some RESTful APIs obfuscate the IDs to a hash of strings, and
>>>> that could be a problem, but since I'm planning on focusing on just
>>>> numerical IDs that get incremented, I think I won't be facing that
>>>> issue.
>>>>
>>>> Can anyone let me know if this sounds like a daft idea?
>>>> Will I need something like Kafka or Kinesis as a buffer and for
>>>> redundancy, or am I overthinking this?
>>>>
>>>> I would love to bounce ideas around with people who run structured
>>>> streaming jobs in production.
>>>>
>>>> Kind regards
>>>> Sam
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
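[Editor's note: a minimal sketch of the foreachBatch approach Jungtaek
suggests, in PySpark. The endpoint URL, payload shape, and checkpoint path
are assumptions for illustration, not part of the thread. As Burak notes,
delivery here is at-least-once, not exactly-once: if a batch fails after a
partial POST, it will be retried in full on restart.]

```python
import json
import urllib.request

# Hypothetical target endpoint - replace with your own service.
ENDPOINT = "https://example.com/ingest"

def rows_to_payload(rows):
    """Serialize a micro-batch's rows (as dicts) into a JSON array body."""
    return json.dumps([dict(r) for r in rows]).encode("utf-8")

def post_batch(batch_df, batch_id):
    """foreachBatch callback: POST one micro-batch to the REST endpoint.

    collect() is fine for small batches; for large ones, post per
    partition with batch_df.foreachPartition instead.
    """
    body = rows_to_payload(r.asDict() for r in batch_df.collect())
    req = urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req, timeout=10)

# Wiring (requires a running SparkSession and a streaming DataFrame
# `events`); the checkpoint location lets Spark retry failed batches:
#
# query = (events.writeStream
#          .foreachBatch(post_batch)
#          .option("checkpointLocation", "/tmp/rest-sink-ckpt")
#          .start())
```

For the source side of the question (polling an API by incremental ID),
there is no built-in REST source either; a common workaround is a small
poller that writes into Kafka or Kinesis, which then serves as the buffer
and replay log the thread asks about.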