On Wed, Jul 1, 2020 at 6:13 PM Burak Yavuz <brk...@gmail.com> wrote:

> I'm not sure having a built-in sink that allows you to DDoS servers is the
> best idea either
>
Do you think it would be used accidentally? If so, we could ship it with
default per-server rate limits that people would have to explicitly tune.
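To make the "default per-server rate limits" idea concrete, here is a minimal sketch of a token-bucket limiter keyed by host. This is purely illustrative, not Spark code: the class name, the default rate, and the `clock` injection are all made up for the example.

```python
import time
from collections import defaultdict


class PerHostRateLimiter:
    """Token-bucket rate limiter keyed by host.

    `rate` is requests per second allowed per host; the defaults here
    are hypothetical -- a real sink would expose them as tunable options.
    `clock` is injected so the behavior is deterministic under test.
    """

    def __init__(self, rate=10.0, burst=10.0, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self._tokens = defaultdict(lambda: burst)
        self._last = {}

    def try_acquire(self, host):
        now = self.clock()
        last = self._last.get(host, now)
        # Refill tokens for the elapsed time, capped at the burst size.
        self._tokens[host] = min(self.burst,
                                 self._tokens[host] + (now - last) * self.rate)
        self._last[host] = now
        if self._tokens[host] >= 1.0:
            self._tokens[host] -= 1.0
            return True
        return False
```

A sink could call `try_acquire` before each HTTP request and back off (or buffer) when it returns False, so an untuned configuration can never hammer a server.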

> foreachWriter is typically used for such use cases, not foreachBatch.
> It's also pretty hard to guarantee exactly-once, rate limiting, etc.
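For readers unfamiliar with the per-row approach Burak mentions: a foreach writer is any object exposing the `open`/`process`/`close` contract that Structured Streaming's `foreach` sink calls. Below is a hedged pure-Python sketch of that shape; the endpoint and the injected `send` callable are made up for illustration, and the actual wiring into a query is only suggested in the usage note.

```python
class RestForeachWriter:
    """Sketch of a per-row REST writer matching the open/process/close
    contract that Structured Streaming's `foreach` sink expects.

    `send` is injected so the HTTP call (and any retry or rate-limiting
    logic) stays pluggable; the endpoint URL is hypothetical.
    """

    def __init__(self, endpoint, send):
        self.endpoint = endpoint
        self.send = send
        self.sent = 0

    def open(self, partition_id, epoch_id):
        # Returning True tells the sink to process this partition/epoch.
        # A real writer could track (partition_id, epoch_id) to skip
        # epochs it already delivered -- otherwise exactly-once is not
        # guaranteed, which is exactly the caveat raised above.
        self.partition_id = partition_id
        self.epoch_id = epoch_id
        return True

    def process(self, row):
        # One HTTP call per row: simple, but chatty compared to
        # batching the rows up in foreachBatch.
        self.send(self.endpoint, row)
        self.sent += 1

    def close(self, error):
        if error is not None:
            raise error
```

In a real job this would be passed to something like `df.writeStream.foreach(RestForeachWriter(...)).start()`; the per-row granularity is what makes end-to-end rate limiting and deduplication the caller's problem.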
>

> Best,
> Burak
>
> On Wed, Jul 1, 2020 at 5:54 PM Holden Karau <hol...@pigscanfly.ca> wrote:
>
>> I think adding something like this (if it doesn't already exist) could
>> help make structured streaming easier to use; foreachBatch is not the
>> best API.
>>
>> On Wed, Jul 1, 2020 at 2:21 PM Jungtaek Lim <kabhwan.opensou...@gmail.com>
>> wrote:
>>
>>> I guess the method, query parameters, headers, and payload would all
>>> differ for almost every use case - that makes it hard to generalize,
>>> and an implementation would need to be quite complicated to be
>>> flexible enough.
>>>
>>> I'm not aware of any custom sink implementing REST, so your best bet
>>> would be simply implementing your own with foreachBatch - but someone
>>> might jump in and provide a pointer if there is something in the Spark
>>> ecosystem.
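A minimal sketch of the "implement your own with foreachBatch" suggestion might look like the following. The HTTP `post` callable and the URL are placeholders, and the batch is collected to the driver for simplicity - assumptions that only hold for small micro-batches; a production version might post per partition instead.

```python
import json


def make_rest_batch_writer(url, post):
    """Return a function with the (batch_df, batch_id) signature that
    `writeStream.foreachBatch` expects.

    `post` is injected (it would wrap an HTTP client); `url` is a
    placeholder. Assumes each micro-batch fits in driver memory once
    collected -- fine for a sketch, not for large batches.
    """

    def write_batch(batch_df, batch_id):
        # collect() pulls the micro-batch to the driver; a heavier
        # implementation would iterate partitions on the executors.
        rows = [row.asDict() for row in batch_df.collect()]
        if rows:
            # Tag the payload with batch_id so the receiving service
            # can de-duplicate replayed batches (delivery here is
            # at-least-once, not exactly-once).
            post(url, json.dumps({"batch_id": batch_id, "rows": rows}))

    return write_batch
```

The `batch_id` tag is the standard trick for getting idempotent writes out of foreachBatch's at-least-once replay semantics: the receiver drops any batch ID it has already seen.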
>>>
>>> Thanks,
>>> Jungtaek Lim (HeartSaVioR)
>>>
>>> On Thu, Jul 2, 2020 at 3:21 AM Sam Elamin <hussam.ela...@gmail.com>
>>> wrote:
>>>
>>>> Hi All,
>>>>
>>>>
>>>> We ingest a lot of data from RESTful APIs into our lake, and I'm
>>>> wondering if it is at all possible to create a REST sink in structured
>>>> streaming?
>>>>
>>>> For now I'm only focusing on RESTful services that have an incremental
>>>> ID, so my sink can just poll for new data and then ingest it.
>>>>
>>>> I can't seem to find a connector that does this, and my gut instinct
>>>> tells me it's probably because it isn't possible due to something
>>>> completely obvious that I am missing.
>>>>
>>>> I know some RESTful APIs obfuscate their IDs as hashed strings, and
>>>> that could be a problem, but since I'm planning to focus on just
>>>> numerical IDs that get incremented, I don't think I'll face that
>>>> issue.
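The incremental-ID polling idea can be sketched in a few lines. The `fetch` callable below is a hypothetical stand-in for an HTTP page request; the point is that the only state the poller needs to checkpoint between passes is the high-water-mark ID - the same role an offset plays in a custom streaming source.

```python
def poll_incremental(fetch, start_after=0):
    """One polling pass over an API with numeric, incrementing IDs.

    `fetch(after_id)` is an injected callable standing in for an HTTP
    request; it returns a list of records (dicts with an "id" key).
    Returns the new records plus the high-water mark to persist as the
    offset for the next pass.
    """
    records = fetch(start_after)
    # Defensively drop anything at or below the last-seen ID, in case
    # the service ignores the cursor parameter.
    new = [r for r in records if r["id"] > start_after]
    high_water = max((r["id"] for r in new), default=start_after)
    return new, high_water
```

Run in a loop with the returned high-water mark fed back in (and persisted somewhere durable), this gives exactly the "poll for new data then ingest" behavior described - as long as the IDs really are monotonic and never back-filled.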
>>>>
>>>>
>>>> Can anyone let me know if this sounds like a daft idea? Will I need
>>>> something like Kafka or Kinesis as a buffer for redundancy, or am I
>>>> overthinking this?
>>>>
>>>>
>>>> I would love to bounce ideas off people who run structured streaming
>>>> jobs in production.
>>>>
>>>>
>>>> Kind regards
>>>> Sam
>>>>
>>>>
>>>>
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
