Yes. Also, I think I need to enable checkpointing and, rather than relying on
rebuilding from the lineage DAG, store the RDD data in HDFS.
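
In Spark the enabling call is a one-liner on the streaming context (e.g.
checkpointing to an HDFS directory). The underlying idea - persist state
atomically so recovery does not have to replay the whole lineage - can be
sketched in plain Python; the file name and JSON state format here are
illustrative assumptions, not Spark's actual checkpoint format:

```python
import json
import os
import tempfile

def save_checkpoint(path, state):
    # Write to a temp file, then rename. The rename is atomic on POSIX
    # filesystems, so a crash mid-write never leaves a torn checkpoint.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)

def load_checkpoint(path, default=None):
    # On restart, resume from the last complete checkpoint if one exists.
    if not os.path.exists(path):
        return default
    with open(path) as f:
        return json.load(f)
```

Spark's own checkpoints additionally store metadata and generated RDDs, but
the atomic write-then-rename discipline is the part that makes recovery safe.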

On 23 September 2015 at 01:04, Adrian Tanase <atan...@adobe.com> wrote:

> btw I re-read the docs and I want to clarify that reliable receiver + WAL
> gives you at-least-once, not exactly-once, semantics.
>
> Sent from my iPhone
>
> On 21 Sep 2015, at 21:50, Adrian Tanase <atan...@adobe.com> wrote:
>
> I'm wondering, isn't this the canonical use case for WAL + reliable
> receiver?
>
> As far as I know you can tune the MQTT server to wait for an ack on
> messages (QoS level 2?).
> With some support from the client library you could achieve exactly-once
> semantics on the read side, if you ack a message only after writing it to
> the WAL, correct?
>
> -adrian
>
> Sent from my iPhone
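
The "ack only after the write is durable" idea described above can be
sketched independently of Spark. Here the WAL is a plain append-only file and
the ack callback stands in for the MQTT client's acknowledgement; all names
are hypothetical, not a real MQTT client API:

```python
import os

class SimpleWAL:
    """Append-only log; a record is acknowledged only after it is on disk."""

    def __init__(self, path):
        self.f = open(path, "ab")

    def append_and_ack(self, record: bytes, ack):
        self.f.write(record + b"\n")
        self.f.flush()
        os.fsync(self.f.fileno())  # durable before we ack
        ack()                      # the MQTT ack happens only now
```

If the process crashes before the fsync, the broker never sees the ack and
redelivers the message, so nothing is lost - but redelivery means duplicates
are possible, which is why this alone is at-least-once unless the processing
side deduplicates.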
>
> On 21 Sep 2015, at 12:35, Petr Novak <oss.mli...@gmail.com> wrote:
>
> In short, there is no direct support for it in Spark AFAIK. You will either
> manage it in MQTT or have to add another layer of indirection - either
> in-memory based (observable streams, an in-memory DB) or disk based (Kafka,
> HDFS files, a DB) - which will hold your unprocessed events.
>
> I'm now realizing there is support for backpressure in v1.5.0, but I don't
> know if it could be exploited here - i.e. I don't know if it is possible in
> Spark to decouple reading events into memory from the actual processing
> code, so that the latter could be swapped on the fly. Probably not without
> some custom-built facility for it.
>
> Petr
>
> On Mon, Sep 21, 2015 at 11:26 AM, Petr Novak <oss.mli...@gmail.com> wrote:
>
>> I should read my posts at least once to avoid so many typos. Hopefully
>> you are brave enough to read through.
>>
>> Petr
>>
>> On Mon, Sep 21, 2015 at 11:23 AM, Petr Novak <oss.mli...@gmail.com>
>> wrote:
>>
>>> I think you would have to persist the events somehow if you don't want to
>>> miss them. I don't see any other option there. Either in MQTT, if it is
>>> supported there, or by routing them through Kafka.
>>>
>>> There is a WriteAheadLog in Spark, but you would have to decouple MQTT
>>> stream reading and processing into two separate jobs, so that you could
>>> upgrade the processing one assuming the reading one stays stable (without
>>> changes) across versions. But that is problematic because there is no easy
>>> way to share DStreams between jobs - you would have to develop your own
>>> facility for it.
>>>
>>> Alternatively, the reading job could save MQTT events in their rawest
>>> form into files - to limit the need to change its code - and then the
>>> processing job would work on top of them using file-based Spark Streaming.
>>> I think this is inefficient and can get quite complex if you want to
>>> make it reliable.
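
The file-based handoff suggested above can be sketched as follows. The key
detail - which Spark's file-based streaming also depends on - is writing
under a temporary name and then renaming, so the processing job never
observes a half-written file. Paths and the one-event-per-file layout are
illustrative assumptions:

```python
import os
import tempfile

def spool_event(spool_dir, name, payload: bytes):
    # Write under a temp name, then atomically rename into the spool
    # directory, so a reader never sees a partially written file.
    os.makedirs(spool_dir, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=spool_dir, suffix=".tmp")
    with os.fdopen(fd, "wb") as f:
        f.write(payload)
    final = os.path.join(spool_dir, name)
    os.replace(tmp, final)
    return final

def pending_events(spool_dir):
    # The processing job picks up only completed (renamed) files.
    return sorted(p for p in os.listdir(spool_dir) if not p.endswith(".tmp"))
```

Because the reading job's code stays this small, it rarely needs redeploying;
only the processing job that consumes the spool directory gets upgraded.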
>>>
>>> Basically, either MQTT supports persistence (which I don't know) or there
>>> is Kafka for these use cases.
>>>
>>> Another option, I think, would be to place observable streams with
>>> backpressure between MQTT and Spark Streaming, so that you could perform
>>> the upgrade before the buffers fill up.
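
The in-between buffer with backpressure could be as simple as a bounded
blocking queue: the MQTT-facing producer blocks (pushing back toward the
broker) once the buffer is full, which buys a window during which the
consumer side can be swapped. A toy sketch - the queue size and names are
made up:

```python
import queue
import threading

buf = queue.Queue(maxsize=3)  # bounded: a full buffer blocks the producer

def produce(events):
    for e in events:
        buf.put(e)  # blocks when maxsize is reached -> backpressure

consumed = []
producer = threading.Thread(target=produce, args=(range(5),))
producer.start()
while len(consumed) < 5:
    consumed.append(buf.get())  # the "new" consumer drains the backlog
producer.join()
```

The catch Petr notes still applies: this only works if the upgrade finishes
before the buffer fills, and an in-memory buffer is lost if the process dies.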
>>>
>>> I'm sorry that it is not thought out well from my side, it is just a
>>> brainstorm but it might lead you somewhere.
>>>
>>> Regards,
>>> Petr
>>>
>>> On Mon, Sep 21, 2015 at 10:09 AM, Jeetendra Gangele <
>>> gangele...@gmail.com> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I have a Spark Streaming application with a batch interval (10 ms) which
>>>> is reading an MQTT channel and dumping the data from MQTT to HDFS.
>>>>
>>>> So suppose I have to deploy a new application jar (with changes to the
>>>> Spark Streaming application) - what is the best way to deploy it?
>>>> Currently I am doing the following:
>>>>
>>>> 1. killing the running streaming app using yarn application -kill ID
>>>> 2. then starting the application again
>>>>
>>>> The problem with the above approach is that, since we are not persisting
>>>> the events in MQTT, we will miss events during the deployment window.
>>>>
>>>> How should I handle this case?
>>>>
>>>> regards
>>>> Jeetendra
>>>>
>>>
