Re: Kinesis Operator Help

Munagala Ramanath Tue, 16 Feb 2016 22:23:26 -0800

By "relaunch option" I'm assuming you mean "launch -originalAppId ...".
Looks like Jim does not want to use that option.
He wants a new launch to automatically detect data from an earlier launch
and, if present, use it.


Ram

On Tue, Feb 16, 2016 at 8:29 PM, Thomas Weise <thomas.we...@gmail.com>
wrote:

> Ram,
>
> The recovery path, when under the application directory, will be
> automatically copied to the new app directory when the relaunch option is used.
> This is how the previous instance's data is made available to the new app.
>
> Thomas
>
> On Tue, Feb 16, 2016 at 5:23 PM, Munagala Ramanath <r...@datatorrent.com>
> wrote:
>
>> Ah, I understand now.
>>
>> The path is set in
>> IdempotentStorageManager.FSIdempotentStorageManager.setup(), near line 146:
>> appPath = new Path(context.getValue(DAG.APPLICATION_PATH) +
>> Path.SEPARATOR + recoveryPath);
>>
>> You can try creating a new class that extends FSIdempotentStorageManager
>> and overriding setup() to use a local property for the appPath, simply
>> duplicating the rest of the code.
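A minimal sketch of this idea, kept free of Apex and Hadoop dependencies so it runs on its own. The quoted line derives the recovery directory from DAG.APPLICATION_PATH, which differs for every new application ID, so a subclass would instead build it from a stable, user-configured root. The class `RecoveryPathResolver` and the `stableRoot` parameter are hypothetical; in an actual FSIdempotentStorageManager subclass, the resulting string would be wrapped in an org.apache.hadoop.fs.Path inside the overridden setup().

```java
// Hypothetical helper, not part of Apex/Malhar. The quoted setup() code builds
// DAG.APPLICATION_PATH + Path.SEPARATOR + recoveryPath; this sketch swaps in a
// stable root when one is configured, so the path survives a new application ID.
final class RecoveryPathResolver {
    // Same value as Hadoop's Path.SEPARATOR, hard-coded to stay self-contained.
    static final String SEPARATOR = "/";

    private RecoveryPathResolver() {}

    /**
     * @param applicationPath value of DAG.APPLICATION_PATH for the current launch
     * @param stableRoot      hypothetical operator property; null or empty means unset
     * @param recoveryPath    relative name of the recovery directory
     */
    static String resolve(String applicationPath, String stableRoot, String recoveryPath) {
        boolean useStableRoot = stableRoot != null && !stableRoot.isEmpty();
        String base = useStableRoot ? stableRoot : applicationPath;
        // Drop one trailing separator so the result never contains "//".
        if (base.endsWith(SEPARATOR)) {
            base = base.substring(0, base.length() - 1);
        }
        return base + SEPARATOR + recoveryPath;
    }
}
```

With a stable root such as "/user/dt/edi-kinesis" (a made-up path), every launch resolves to the same recovery directory regardless of its application ID. Because that directory is shared, the sketch assumes only one instance of the application runs at a time; two concurrent instances would write to the same recovery state.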
>>
>> Ram
>>
>> On Tue, Feb 16, 2016 at 3:59 PM, Jim <jim@facility.supplies> wrote:
>>
>>> Ram,
>>>
>>>
>>>
>>> I am not 100% fluent in the details of the base Kinesis operator and how
>>> it interacts with Hadoop (hence my posting); if it supports that, then
>>> yes, you could.
>>>
>>>
>>>
>>> My goal is to make it easy to pick up where we left off reading the
>>> Kinesis stream, regardless of whether you kill the application and
>>> relaunch it, etc., without needing to go out to the CLI to run
>>> commands (because at some point an operator will forget, and then we will
>>> reprocess a bunch of transactions; that would not be good!).
>>>
>>>
>>>
>>> Jim
>>>
>>>
>>>
>>> *From:* Munagala Ramanath [mailto:r...@datatorrent.com]
>>> *Sent:* Tuesday, February 16, 2016 5:21 PM
>>> *To:* users@apex.incubator.apache.org
>>> *Subject:* Re: Kinesis Operator Help
>>>
>>>
>>>
>>> Why use the application ID? Could you, for example, generate a
>>> java.util.UUID and save it in HDFS?
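A minimal sketch of this suggestion, assuming the ID file lives at a single fixed location outside the per-application directory. A local file stands in for HDFS so the example runs without a cluster; with Hadoop, the same read-or-create logic would go through org.apache.hadoop.fs.FileSystem. The class name `StableInstanceId` and the file layout are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

// Hypothetical helper: returns an ID that stays the same across kill/relaunch.
// A local file stands in for HDFS; with Hadoop the same logic would use
// org.apache.hadoop.fs.FileSystem instead of java.nio.file.Files.
final class StableInstanceId {
    private StableInstanceId() {}

    /** Returns the UUID saved in idFile, generating and saving one on first use. */
    static String loadOrCreate(Path idFile) throws IOException {
        if (Files.exists(idFile)) {
            // An earlier launch already saved an ID: reuse it.
            return new String(Files.readAllBytes(idFile), StandardCharsets.UTF_8).trim();
        }
        // First launch: generate a random UUID and persist it.
        String id = UUID.randomUUID().toString();
        Path parent = idFile.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        Files.write(idFile, id.getBytes(StandardCharsets.UTF_8));
        return id;
    }
}
```

The check-then-create sequence is not atomic, so two launches racing on the very first run could each write a different UUID; the sketch assumes launches of the same application are not started concurrently.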
>>>
>>>
>>>
>>> Ram
>>>
>>>
>>>
>>> On Tue, Feb 16, 2016 at 11:40 AM, Jim <jim@facility.supplies> wrote:
>>>
>>> Good morning,
>>>
>>>
>>>
>>> I am new to Apex, Hadoop, and YARN (nothing like tackling something new,
>>> is there?).
>>>
>>>
>>>
>>> I have my first Apex apps working. They are EDI processors that read new
>>> EDI transactions from an Amazon Kinesis stream, look at the data, and
>>> route the EDI data to an appropriate handler for processing (note that
>>> operatorEs pushes the data to Elasticsearch for logging). Here is a
>>> diagram:
>>>
>>> [image: application diagram]
>>>
>>> Everything launches and works fine, as shown in the diagram above, from the
>>> EDI router through the transaction operators.
>>>
>>>
>>>
>>> The final challenge I am having, being new to all of this, is that the
>>> Kinesis operator, by default, stores its app ID in
>>> IdempotentStorageManager (aka WindowDataManager) when it is launched. If
>>> the app is shut down and restarted, this same app ID is used by default
>>> with the checkpoint, so you don’t reprocess the same records when the
>>> application is restarted.
>>>
>>>
>>>
>>> You can see this ID, ‘application_1453741656046_0520’, in gray lettering
>>> immediately to the right of Operations / apps in the DataTorrent console
>>> image below:
>>>
>>>
>>>
>>> [image: DataTorrent console screenshot]
>>>
>>>
>>>
>>> However, if you kill the application and relaunch it, this ID changes
>>> and the operator starts reading the Kinesis stream from the beginning. The
>>> only way to restart it so that it resumes where it left off is to use the
>>> CLI as follows:
>>>
>>>
>>>
>>> 1.)    Run ‘dtcli’ from the command line.
>>>
>>> 2.)    Run ‘launch -originalAppId "application_1453741656046_0520"
>>> <path to .apa file>’
>>>
>>>
>>>
>>> This will launch the application using the same app ID shown in the
>>> console screenshot above.
>>>
>>>
>>>
>>> I want to make this easier, but I need some expert help in tweaking this
>>> so it works.
>>>
>>>
>>>
>>> I am thinking that there should be a way with Kinesis to:
>>>
>>>
>>>
>>> 1.)    Define a Kinesis app ID string value in the properties.
>>>
>>> 2.)    If this value is defined, use it when launching the application
>>> to check whether a Hadoop app ID has already been assigned to that
>>> identifier.
>>>
>>> 3.)    If no app ID is stored in the database for that identifier yet,
>>> launch the app, creating a new app ID, and store that app ID under the
>>> identifier key.
>>>
>>> 4.)    Now if I kill the app, or install new software, it will always
>>> pick up where it left off by using the identifier key to retrieve and
>>> assign the app ID.
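The steps above can be sketched as a small registry keyed by the identifier from step 1. Everything here is hypothetical: the class `AppIdRegistry`, the local properties file standing in for the database, and the launch flow around it. One assumption differs from step 4: since, as Thomas explains in this thread, a relaunch copies the recovery data into the new application's directory, this sketch records the most recent application ID after every launch rather than keeping only the first one. Relaunching from the first ID every time could replay records already processed by the launches in between.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical registry, not part of Apex or dtcli: maps a user-chosen
// identifier (step 1) to the most recent application ID launched under it.
// A local properties file stands in for the database from steps 2-3.
final class AppIdRegistry {
    private final Path storeFile;

    AppIdRegistry(Path storeFile) {
        this.storeFile = storeFile;
    }

    /** Step 2: the latest application ID recorded for identifier, or null if none. */
    String lookupLatest(String identifier) throws IOException {
        return load().getProperty(identifier);
    }

    /** Steps 3-4: remember appId as the latest launch for identifier. */
    void recordLaunch(String identifier, String appId) throws IOException {
        Properties props = load();
        props.setProperty(identifier, appId);
        try (OutputStream out = Files.newOutputStream(storeFile)) {
            props.store(out, "identifier -> latest application ID");
        }
    }

    // Reads the store, treating a missing file as an empty registry.
    private Properties load() throws IOException {
        Properties props = new Properties();
        if (Files.exists(storeFile)) {
            try (InputStream in = Files.newInputStream(storeFile)) {
                props.load(in);
            }
        }
        return props;
    }
}
```

A launch wrapper would call lookupLatest() first; if it returns an ID, launch with -originalAppId set to that ID, otherwise launch normally; then call recordLaunch() with the application ID the new launch was assigned.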
>>>
>>>
>>>
>>> Sounds simple, right? :)
>>>
>>>
>>>
>>> Can one of the experts out there help me figure this out? I don’t want
>>> to reprocess EDI transactions that have already been processed.
>>>
>>>
>>>
>>> Thanks,
>>>
>>>
>>>
>>> Jim
>>
>>
>
