+1 I like this feature
On Mon, Apr 25, 2016 at 7:52 PM, Amol Kekre <a...@datatorrent.com> wrote:

> This is very valuable. I have heard the following feature sets from
> customers.
>
> - Ability to spool to HDFS (or any DFS interface)
> - Ability to pick and choose the tuple, i.e. not every tuple may need to
>   be tracked
> - Minimal performance hit
> - The current API remains as is
> - Ability to get the content based on tuple-id
>
> Apex should enable this with minimal or no coding from users.
>
> Thanks,
> Amol
>
>
> On Mon, Apr 25, 2016 at 12:00 AM, Ashwin Chandra Putta <
> ashwinchand...@gmail.com> wrote:
>
> > Hi All,
> >
> > I have heard of a few use cases where lineage support is asked for. On
> > Apex, it seems to be an ask for the ability to uniquely track each
> > tuple as it flows through the DAG. It further boils down to being able
> > to track every tuple going into each operator and the corresponding
> > tuple going out of the operator. Here is a quick list I put together
> > to describe some requirements for lineage support on Apex. Please feel
> > free to improve or add to it. Also, please respond with ideas on how
> > we can solve this on the Apex platform.
> >
> > When lineage is enabled,
> > 1. We should be able to track each tuple as it enters and exits an
> > operator, e.g. enrichment.
> > 2. We should be able to track all the tuples that contributed to a
> > tuple that is emitted, e.g. dimensions computation.
> > 3. We should be able to track all the tuples that contributed to all
> > the tuples emitted by the operator, e.g. join?
> >
> > --
> >
> > Regards,
> > Ashwin.
> >
>

--
Regards,
Atri
*l'apprenant*