+1. This is a really good starting point to clean up Malhar.

On Wed, Jul 13, 2016 at 3:06 AM, David Yan <da...@datatorrent.com> wrote:
> Hi Lakshmi,
>
> Thanks for volunteering.
>
> I think Pramod's suggestion of putting the operators into 3 buckets and
> Siyuan's suggestion of starting a shared Google Sheet that tracks
> individual operators are both good, with the exception that lib/streamquery
> is one unit and we probably do not need to look at individual operators
> under it.
>
> If we don't have any objection in the community, let's start the process.
>
> David
>
> On Tue, Jul 12, 2016 at 2:24 PM, Lakshmi Velineni <laks...@datatorrent.com>
> wrote:
>
>> I am interested to work on this.
>>
>> Regards,
>> Lakshmi prasanna
>>
>> On Tue, Jul 12, 2016 at 1:55 PM, hsy...@gmail.com <hsy...@gmail.com>
>> wrote:
>>
>> > Why not have a shared google sheet with a list of operators and options
>> > that we want to do with it.
>> > I think it's case by case.
>> > But retire unused or obsolete operators is important and we should do it
>> > sooner rather than later.
>> >
>> > Regards,
>> > Siyuan
>> >
>> > On Tue, Jul 12, 2016 at 1:09 PM, Amol Kekre <a...@datatorrent.com> wrote:
>> >
>> >> My vote is to do 2&3
>> >>
>> >> Thks
>> >> Amol
>> >>
>> >> On Tue, Jul 12, 2016 at 12:14 PM, Kottapalli, Venkatesh <
>> >> vkottapa...@directv.com> wrote:
>> >>
>> >>> +1 for deprecating the packages listed below.
>> >>>
>> >>> -----Original Message-----
>> >>> From: hsy...@gmail.com [mailto:hsy...@gmail.com]
>> >>> Sent: Tuesday, July 12, 2016 12:01 PM
>> >>>
>> >>> +1
>> >>>
>> >>> On Tue, Jul 12, 2016 at 11:53 AM, David Yan <da...@datatorrent.com>
>> >>> wrote:
>> >>>
>> >>> > Hi all,
>> >>> >
>> >>> > I would like to renew the discussion of retiring operators in Malhar.
>> >>> >
>> >>> > As stated before, the reason why we would like to retire operators in
>> >>> > Malhar is because some of them were written a long time ago before
>> >>> > Apache incubation, and they do not pertain to real use cases, are not
>> >>> > up to par in code quality, have no potential for improvement, and
>> >>> > probably completely unused by anybody.
>> >>> >
>> >>> > We do not want contributors to use them as a model of their
>> >>> > contribution, or users to use them thinking they are of quality, and
>> >>> > then hit a wall.
>> >>> > Both scenarios are not beneficial to the reputation of Apex.
>> >>> >
>> >>> > The initial 3 packages that we would like to target are *lib/algo*,
>> >>> > *lib/math*, and *lib/streamquery*.
>> >>> >
>> >>> > I'm adding this thread to the users list. Please speak up if you are
>> >>> > using any operator in these 3 packages. We would like to hear from you.
>> >>> >
>> >>> > These are the options I can think of for retiring those operators:
>> >>> >
>> >>> > 1) Completely remove them from the malhar repository.
>> >>> > 2) Move them from malhar-library into a separate artifact called
>> >>> > malhar-misc
>> >>> > 3) Mark them deprecated and add to their javadoc that they are no
>> >>> > longer supported
>> >>> >
>> >>> > Note that 2 and 3 are not mutually exclusive. Any thoughts?
>> >>> >
>> >>> > David
>> >>> >
>> >>> > On Tue, Jun 7, 2016 at 2:27 PM, Pramod Immaneni
>> >>> > <pra...@datatorrent.com> wrote:
>> >>> >
>> >>> >> I wanted to close the loop on this discussion. In general everyone
>> >>> >> seemed to be favorable to this idea with no serious objections. Folks
>> >>> >> had good suggestions like documenting capabilities of operators, come
>> >>> >> up well defined criteria for graduation of operators and what those
>> >>> >> criteria may be and what to do with existing operators that may not
>> >>> >> yet be mature or unused.
>> >>> >>
>> >>> >> I am going to summarize the key points that resulted from the
>> >>> >> discussion and would like to proceed with them.
>> >>> >>
>> >>> >>    - Operators that do not yet provide the key platform capabilities to
>> >>> >>    make an operator useful across different applications such as
>> >>> >>    reusability, partitioning static or dynamic, idempotency, exactly
>> >>> >>    once will still be accepted as long as they are functionally
>> >>> >>    correct, have unit tests and will go into a separate module.
>> >>> >>    - Contrib module was suggested as a place where new contributions go
>> >>> >>    in that don't yet have all the platform capabilities and are not yet
>> >>> >>    mature. If there are no other suggestions we will go with this one.
>> >>> >>    - It was suggested the operators documentation list those platform
>> >>> >>    capabilities it currently provides from the list above. I will
>> >>> >>    document a structure for this in the contribution guidelines.
>> >>> >>    - Folks wanted to know what would be the criteria to graduate an
>> >>> >>    operator to the big leagues :). I will kick-off a separate thread
>> >>> >>    for it as I think it requires its own discussion and hopefully we
>> >>> >>    can come up with a set of guidelines for it.
>> >>> >>    - David brought up state of some of the existing operators and their
>> >>> >>    retirement and the layout of operators in Malhar in general and how
>> >>> >>    it causes problems with development. I will ask him to lead the
>> >>> >>    discussion on that.
>> >>> >>
>> >>> >> Thanks
>> >>> >>
>> >>> >> On Fri, May 27, 2016 at 7:47 PM, David Yan <da...@datatorrent.com>
>> >>> >> wrote:
>> >>> >>
>> >>> >> > The two ideas are not conflicting, but rather complementing.
>> >>> >> >
>> >>> >> > On the contrary, putting a new process for people trying to
>> >>> >> > contribute while NOT addressing the old unused subpar operators in
>> >>> >> > the repository is what is conflicting.
>> >>> >> >
>> >>> >> > Keep in mind that when people try to contribute, they always look
>> >>> >> > at the existing operators already in the repository as examples and
>> >>> >> > likely a model for their new operators.
>> >>> >> >
>> >>> >> > David
>> >>> >> >
>> >>> >> > On Fri, May 27, 2016 at 4:05 PM, Amol Kekre <a...@datatorrent.com>
>> >>> >> > wrote:
>> >>> >> >
>> >>> >> > > Yes there are two conflicting threads now. The original thread
>> >>> >> > > was to open up a way for contributors to submit code in a dir
>> >>> >> > > (contrib?) as long as license part of taken care of.
>> >>> >> > >
>> >>> >> > > On the thread of removing non-used operators -> How do we know
>> >>> >> > > what is being used?
>> >>> >> > >
>> >>> >> > > Thks,
>> >>> >> > > Amol
>> >>> >> > >
>> >>> >> > > On Fri, May 27, 2016 at 3:40 PM, Sandesh Hegde <
>> >>> >> > > sand...@datatorrent.com> wrote:
>> >>> >> > >
>> >>> >> > > > +1 for removing the not-used operators.
>> >>> >> > > >
>> >>> >> > > > So we are creating a process for operator writers who don't
>> >>> >> > > > want to understand the platform, yet wants to contribute? How
>> >>> >> > > > big is that set?
>> >>> >> > > > If we tell the app-user, here is the code which has not passed
>> >>> >> > > > all the checklist, will they be ready to use that in production?
>> >>> >> > > >
>> >>> >> > > > This thread has 2 conflicting forces, reduce the operators and
>> >>> >> > > > make it easy to add more operators.
>> >>> >> > > >
>> >>> >> > > > On Fri, May 27, 2016 at 3:03 PM Pramod Immaneni <
>> >>> >> > > > pra...@datatorrent.com> wrote:
>> >>> >> > > >
>> >>> >> > > > > On Fri, May 27, 2016 at 2:30 PM, Gaurav Gupta <
>> >>> >> > > > > gaurav.gopi...@gmail.com> wrote:
>> >>> >> > > > >
>> >>> >> > > > > > Pramod,
>> >>> >> > > > > >
>> >>> >> > > > > > By that logic I would say let's put all partitionable
>> >>> >> > > > > > operators into one folder, non-partitionable operators in
>> >>> >> > > > > > another and so on...
>> >>> >> > > > > >
>> >>> >> > > > >
>> >>> >> > > > > Remember the original goal of making it easier for new
>> >>> >> > > > > members to contribute and managing those contributions to
>> >>> >> > > > > maturity. It is not a functional level separation.
>> >>> >> > > > >
>> >>> >> > > > > > When I look at hadoop code I see these annotations being
>> >>> >> > > > > > used at class level and not at package/folder level.
>> >>> >> > > > >
>> >>> >> > > > > I had a typo in my email, I meant to say "think of this like
>> >>> >> > > > > a folder..." as an analogy and not literally.
>> >>> >> > > > >
>> >>> >> > > > > Thanks
>> >>> >> > > > >
>> >>> >> > > > > > Thanks
>> >>> >> > > > > >
>> >>> >> > > > > > On Fri, May 27, 2016 at 2:10 PM, Pramod Immaneni <
>> >>> >> > > > > > pra...@datatorrent.com> wrote:
>> >>> >> > > > > >
>> >>> >> > > > > > > On Fri, May 27, 2016 at 1:05 PM, Gaurav Gupta <
>> >>> >> > > > > > > gaurav.gopi...@gmail.com> wrote:
>> >>> >> > > > > > >
>> >>> >> > > > > > > > Can same goal not be achieved by using
>> >>> >> > > > > > > > org.apache.hadoop.classification.InterfaceStability.Evolving /
>> >>> >> > > > > > > > org.apache.hadoop.classification.InterfaceStability.Unstable
>> >>> >> > > > > > > > annotation?
>> >>> >> > > > > > > >
>> >>> >> > > > > > >
>> >>> >> > > > > > > I think it is important to localize the additions in one
>> >>> >> > > > > > > place so that it becomes clearer to users about the
>> >>> >> > > > > > > maturity level of these, easier for developers to track
>> >>> >> > > > > > > them towards the path to maturity and also provides a
>> >>> >> > > > > > > clearer directive for committers and contributors on
>> >>> >> > > > > > > acceptance of new submissions. Relying on the annotations
>> >>> >> > > > > > > alone makes them spread all over the place and adds an
>> >>> >> > > > > > > additional layer of difficulty in identification not just
>> >>> >> > > > > > > for users but also for developers who want to find such
>> >>> >> > > > > > > operators and improve them. This of this like a folder
>> >>> >> > > > > > > level annotation where everything under this folder is
>> >>> >> > > > > > > unstable or evolving.
>> >>> >> > > > > > >
>> >>> >> > > > > > > Thanks
>> >>> >> > > > > > >
>> >>> >> > > > > > > > On Fri, May 27, 2016 at 12:35 PM, David Yan <
>> >>> >> > > > > > > > da...@datatorrent.com> wrote:
>> >>> >> > > > > > > >
>> >>> >> > > > > > > > > > > > Malhar in its current state, has way too many
>> >>> >> > > > > > > > > > > > operators that fall in the "non-production
>> >>> >> > > > > > > > > > > > quality" category. We should make it obvious
>> >>> >> > > > > > > > > > > > to users that which operators are up to par,
>> >>> >> > > > > > > > > > > > and which operators are not, and maybe even
>> >>> >> > > > > > > > > > > > remove those that are likely not ever used in
>> >>> >> > > > > > > > > > > > a real use case.
>> >>> >> > > > > > > > > > > >
>> >>> >> > > > > > > > > > >
>> >>> >> > > > > > > > > > > I am ambivalent about revisiting older operators
>> >>> >> > > > > > > > > > > and doing this exercise as this can cause
>> >>> >> > > > > > > > > > > unnecessary tensions. My original intent is for
>> >>> >> > > > > > > > > > > contributions going forward.
>> >>> >> > > > > > > > > > >
>> >>> >> > > > > > > > > >
>> >>> >> > > > > > > > > > IMO it is important to address this as well.
>> >>> >> > > > > > > > > > Operators outside the play area should be of well
>> >>> >> > > > > > > > > > known quality.
>> >>> >> > > > > > > > > >
>> >>> >> > > > > > > > >
>> >>> >> > > > > > > > > I think this is important, and I don't anticipate
>> >>> >> > > > > > > > > much tension if we establish clear criteria.
>> >>> >> > > > > > > > > It's not helpful if we let the old subpar operators
>> >>> >> > > > > > > > > stay and put up the bars for new operators.
>> >>> >> > > > > > > > >
>> >>> >> > > > > > > > > David