That would be a fine option for those users who are capable of running Maven
builds. I think evolving the NiFi Registry and NiFi integration to source
all NARs as needed at runtime from the registry would be the best user
experience and deployment answer over time.

Thanks

On Mon, Jun 29, 2020 at 9:57 AM Mike Thomsen <mikerthom...@gmail.com> wrote:

> As far as I can tell, Kylo is dead based on their public github activity.
>
> Mark,
>
> Would it make sense for us to start modularizing nifi-assembly with more
> profiles? That way people like Boris could run something like this:
>
> mvn install -P 'include-grpc,include-graph,!include-kafka,!include-mongodb'
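>
> (Quoting the profile list keeps an interactive shell from
> history-expanding the "!".) For reference, a rough sketch of what one such
> profile might look like in nifi-assembly's pom.xml - the profile id and
> artifact name here are illustrative, not necessarily the real module ids:
>
>     <!-- Hypothetical opt-out profile: active by default, so the full
>          assembly is unchanged unless a user deactivates it with
>          -P '!include-kafka'. The usual Maven caveat applies: explicitly
>          activating any profile via -P disables activeByDefault ones. -->
>     <profile>
>         <id>include-kafka</id>
>         <activation>
>             <activeByDefault>true</activeByDefault>
>         </activation>
>         <dependencies>
>             <dependency>
>                 <groupId>org.apache.nifi</groupId>
>                 <artifactId>nifi-kafka-2-0-nar</artifactId>
>                 <version>${project.version}</version>
>                 <type>nar</type>
>             </dependency>
>         </dependencies>
>     </profile>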
>
> On Mon, Jun 29, 2020 at 11:20 AM Boris Tyukin <bo...@boristyukin.com>
> wrote:
>
>> Hi Mark, thanks for the great comments and for working on these
>> improvements. These are great enhancements that we can certainly benefit
>> from - I am thinking of at least two projects we support today.
>>
>> As far as making it more user-friendly, at some point I looked at Kylo.io,
>> and it was quite an interesting project - not sure if it is still alive -
>> but I liked how they created their own UI/tooling around NiFi.
>>
>> I am going to toy with the idea of having a "dumbed-down" version of NiFi.
>>
>> On Sun, Jun 28, 2020 at 3:36 PM Mark Payne <marka...@hotmail.com> wrote:
>>
>>> Hey Boris,
>>>
>>> There’s a good bit to unpack here but I’ll try to answer each question.
>>>
>>> 1) I would say that the target audience for NiFi really is a person with
>>> a pretty technical role. Not developers, necessarily, though. We do see a
>>> lot of developers using it, as well as data scientists, data engineers, sys
>>> admins, etc. So while there may be quite a few tasks that a non-technical
>>> person can achieve, it may be hard to expose the platform to someone
>>> without a technical background.
>>>
>>> That said, I do believe that you’re right about the notion of flow
>>> dependencies. I’ve done some work recently to help improve this. For
>>> example, NIFI-7476 [1] makes it possible to configure a Process Group in
>>> such a way that only a single FlowFile at a time is allowed into the group.
>>> And the data is optionally held within the group until that FlowFile has
>>> completed processing, even if it’s split up into many parts. Additionally,
>>> NIFI-7509 [2] updates the List* processors so that they can use an optional
>>> Record Writer. This makes it possible to get a full listing of a directory
>>> from ListFile as a single FlowFile. Or a listing of all items in an S3
>>> bucket or an Azure Blob Store, etc. So when that is combined with
>>> NIFI-7476, it makes it very easy to process an entire directory of files or
>>> an entire bucket, etc. and wait until all processing is complete before
>>> data is transferred on to the next task. (Additionally, NIFI-7552 [3] updates
>>> this to add attributes indicating FlowFile counts for each Output Port so
>>> it’s easy to determine if there were any “processing failures” etc.).
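>>>
>>> To make the Record Writer idea concrete: with a JSON writer configured,
>>> a single listing FlowFile from ListFile might contain one record per
>>> file, along the lines of the sample below (field names are illustrative,
>>> not the exact record schema):
>>>
>>>     [
>>>       { "filename": "orders_2020-06-28.csv", "path": "/data/in",
>>>         "size": 10485760, "lastModified": "2020-06-28T02:15:00Z" },
>>>       { "filename": "orders_2020-06-29.csv", "path": "/data/in",
>>>         "size": 9437184, "lastModified": "2020-06-29T02:15:00Z" }
>>>     ]
>>>
>>> Record-oriented processors downstream (QueryRecord, SplitRecord, etc.)
>>> can then filter or split that one listing however they like.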
>>>
>>> So with all of the above said, I don’t think that it necessarily solves
>>> in a simple and generic sense the requirement to complete Task A, then Task
>>> B, and then Task C. But it does put us far closer. This may be achievable
>>> still with some nesting of Process Groups, etc. but it won’t be completely
>>> as straightforward as I’d like and would perhaps add significant latency
>>> if it’s allowing only a single FlowFile at a time through the Process Group.
>>> Perhaps that can be addressed in the future by having the ability to bulk
>>> transfer all FlowFiles from Queue A to Queue B, and then allowing a "Batch
>>> Input" on a Process Group instead of just “Streaming" vs. "Single FlowFile
>>> at a Time.” I do think there will be some future improvements along these
>>> lines, though.
>>>
>>> 2) This should be fairly straight-forward. It would basically be just
>>> creating an assembly like the nifi-assembly module but one that doesn’t
>>> include all of the NARs.
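>>>
>>> As a rough illustration (module names are examples, not a complete or
>>> exact list), the trimmed assembly's pom would keep the framework NAR plus
>>> a short whitelist of processor NARs; anything omitted here simply is not
>>> bundled into the resulting distribution:
>>>
>>>     <!-- Hypothetical "nifi-light" assembly: only the NARs listed as
>>>          dependencies end up in the lib/ directory of the build. -->
>>>     <dependencies>
>>>         <dependency>
>>>             <groupId>org.apache.nifi</groupId>
>>>             <artifactId>nifi-framework-nar</artifactId>
>>>             <version>${project.version}</version>
>>>             <type>nar</type>
>>>         </dependency>
>>>         <dependency>
>>>             <groupId>org.apache.nifi</groupId>
>>>             <artifactId>nifi-standard-nar</artifactId>
>>>             <version>${project.version}</version>
>>>             <type>nar</type>
>>>         </dependency>
>>>     </dependencies>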
>>>
>>> 3) This probably boils down to some trade-offs and what makes the most sense
>>> for your organization. A single, large NiFi deployment makes it much easier
>>> for the sys admins, generally. The NiFi policies should provide the needed
>>> multi-tenancy in terms of authorization. But it doesn’t really offer much
>>> in terms of resource isolation. So, if resource isolation is important to
>>> you, then using separate NiFi deployments is likely desirable.
>>>
>>> Hope this helps!
>>> -Mark
>>>
>>>
>>> [1] https://issues.apache.org/jira/browse/NIFI-7476
>>> [2] https://issues.apache.org/jira/browse/NIFI-7509
>>> [3] https://issues.apache.org/jira/browse/NIFI-7552
>>>
>>>
>>>
>>> On Jun 28, 2020, at 1:04 PM, Boris Tyukin <bo...@boristyukin.com> wrote:
>>>
>>> Hi guys,
>>>
>>> I am thinking of increasing the footprint of NiFi in my org by extending
>>> it to less technical roles. I have a few questions:
>>>
>>> 1) are there any plans to support easy dependencies at some point? We are
>>> aware of all the current options (wait-notify, kafka,
>>> mergerecord/mergecontent, etc.), and all of them are still hard to use and
>>> unreliable. For non-technical roles, we really need a stupid-simple way to
>>> define classical dependencies, like run task C only after tasks A and B
>>> are finished. I realize it is a challenge because of the whole concept of
>>> NiFi with flowfiles (which we do love, being on the technical side of the
>>> house), but I really do not want to get another ETL/scheduling tool.
>>>
>>> 2) is it fairly easy to build and support our own custom version of
>>> NiFi-light, where we remove all the processors that we do not want to
>>> expose to non-technical people? The idea is to remove all the processors
>>> that consume cpu/ram, to push people toward our Big Data systems instead
>>> of using NiFi to do the actual processing. We would like to leave those
>>> capabilities to our data engineering team while shifting our analysts to
>>> an ELT/ELTL paradigm, letting them run SQL and benefit from Big Data
>>> engines.
>>>
>>> 3) what would be the recommended setup for multiple decentralized teams?
>>> Separate NiFi instances, where each team supports its own jobs while our
>>> admin supports all of the instances? Or one large NiFi cluster that
>>> everyone works on together? We do not want teams to step on each other's
>>> jobs, see each other's failure alerts/bulletins, etc. We want to make it
>>> feel like each team's own environment. Not sure if NiFi policies are
>>> mature enough to provide this sort of isolation.
>>>
>>> Thanks,
>>> Boris
>>>
>>>
>>>
