Hi,
Any advice on the ‘best’ architectural approach when a processing function 
has to be applied to every flowfile in a dataflow, with some (possible) output 
based on the flowfile content?
e.g. inspect log files for a specific IP, then send a message to syslog
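
To make the example concrete, the per-flowfile function I have in mind is 
roughly the following. This is only a minimal Python sketch, independent of 
whether it ends up in Spark or in a NiFi processor. The watched IP, the 
function names and the message text are placeholders, and the facility and 
severity values are just assumed defaults.

```python
import re
from typing import List

WATCHED_IP = "10.0.0.5"  # hypothetical address, for illustration only


def find_matches(content: str, ip: str = WATCHED_IP) -> List[str]:
    """Return the log lines that mention `ip` as a complete address."""
    # The lookarounds stop 10.0.0.5 from matching inside 10.0.0.50 or 110.0.0.5
    pattern = re.compile(
        r"(?<!\d)(?<!\d\.)" + re.escape(ip) + r"(?!\d)(?!\.\d)"
    )
    return [line for line in content.splitlines() if pattern.search(line)]


def to_syslog_message(line: str, facility: int = 1, severity: int = 4) -> str:
    """Prefix a line with a syslog priority value (facility * 8 + severity)."""
    pri = facility * 8 + severity
    return f"<{pri}>watched IP seen: {line}"


def process_flowfile(content: str) -> List[str]:
    """Apply the check to one flowfile; an empty list means no output."""
    return [to_syslog_message(line) for line in find_matches(content)]
```

Most flowfiles would produce an empty list, so whichever approach is chosen 
has to handle the "no output" case cheaply.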

Approach 1
Spark
Output port from NiFi -> Spark listens to that stream -> processes and outputs 
accordingly
Advantages – can scale the Spark job on YARN, decoupled (reusable) from NiFi
Disadvantages – adds complexity, logic lives outside the NiFi flow.

Approach 2
NiFi
Custom processor -> PutSyslog
Advantages – reuses existing NiFi processors/capabilities, the flow makes the 
design intent obvious
Disadvantages – not clear how well this scales??

Any comments/advice/experience of either approach?

Thanks
Conrad



