Thanks for this. You make some very interesting points about the use of Logstash, and you are correct: I am only just looking at Logstash, but will now look to use NiFi instead, if possible, to connect to my central cluster.

Regards
Conrad
From: Andrew Psaltis <psaltis.and...@gmail.com>
Reply-To: users@nifi.apache.org
Date: Saturday, 7 May 2016 at 16:43
To: users@nifi.apache.org
Subject: Re: Logstash/ Filebeat/ Lumberjack -> Nifi

Hi Conrad,

Based on your email it sounds like you are potentially just getting started with Logstash. The one thing I can share is that up until recently I worked in an environment where we had ~3,000 nodes deployed, and every node ran either Logstash or Flume (we were transitioning to Logstash). We used Puppet, and the Logstash module was in the base templates, so as app developers provisioned new nodes Logstash was automatically deployed and configured. I can tell you that it seems really easy at first; however, my team was always messing with, tweaking, and troubleshooting the Logstash scripts as we wanted to ingest different data sources, modify how the data was captured, or fix bugs.

Knowing now what I do about NiFi, if I had a chance to do it over again (I will be talking to old colleagues about it), I would just use NiFi on all of those edge nodes and then send the data to a central NiFi cluster. To me there are at least a couple of huge benefits to this:

1. You use one tool, which provides an amazingly easy and very powerful way to control and adjust the dataflow, all without having to muck with any scripts. You can easily filter / enrich / transform the data at the edge node, all via a UI.
2. You get provenance information from the edge all the way back. This is very powerful: you can actually answer questions from others such as "how come my log entry never made it to System X" or, even better, how the data was changed along the way.
The "why didn't my log entry make it to System X" question can sometimes be answered by searching through logs, but that assumes you have the information in the logs to begin with. I can tell you that these questions will come up. We had data that would go through a pipeline and finally into HDFS, and we would get questions from app developers when they queried the data in Hive and wanted to know why certain log entries were missing.

Hope this helps.

In good health,
Andrew

On Sat, May 7, 2016 at 8:15 AM, Conrad Crampton <conrad.cramp...@secdata.com> wrote:

Hi Bryan,

Some good tips and validation of my thinking. It did occur to me to use the standalone NiFi, as I have no particular need to use Logstash for any other reason.

Thanks
Conrad

From: Bryan Bende <bbe...@gmail.com>
Reply-To: users@nifi.apache.org
Date: Friday, 6 May 2016 at 14:56
To: users@nifi.apache.org
Subject: Re: Logstash/ Filebeat/ Lumberjack -> Nifi

Hi Conrad,

I am not that familiar with Logstash, but as you mentioned there is a PR for Lumberjack processors [1], which is not yet released but could help if you are already using Logstash. If Logstash has outputs for TCP, UDP, or syslog then, like you mentioned, it seems like this could work well with ListenTCP, ListenUDP, or ListenSyslog.

I think the only additional benefit of Lumberjack is that it is an application-level protocol that provides additional reliability on top of the networking protocols. That is, when ListenLumberjack receives an event over TCP, it acknowledges that NiFi has successfully received and stored the data. TCP alone can only guarantee that the data was delivered to the socket; the application could still have dropped it.
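To make the distinction above concrete, here is a minimal Python sketch of an application-level acknowledgement layered on TCP. It is not the Lumberjack wire protocol and not how ListenLumberjack is implemented; the function names (serve_once, send_with_ack), the newline framing, and the "ACK" token are all made up for illustration. The only point is that the receiver acknowledges after it has stored the event, and the sender treats an event as delivered only once that acknowledgement arrives.

```python
import socket
import threading


def serve_once(server_sock, store):
    """Accept one connection; acknowledge each line only after storing it."""
    conn, _ = server_sock.accept()
    with conn, conn.makefile("rb") as reader:
        for raw in reader:  # one newline-terminated event per line
            # "Store" the event first (a list stands in for real storage).
            store.append(raw.rstrip(b"\n").decode("utf-8"))
            # Only then acknowledge at the application level.
            conn.sendall(b"ACK\n")


def send_with_ack(host, port, events, timeout=5.0):
    """Send events one at a time; return the ones the receiver acknowledged."""
    acked = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rb") as reader:
            for event in events:
                sock.sendall(event.encode("utf-8") + b"\n")
                # TCP accepted the bytes; wait for the receiver to confirm storage.
                if reader.readline() == b"ACK\n":
                    acked.append(event)
    return acked


def demo():
    """Run receiver and sender on localhost using an ephemeral port."""
    store = []
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(server, store))
    worker.start()
    acked = send_with_ack("127.0.0.1", port, ["line one", "line two"])
    worker.join()
    server.close()
    return store, acked
```

Calling demo() returns the events the receiver stored and the events the sender saw acknowledged. A sender built this way could retry anything that was written to the socket but never acknowledged, which plain TCP delivery does not let it detect.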
Although MiNiFi is not yet released, a possible solution is to run standalone NiFi instances on the servers where your logs are, with a simple flow like TailFile -> Remote Process Group, which sends the logs back to a central NiFi instance over Site-to-Site.

Are you able to share any more info about what kind of logs they are and how they are being produced? If they are coming from Java applications using logback or log4j, and you have control over those applications, you can also use a specific appender, such as a UDP appender, to send directly to ListenUDP in NiFi.

Hope that helps.

-Bryan

[1] https://github.com/apache/nifi/pull/290

On Fri, May 6, 2016 at 3:33 AM, Conrad Crampton <conrad.cramp...@secdata.com> wrote:

Hi,

Some advice if possible please. Whilst I would love to wait for the MiNiFi project to realise its objectives, as this sounds exactly like what I want from the initial suggestions, I have a pressing need to shift some log files on remote servers (to my DC) to my NiFi cluster. From a quick look at Logstash, it would appear to provide what I want, but there doesn’t yet appear to be a simple way of getting files from Logstash to NiFi (I’m aware of the work going on with the Lumberjack processor, but it is not in the current release). The options currently would appear to be to use any of a number of output plugins in Logstash (TCP, UDP, syslog, Kafka, HTTP, RabbitMQ) and then use the equivalent receiver in NiFi (with some intermediate service in some cases: Kafka, RabbitMQ).

Can anyone suggest the ‘best’ way here? I’m trying to prove a point about cutting out some other intermediate product, so this is something that has to be in production now; I can always refactor at a later date to have a ‘better’ solution (MiNiFi??).

Why don’t I ask on the Logstash forums? You folks have always been a great help before ;-)

Thanks
Conrad

Nb.
Of course not saying Logstash folks wouldn’t be equally helpful :-)
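
As a rough illustration of the "send directly to ListenUDP" option Bryan mentions earlier in the thread, the Python sketch below sends each log line as its own UDP datagram. The thread is about Java appenders, so this only shows the idea, not a logback or log4j configuration. The function names and the sample lines are hypothetical, and the local listener socket merely stands in for a ListenUDP processor so the example can run on one machine.

```python
import socket


def send_lines_udp(lines, host, port):
    """Send each log line as its own UDP datagram (fire-and-forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for line in lines:
            sock.sendto(line.encode("utf-8"), (host, port))


def demo_udp():
    """Send two lines to a local socket standing in for ListenUDP."""
    # Assumption for the demo: a plain UDP socket on localhost plays the
    # role of the listener; a real deployment would point at the NiFi host
    # and whatever port the ListenUDP processor is configured with.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as listener:
        listener.bind(("127.0.0.1", 0))   # ephemeral port
        listener.settimeout(2.0)          # do not hang if a datagram is lost
        port = listener.getsockname()[1]
        lines = ["app started", "user login ok"]
        send_lines_udp(lines, "127.0.0.1", port)
        received = [listener.recvfrom(4096)[0].decode("utf-8") for _ in lines]
    return received
```

Note that the sender gets no confirmation here. UDP is the simplest of the options discussed, but a dropped datagram is silently lost, which is the trade-off against the acknowledged approach described for Lumberjack.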