Thanks for this. You make some very interesting points about the use of 
Logstash, and you are correct: I am only just looking at Logstash, but I will 
now look to use NiFi instead, if possible, to connect to my central cluster.
Regards
Conrad

From: Andrew Psaltis <psaltis.and...@gmail.com>
Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
Date: Saturday, 7 May 2016 at 16:43
To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: Logstash/ Filebeat/ Lumberjack -> Nifi

Hi Conrad,
Based on your email it sounds like you are potentially just getting started 
with Logstash. The one thing I can share is that up until recently I worked in 
an environment where we had ~3,000 nodes deployed and all had either Logstash 
or Flume (we were transitioning to Logstash). We used Puppet, and the Logstash 
module was in the base templates, so as app developers provisioned new nodes 
Logstash was automatically deployed and configured. I can tell you that it 
seems really easy at first; however, my team was always messing with, tweaking, 
and troubleshooting the Logstash scripts as we wanted to ingest different data 
sources, modify how the data was captured, or fix bugs. Knowing now what I do 
about NiFi, if I had a chance to do it over again (I will be talking to old 
colleagues about it) I would just use NiFi on all of those edge nodes and then 
send the data to a central NiFi cluster. To me there are several huge 
benefits to this:

  1.  You use one tool, which provides an amazingly easy and very powerful way 
to control and adjust the dataflow all without having to muck with any scripts. 
You can easily filter / enrich / transform the data at the edge node all via a 
UI.
  2.  You get provenance information from the edge all the way back. This is 
very powerful: you can actually answer questions from others such as "how come 
my log entry never made it to System X" or, even better, how the data was 
changed along the way. The "why didn't my log entry make it to System X" 
question can sometimes be answered by searching through logs, but that assumes 
you have the information in the logs to begin with. I can tell you that these 
questions will come up. We had data that would go through a pipeline and 
finally into HDFS, and we would get questions from app developers when they 
queried the data in Hive and wanted to know why certain log entries were 
missing.

Hope this helps.

In good health,
Andrew

On Sat, May 7, 2016 at 8:15 AM, Conrad Crampton 
<conrad.cramp...@secdata.com> wrote:
Hi Bryan,
Some good tips and validation of my thinking.
It did occur to me to use standalone NiFi, as I have no particular need to use 
Logstash for any other reason.
Thanks
Conrad

From: Bryan Bende <bbe...@gmail.com>
Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
Date: Friday, 6 May 2016 at 14:56
To: "users@nifi.apache.org" <users@nifi.apache.org>
Subject: Re: Logstash/ Filebeat/ Lumberjack -> Nifi

Hi Conrad,

I am not that familiar with Logstash, but as you mentioned there is a PR for 
Lumberjack processors [1] which is not yet released, but could help if you are 
already using Logstash.
If Logstash has outputs for TCP, UDP, or syslog then, as you mentioned, it 
seems like this could work well with ListenTCP, ListenUDP, or ListenSyslog.

I think the only additional benefit of Lumberjack is that it is an 
application-level protocol that provides additional reliability on top of the 
networking protocols. If ListenLumberjack receives an event over TCP, it then 
acknowledges that NiFi has successfully received and stored the data. TCP alone 
can only guarantee that the data was delivered to the socket; the application 
could still have dropped it.
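
To make the difference concrete, here is a minimal Python sketch of an 
application-level acknowledgement layered on TCP. It is only an illustration 
of the idea: the "ACK <n>" line format and the function names are invented for 
this example and are not the Lumberjack wire protocol or any NiFi API. The 
receiver stores each event before acknowledging it, and the sender treats an 
event as delivered only once the matching ack comes back.

```python
import socket
import threading


def serve_with_acks(server_sock, store):
    """Accept one connection; persist each newline-delimited event, then ack it."""
    conn, _ = server_sock.accept()
    with conn, conn.makefile("rwb") as stream:
        seq = 0
        for line in iter(stream.readline, b""):  # b"" means the sender closed
            seq += 1
            store.append(line.rstrip(b"\n").decode("utf-8"))  # store first...
            stream.write(f"ACK {seq}\n".encode("ascii"))      # ...then acknowledge
            stream.flush()


def send_with_acks(address, events, timeout=5.0):
    """Send events one at a time; count an event as delivered only once acked."""
    acked = []
    with socket.create_connection(address, timeout=timeout) as sock, \
            sock.makefile("rwb") as stream:
        for seq, event in enumerate(events, start=1):
            stream.write(event.encode("utf-8") + b"\n")
            stream.flush()
            reply = stream.readline().decode("ascii").strip()
            if reply != f"ACK {seq}":
                break  # no ack: the caller should retry from this event
            acked.append(event)
    return acked


def demo():
    """Run receiver and sender against each other on an ephemeral local port."""
    store = []
    events = ["line one", "line two", "line three"]
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server_sock:
        server_sock.bind(("127.0.0.1", 0))
        server_sock.listen(1)
        worker = threading.Thread(target=serve_with_acks, args=(server_sock, store))
        worker.start()
        acked = send_with_acks(server_sock.getsockname(), events)
        worker.join()
    return store, acked
```

With plain ListenTCP the sender has no equivalent of the `acked` list: a 
successful write only tells it the bytes reached the socket.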

Although MiNiFi is not yet released, a possible solution is to run standalone 
NiFi instances on the servers where your logs are, with a simple flow like 
TailFile -> Remote Process Group which sends the logs back to a central NiFi 
instance over Site-To-Site.
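
The "tail" half of that flow boils down to remembering how far into the file 
you have already read. The Python sketch below illustrates that idea only; it 
is not how the TailFile processor is implemented, the function name is made up 
for this example, and it ignores log rotation and truncation, which a real 
tailer has to handle.

```python
def read_new_lines(path, offset):
    """Return (complete lines appended since `offset`, new offset in bytes)."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        chunk = handle.read()
    # Consume only up to the last newline, so a half-written line is left
    # in place and picked up on the next call once it is complete.
    end = chunk.rfind(b"\n") + 1
    lines = chunk[:end].decode("utf-8").splitlines()
    return lines, offset + end
```

A caller would persist the returned offset between runs and hand each batch of 
lines to whatever ships them to the central instance.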

Are you able to share any more info about what kind of logs they are and how 
they are being produced?
If they are coming from Java applications using logback or log4j, and if you 
have control over those applications, you can also use a specific appender like 
a UDP appender to send directly over to ListenUDP in NiFi.
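
As a rough illustration of the appender idea, here is the same pattern 
sketched with Python's standard logging module rather than logback or log4j. 
The class name, logger name, and message format are assumptions made for this 
example; the host and port would be wherever your ListenUDP processor is 
listening. Each formatted record goes out as one UDP datagram, so delivery is 
fire and forget.

```python
import logging
import socket


class UdpLineHandler(logging.Handler):
    """Send each formatted log record as a single UDP datagram."""

    def __init__(self, host, port):
        super().__init__()
        self.address = (host, port)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def emit(self, record):
        try:
            self.sock.sendto(self.format(record).encode("utf-8"), self.address)
        except Exception:
            self.handleError(record)  # logging's standard error hook

    def close(self):
        self.sock.close()
        super().close()


def build_logger(host, port, name="edge-app"):
    """Return a logger whose records are shipped to (host, port) over UDP."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records out of the root logger's handlers
    handler = UdpLineHandler(host, port)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

Because UDP gives no delivery guarantee, this suits logs where the occasional 
lost line is acceptable.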

Hope that helps.

-Bryan

[1] https://github.com/apache/nifi/pull/290

On Fri, May 6, 2016 at 3:33 AM, Conrad Crampton 
<conrad.cramp...@secdata.com> wrote:
Hi,
Some advice if possible please. Whilst I would love to wait for the MiNiFi 
project to realise its objectives, as this sounds like exactly what I want from 
the initial suggestions, I have a pressing need to shift some log files on 
remote servers (to my DC) to my NiFi cluster. Having had a quick look at 
Logstash, it would appear to provide what I want, but there doesn’t yet appear 
to be a simple way of getting files from Logstash to NiFi (I’m aware of the 
work going on with the Lumberjack processor, but it is not in the current 
release).

The options currently would appear to be to use any of a number of output 
plugins in Logstash (TCP, UDP, syslog, Kafka, HTTP, RabbitMQ) and then use the 
equivalent receiver in NiFi (with some intermediate service in some cases, 
e.g. Kafka or RabbitMQ).

Can anyone suggest the ‘best’ way here? I’m trying to prove a point about 
cutting out some other intermediate product, so this is something that has to 
be in production now. I can always refactor at a later date to have a ‘better’ 
solution (MiNiFi?).

Why don’t I ask on Logstash forums? You folks have always been a great help 
before ;-)

Thanks
Conrad

Nb. Of course not saying Logstash folks wouldn’t be equally helpful :-)





--
Thanks,
Andrew

