Re: Load balancing just stopped working in NIFI 1.16.1

2022-05-19 Thread Jens M. Kofoed
Hi Mark, thanks for the reply. The “funny” part is that it worked before, and I haven’t made any change to the configuration. I have also deleted the state folder on all nodes, and it still doesn’t work. It is the same configuration as 3 other clusters except hostname/IP addresses and certificates off

Re: NiFi to draw samples from very large raw data sets

2022-05-19 Thread Matt Burgess
If you have large FlowFiles and are trying to sample records from each, you can use SampleRecord. It has Interval Sampling, Probabilistic Sampling, and Reservoir Sampling strategies, and I have a PR [1] up to add Range Sampling [2]. Regards, Matt [1] https://github.com/apache/nifi/pull/5878 [2]
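As a rough illustration, a SampleRecord configuration for a fixed-size reservoir sample might look like the following; the reader/writer services and exact property names here are assumptions for the sketch, not taken from this thread, and may differ slightly between NiFi versions:

    SampleRecord
      Record Reader: JsonTreeReader
      Record Writer: JsonRecordSetWriter
      Sampling Strategy: Reservoir Sampling
      Reservoir Size: 1032

With a reservoir strategy, records are sampled uniformly so that at most 1032 records are retained from each incoming FlowFile.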

Re: Load balancing just stopped working in NIFI 1.16.1

2022-05-19 Thread Mark Payne
Jens, So that would tell us that the hostname or the port is wrong, or that NiFi is not running/listening for load balanced connections. I would recommend the following: - Check nifi.properties on all nodes to make sure that the nifi.cluster.load.balance.host property is set to the node’s
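A rough sketch of the relevant nifi.properties entries on one node (the hostname is a placeholder, and 6342 is the commonly used default load balance port):

    nifi.cluster.load.balance.host=nifi-node2.example.com
    nifi.cluster.load.balance.port=6342

Each node should advertise its own resolvable hostname here, and that host/port pair must be reachable from the other nodes in the cluster.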

Re: Null Record in ConsumeKafkaRecord

2022-05-19 Thread Prasanth M Sasidharan
Thanks Mark. That worked. Much appreciated. On Thu, May 19, 2022 at 6:45 PM Mark Payne wrote: > Hi Prasanth, > > Take a look at the Record Writer that you’re using with > ConsumeKafkaRecord. There’s a property named “Suppress Null Values.” You’ll > want to set that to “Suppress Missing Values.”

Re: Null Record in ConsumeKafkaRecord

2022-05-19 Thread Mark Payne
Hi Prasanth, Take a look at the Record Writer that you’re using with ConsumeKafkaRecord. There’s a property named “Suppress Null Values.” You’ll want to set that to “Suppress Missing Values.” That should give you what you’re looking for. Thanks -Mark On May 19, 2022, at 7:53 AM, Prasanth M
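Assuming the Record Writer in use is a JsonRecordSetWriter (not confirmed in this thread), the setting would be along these lines:

    JsonRecordSetWriter
      Suppress Null Values: Suppress Missing Values

With that value, fields that are absent from the incoming record are omitted from the output rather than being written as explicit nulls.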

Null Record in ConsumeKafkaRecord

2022-05-19 Thread Prasanth M Sasidharan
Hello Team, I am using the ConsumeKafkaRecord_2_0 1.15.3 processor in NiFi to consume JSON data from a Kafka topic. My issue is that the ConsumeKafka output matches the schema of both records and adds the missing tags to the JSON with null values. Eg: [ { "acknowledged": "0", "internal_last":

Load balancing just stopped working in NIFI 1.16.1

2022-05-19 Thread Jens M. Kofoed
Hi, I have a 3-node test cluster running v. 1.16.1, which has been working fine with no errors. It doesn't do much, since it is my test cluster. But now I am struggling with load balance connections being refused between nodes. Both nodes 2 and 3 are refusing load balancing connections, even after

Re: NiFi to draw samples from very large raw data sets

2022-05-19 Thread Joe Gresock
Also, I just realized I misread your sampling requirement. You would use the approach above if you wanted to sample *every 1032nd flowfile*, but you want a sample size of 1032 total. You can still use a randomizing selection approach as I described (though your mod value would depend on what

Re: NiFi to draw samples from very large raw data sets

2022-05-19 Thread Joe Gresock
James, This sounds like an interesting project. I would recommend RouteOnAttribute with a "sample" property with value "${random():mod(1032):equals(100)}" (the second number could be anything between 0 and 1031), and then routing the "sample" relationship to your sampling path. I'm not sure I
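A minimal sketch of that RouteOnAttribute setup, where the property name "sample" and the comparison value 100 are arbitrary choices as noted above:

    RouteOnAttribute
      Routing Strategy: Route to Property name
      sample: ${random():mod(1032):equals(100)}

FlowFiles for which the expression evaluates to true are routed to the dynamically created "sample" relationship; everything else goes to "unmatched".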

NiFi to draw samples from very large raw data sets

2022-05-19 Thread James McMahon
I have been tasked to draw samples from very large raw data sets for triage analysis. I am to provide multiple sampling methods. Drawing a random sample of N records is one method. A second method is to draw a fixed sample of 1,032 records, stratified across defined date boundaries in a set. The

Final reminder: ApacheCon North America call for presentations closing soon

2022-05-19 Thread Rich Bowen
[Note: You're receiving this because you are subscribed to one or more Apache Software Foundation project mailing lists.] This is your final reminder that the Call for Presentations for ApacheCon North America 2022 will close at 00:01 GMT on Monday, May 23rd, 2022. Please don't wait! Get your talk