Mark Payne created NIFI-9382:
--------------------------------

             Summary: Improve startup time when loading flow that uses many 
HDFS related processors
                 Key: NIFI-9382
                 URL: https://issues.apache.org/jira/browse/NIFI-9382
             Project: Apache NiFi
          Issue Type: Improvement
          Components: Core Framework, Extensions
            Reporter: Mark Payne
            Assignee: Mark Payne


When starting NiFi, if a flow has many HDFS-related processors (hundreds to 
thousands), the startup time can be very long. In one case, a user flow with 
more than 1,000 HDFS processors takes 1-2 hours to fully start NiFi.

This is because the HDFS client makes a lot of assumptions about the 
environment that it's running in. Unfortunately, these assumptions are not 
always true when running in NiFi. The use of static methods in the 
UserGroupInformation class means that in order to interact with an HDFS 
cluster using multiple Kerberos Principals, we have to create ClassLoader 
isolation, using a separate, duplicate ClassLoader for each HDFS processor.
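
To make the isolation requirement concrete, here is a minimal, self-contained 
sketch. It is not NiFi or Hadoop code: StaticLogin is a hypothetical stand-in 
for a class that keeps its login in static state, and IsolatingLoader is a 
hypothetical loader that redefines only that one class. All names and 
principals are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;

public class IsolationDemo {

    /** Hypothetical stand-in for a class that keeps its login in a static field. */
    public static class StaticLogin {
        private static String currentUser;
        public static void login(String principal) { currentUser = principal; }
        public static String currentUser() { return currentUser; }
    }

    /** Redefines one named class from its bytes; delegates everything else to the parent. */
    static class IsolatingLoader extends ClassLoader {
        private final String isolated;

        IsolatingLoader(ClassLoader parent, String isolated) {
            super(parent);
            this.isolated = isolated;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(isolated)) {
                return super.loadClass(name, resolve);
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    // Read the class bytes via the parent, but define the class in THIS loader.
                    String path = name.replace('.', '/') + ".class";
                    try (InputStream in = getParent().getResourceAsStream(path)) {
                        if (in == null) {
                            throw new ClassNotFoundException(name);
                        }
                        byte[] bytes = in.readAllBytes();
                        c = defineClass(name, bytes, 0, bytes.length);
                    } catch (IOException e) {
                        throw new ClassNotFoundException(name, e);
                    }
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }
    }

    /** Returns { user seen without isolation, user in copy A, user in copy B }. */
    public static String[] demonstrate() {
        try {
            String name = StaticLogin.class.getName();
            ClassLoader parent = IsolationDemo.class.getClassLoader();

            // One shared class: the second login overwrites the first.
            StaticLogin.login("alice@EXAMPLE.COM");
            StaticLogin.login("bob@EXAMPLE.COM");
            String shared = StaticLogin.currentUser();

            // One copy of the class per loader: each copy has its own static field.
            Class<?> copyA = new IsolatingLoader(parent, name).loadClass(name);
            Class<?> copyB = new IsolatingLoader(parent, name).loadClass(name);
            copyA.getMethod("login", String.class).invoke(null, "alice@EXAMPLE.COM");
            copyB.getMethod("login", String.class).invoke(null, "bob@EXAMPLE.COM");
            String a = (String) copyA.getMethod("currentUser").invoke(null);
            String b = (String) copyB.getMethod("currentUser").invoke(null);
            return new String[] { shared, a, b };
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] r = demonstrate();
        System.out.println("shared class: " + r[0]);
        System.out.println("copy A:       " + r[1]);
        System.out.println("copy B:       " + r[2]);
    }
}
```

With a single shared class, only the last principal survives. With one copy 
per loader, both principals coexist, but every copy has to be defined and 
statically initialized again. For the real HDFS client that repeated work 
covers far more than one class.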

Because of this, the HDFS client components must be initialized once for each 
processor, and the initialization of the client is very expensive. We need to 
improve this so that we don't create a separate ClassLoader that loads hundreds 
or thousands of classes for each instance of the Processor.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
