Arek Burdach created NIFI-5757:
----------------------------------

             Summary: AvroRecordSetWriter synchronize every access to compiledAvroSchemaCache
                 Key: NIFI-5757
                 URL: https://issues.apache.org/jira/browse/NIFI-5757
             Project: Apache NiFi
          Issue Type: Improvement
          Components: Core Framework
    Affects Versions: 1.7.1
            Reporter: Arek Burdach


Avro record serialization is quite an expensive operation.

I very often see the following stack trace in thread dumps:

{noformat}
Thread 48583: (state = BLOCKED)
 - org.apache.nifi.avro.AvroRecordSetWriter.compileAvroSchema(java.lang.String) @bci=9, line=124 (Compiled frame)
 - org.apache.nifi.avro.AvroRecordSetWriter.createWriter(org.apache.nifi.logging.ComponentLog, org.apache.nifi.serialization.record.RecordSchema, java.io.OutputStream) @bci=96, line=92 (Compiled frame)
 - sun.reflect.GeneratedMethodAccessor183.invoke(java.lang.Object, java.lang.Object[]) @bci=56 (Compiled frame)
 - sun.reflect.DelegatingMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[]) @bci=6, line=43 (Compiled frame)
 - java.lang.reflect.Method.invoke(java.lang.Object, java.lang.Object[]) @bci=56, line=498 (Compiled frame)
 - org.apache.nifi.controller.service.StandardControllerServiceInvocationHandler.invoke(java.lang.Object, java.lang.reflect.Method, java.lang.Object[]) @bci=309, line=89 (Compiled frame)
 - com.sun.proxy.$Proxy100.createWriter(org.apache.nifi.logging.ComponentLog, org.apache.nifi.serialization.record.RecordSchema, java.io.OutputStream) @bci=24 (Compiled frame)
 - org.apache.nifi.processors.kafka.pubsub.PublisherLease.publish(org.apache.nifi.flowfile.FlowFile, org.apache.nifi.serialization.record.RecordSet, org.apache.nifi.serialization.RecordSetWriterFactory, org.apache.nifi.serialization.record.RecordSchema, java.lang.String, java.lang.String) @bci=71, line=169 (Compiled frame)
 - org.apache.nifi.processors.kafka.pubsub.PublishKafkaRecord_1_0$1.process(java.io.InputStream) @bci=94, line=412 (Compiled frame)
{noformat}

This happens because {{AvroRecordSetWriter}} synchronizes every access to its 
cache of compiled schemas, so threads block each other even on cache hits.

I've prepared a PR that fixes this issue by using a {{ConcurrentHashMap}} 
instead: https://github.com/apache/nifi/pull/3111

It is not a perfect fix, because it removes the cache size limit, which, by the 
way, was hardcoded to {{20}}. A service can be shared by many flows, so such a 
hard limit is not a good choice.
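
To illustrate the {{ConcurrentHashMap}} approach described above, here is a 
minimal sketch. It is not the code from the PR: {{SchemaCacheSketch}} and its 
{{compiler}} function are hypothetical names, and the function stands in for 
whatever actually compiles the schema text (presumably Avro's 
{{Schema.Parser}} in the real service).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical stand-in for the writer's schema cache; not NiFi's actual code. */
class SchemaCacheSketch<S> {
    // Unbounded on purpose: this mirrors the trade-off described above,
    // where the fixed size limit is dropped.
    private final Map<String, S> compiled = new ConcurrentHashMap<>();

    // Assumption: the expensive "compile schema text" step is passed in.
    private final Function<String, S> compiler;

    SchemaCacheSketch(Function<String, S> compiler) {
        this.compiler = compiler;
    }

    /** Returns the cached compiled schema, compiling it on first use. */
    S compile(String schemaText) {
        // Fast path: ConcurrentHashMap retrievals do not lock, so cache hits
        // from many threads do not block one another.
        S cached = compiled.get(schemaText);
        if (cached != null) {
            return cached;
        }
        // Slow path: computeIfAbsent applies the function at most once per
        // key, even if several threads miss on the same schema at once.
        return compiled.computeIfAbsent(schemaText, compiler);
    }

    int size() {
        return compiled.size();
    }
}
```

With this shape only the first request for a given schema text pays the 
compilation cost; later requests are plain lock-free lookups. The map grows 
with the number of distinct schemas, which is the downside mentioned above.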

What do you think about such an improvement?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
