David Mollitor created AVRO-4069:
------------------------------------

             Summary: Remove Reader String Cache from Generic Datum Reader
                 Key: AVRO-4069
                 URL: https://issues.apache.org/jira/browse/AVRO-4069
             Project: Apache Avro
          Issue Type: Improvement
          Components: java
    Affects Versions: 1.12.0
            Reporter: David Mollitor
            Assignee: David Mollitor
             Fix For: 1.13.0


I was doing some profiling, and this "ReaderCache" code lit up:

{code:java}
    public Class getStringClass(final Schema s) {
      final IdentitySchemaKey key = new IdentitySchemaKey(s);
      return this.stringClassCache.computeIfAbsent(key, (IdentitySchemaKey k) -> this.findStringClass.apply(k.schema));
    }
  }

  private final ReaderCache readerCache = new ReaderCache(this::findStringClass);

  protected Class findStringClass(Schema schema) {
    String name = schema.getProp(GenericData.STRING_PROP);
    if (name == null)
      return CharSequence.class;

    switch (GenericData.StringType.valueOf(name)) {
    case String:
      return String.class;
    default:
      return CharSequence.class;
    }
  }
{code}

The String cache here caches a single value per Schema: the class that 
corresponds to the {{STRING_PROP}} property of the Schema. That is a lot of 
overhead for caching a relatively simple mapping. Consider that it must create 
a new {{IdentitySchemaKey}} object on every lookup, and this is a HOT path. It 
would take less time, and add less heap pressure, to perform the simple mapping 
on each invocation.
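
As a rough sketch of the direct lookup (not the actual patch), the mapping can 
be done with one property read and one string comparison, with no key object 
allocated per call. {{StringClassLookupSketch}} and {{FakeSchema}} are 
hypothetical stand-ins so the example runs without Avro on the classpath; the 
property name is the value of {{GenericData.STRING_PROP}}.

```java
import java.util.Map;

// Hypothetical sketch of the cache-free lookup; not the Avro implementation.
final class StringClassLookupSketch {

  // Same value as GenericData.STRING_PROP in Avro.
  static final String STRING_PROP = "avro.java.string";

  // Hypothetical stand-in for org.apache.avro.Schema, reduced to getProp().
  static final class FakeSchema {
    private final Map<String, String> props;

    FakeSchema(Map<String, String> props) {
      this.props = props;
    }

    String getProp(String key) {
      return props.get(key);
    }
  }

  // Direct mapping on every call: no cache and no per-call key allocation.
  // Assumption: unlike StringType.valueOf(name) in the original, an
  // unrecognized value falls back to CharSequence instead of throwing.
  static Class<?> findStringClass(FakeSchema schema) {
    String name = schema.getProp(STRING_PROP);
    return "String".equals(name) ? String.class : CharSequence.class;
  }
}
```

The only remaining cost per call is the {{getProp}} lookup itself.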

Follow-on work: the Map in the Schema is synchronized. Maybe the map can be 
made non-synchronized, or the Schema can explicitly cache this value in a 
non-synchronized way, to make this one property load faster.
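
One possible shape for the second option, purely as a sketch: the schema keeps 
the resolved class in a plain field and fills it lazily. 
{{CachingSchemaSketch}} is a hypothetical class, not the Avro {{Schema}}. It 
assumes the property does not change after the first read, which would need to 
be checked against how properties can be added to a real Schema.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a schema that caches its string class; not Avro code.
final class CachingSchemaSketch {

  // Same value as GenericData.STRING_PROP in Avro.
  static final String STRING_PROP = "avro.java.string";

  // Stands in for the synchronized property map mentioned above.
  private final Map<String, String> props;

  // Plain field, neither volatile nor guarded by a lock. Racing threads may
  // each compute the value once, but they all compute the same result.
  private Class<?> stringClass;

  CachingSchemaSketch(Map<String, String> props) {
    this.props = Collections.synchronizedMap(new HashMap<>(props));
  }

  Class<?> getStringClass() {
    Class<?> c = stringClass; // read the field once into a local
    if (c == null) {
      // Slow path: touches the synchronized map, at least once per instance.
      c = "String".equals(props.get(STRING_PROP)) ? String.class : CharSequence.class;
      stringClass = c;
    }
    return c;
  }
}
```

After the first call on a given instance, later calls return the cached class 
without going through the synchronized map.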



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
