David Mollitor created AVRO-4069:
------------------------------------

             Summary: Remove Reader String Cache from Generic Datum Reader
                 Key: AVRO-4069
                 URL: https://issues.apache.org/jira/browse/AVRO-4069
             Project: Apache Avro
          Issue Type: Improvement
          Components: java
    Affects Versions: 1.12.0
            Reporter: David Mollitor
            Assignee: David Mollitor
             Fix For: 1.13.0

I was doing some profiling, and this "ReaderCache" code lit up:

{code:java}
  // Excerpt from the ReaderCache helper: looks up the string class for a schema.
  public Class getStringClass(final Schema s) {
    // A new key object is allocated on every lookup.
    final IdentitySchemaKey key = new IdentitySchemaKey(s);
    return this.stringClassCache.computeIfAbsent(key,
        (IdentitySchemaKey k) -> this.findStringClass.apply(k.schema));
  }
}

private final ReaderCache readerCache = new ReaderCache(this::findStringClass);

// The mapping that the cache wraps: one property read plus a switch.
protected Class findStringClass(Schema schema) {
  String name = schema.getProp(GenericData.STRING_PROP);
  if (name == null)
    return CharSequence.class;

  switch (GenericData.StringType.valueOf(name)) {
  case String:
    return String.class;
  default:
    return CharSequence.class;
  }
}
{code}

The string cache here caches a single value per schema: the class that the schema's {{STRING_PROP}} maps to.

That is a lot of overhead for caching a relatively simple mapping. Every lookup must create a new {{IdentitySchemaKey}} object, and this is a hot path. It would take less time, and add less heap pressure, to perform the simple mapping on each invocation.

Follow-on work: the Map in the Schema is synchronized. Maybe the map can be made non-synchronized, or the Schema can explicitly cache this value in a non-synchronized way, to make this one property load faster.
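
To make the proposal concrete, here is a minimal, self-contained sketch of the direct mapping with no cache and no per-call key allocation. It is only an illustration, not the actual patch: {{FakeSchema}} is a hypothetical stand-in for {{org.apache.avro.Schema}} that models nothing but {{getProp}}, the {{StringType}} enum is a local copy of what I assume {{GenericData.StringType}} looks like, and the literal value of {{STRING_PROP}} is likewise an assumption.

```java
import java.util.HashMap;
import java.util.Map;

final class StringClassLookup {

  // Assumed value of GenericData.STRING_PROP; treat the literal as illustrative.
  public static final String STRING_PROP = "avro.java.string";

  // Local copy of the enum the issue switches on (assumed constants).
  public enum StringType { CharSequence, String, Utf8 }

  // Hypothetical stand-in for org.apache.avro.Schema; only getProp is modeled.
  public static final class FakeSchema {
    private final Map<String, String> props = new HashMap<>();

    public FakeSchema withProp(String key, String value) {
      props.put(key, value);
      return this;
    }

    public String getProp(String key) {
      return props.get(key);
    }
  }

  // Direct mapping on every call: one property read and one enum comparison.
  // No IdentitySchemaKey-style wrapper is allocated and no map is consulted.
  public static Class<?> findStringClass(FakeSchema schema) {
    String name = schema.getProp(STRING_PROP);
    if (name == null) {
      return CharSequence.class;
    }
    // Same behavior as the switch in the issue: only "String" maps to String.class.
    return StringType.valueOf(name) == StringType.String ? String.class : CharSequence.class;
  }

  public static void main(String[] args) {
    FakeSchema plain = new FakeSchema();
    FakeSchema javaString = new FakeSchema().withProp(STRING_PROP, "String");

    System.out.println(findStringClass(plain).getSimpleName());
    System.out.println(findStringClass(javaString).getSimpleName());
  }
}
```

Whether this is actually faster than the cached lookup in a real reader would need to be confirmed with a benchmark. The sketch only shows that the per-call work is small once the cache and its key object are gone; the cost that remains is the property read itself, which is what the follow-on work is about.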