Beam Java users,

I've run into a few cases where I want to present a single thread-safe data structure to all threads on a worker. Each time, I end up writing a good bit of custom code involving a synchronized method that creates the resource exactly once, after which each thread holds its own reference to the singleton. I don't have extensive experience with thread safety in Java, so it seems likely I'm going to get this wrong.
Are there any best practices for state that is shared across threads? Any prior art I can read up on?

The most concrete case I have in mind is loading a GeoIP database for doing city lookups from IP addresses. We're using MaxMind's API, which allows mapping a portion of memory to a file sitting on disk. We have a synchronized method that checks whether the reader has been initialized [0]; if not, we copy the database file from GCS to local disk, build the DatabaseReader instance, and return it. Other threads will see the already-initialized reader and just get a reference to it instead.

This all appears to work, and it saves memory compared to each thread maintaining its own DatabaseReader. But is there a safer or more built-in way to do this? Am I missing relevant hooks in the Beam API that would make this cleaner?

[0] https://github.com/mozilla/gcp-ingestion/blob/master/ingestion-beam/src/main/java/com/mozilla/telemetry/decoder/GeoCityLookup.java#L95
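
For concreteness, here is a minimal sketch of the pattern described above. It is not the code at [0]: `SharedResource` and its `loader` are hypothetical names I'm using for illustration, and in the real case the loader would copy the database file from GCS and build the DatabaseReader.

```java
import java.util.function.Supplier;

/**
 * Hypothetical helper (not part of Beam or MaxMind's API): lazily creates
 * one shared instance per JVM and hands the same reference to every thread.
 */
final class SharedResource<T> {
  private final Supplier<T> loader;
  private T instance; // only read or written while holding the lock on "this"

  SharedResource(Supplier<T> loader) {
    this.loader = loader;
  }

  /**
   * The first caller runs the loader; every later caller sees the
   * already-initialized instance. "synchronized" ensures the loader
   * runs at most once and that the result is visible to all threads.
   */
  synchronized T get() {
    if (instance == null) {
      instance = loader.get(); // assumed to return a non-null, thread-safe object
    }
    return instance;
  }
}
```

Held in a static field, one `SharedResource` would be shared by all threads in the worker's JVM. Every call to `get()` takes the lock, which is simple but may matter if it is called per element rather than once per thread.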
