Hi,

On 25. Oct 2019, at 17:53, Marshall Schor <m...@schor.com> wrote:
>
> One other useful source for examples: the test cases for UIMA, e.g. search
> the uimaj-core project's *.java files for "getSofaDataStream".

Ok, let me elaborate :)

One can use setSofaDataURI(url) to tell the CAS that the sofa data is actually external. One can then use getSofaDataStream() to resolve the URL and retrieve the data as a stream.

So let's assume I have a CAS containing annotations on a text and the text is in an external file:

    CAS cas = CasCreationUtils.createCas((TypeSystemDescription) null, null, null);
    cas.setSofaDataURI("file:/path/to/my/file", "text/plain");

This works nicely when I use getSofaDataStream() to retrieve the data. But I can't use the "normal" methods like getDocumentText() or getCoveredText() at all. Also, I cannot call setSofaDataString(urlContent, "text/plain"): it throws an exception because there is already a sofaURI set. This is a major inconvenience.

The ClearTK guys came up with an approach that tries to make this a bit more convenient:

* They introduce a well-known view named "UriView" and set the sofaDataURI in that view.
* Then they use a special reader which looks up the URI in that view, resolves it and drops the content into the sofaDataString of the default view ("_InitialView").

That way they get the benefit of the externally stored sofa as well as the ability to use the usual methods to access the text.

When I looked at setSofaDataURI(), I naively expected that the URI would be resolved the first time I try to access the sofa data (e.g. via getDocumentText()), but that doesn't happen.

Then I expected that I could just call getSofaDataStream(), manually drop the contents into setSofaDataString(), and that this data string would be "transient", i.e. not saved into XMI because a sofaDataURI is already set. That expectation was not fulfilled either.

Could it be useful to introduce some place where we can transiently drop data obtained from the sofaDataURI, such that methods like getDocumentText() and getCoveredText() do something useful, but also such that the data is not included when serializing the CAS to whatever format?

Cheers,

-- Richard
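
P.S. To make the "resolve the URI and drop the content into the default view" step concrete, here is a minimal sketch of how it could look. It is not ClearTK's actual reader. The class SofaText and its readFully() method are hypothetical names, UTF-8 is an assumed encoding, and a temporary file stands in for the external sofa so that the snippet runs with the JDK alone. The UIMA calls are only shown in comments.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; not part of UIMA or ClearTK.
final class SofaText {

    // Reads the whole stream into a String using the given charset
    // and closes the stream afterwards.
    static String readFully(InputStream in, Charset charset) {
        try (InputStream stream = in) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = stream.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the external sofa: a temporary text file.
        Path file = Files.createTempFile("sofa", ".txt");
        Files.write(file, "Some external text.".getBytes(StandardCharsets.UTF_8));
        try {
            // With UIMA on the classpath, the stream would instead come
            // from the view that holds the URI:
            //   CAS uriView = cas.createView("UriView");
            //   uriView.setSofaDataURI(file.toUri().toString(), "text/plain");
            //   InputStream in = uriView.getSofaDataStream();
            URL url = file.toUri().toURL();
            String text = readFully(url.openStream(), StandardCharsets.UTF_8);

            // ... and the text would then be set on the default view:
            //   cas.setDocumentText(text);
            System.out.println(text);
        } finally {
            Files.delete(file);
        }
    }
}
```

Note that this only covers the convenience part: once the text has been set on the default view, it is an ordinary sofa string and gets serialized along with the CAS, so it does not give the "transient" behaviour I am asking about.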