Best Regards,

Mario Juric
Principal Engineer
UNSILO.ai
mobile:  +45 3082 4100



Hi Marshall,

I have a small test case with 3 files, excluding any JCasGen-generated types and the uimaFIT types file.

First you will have to generate the types and run SaveCompressedBinary to produce the 3 binary forms I have been experimenting with. You should then be able to run LoadCompressedBinaries as expected.

Next you need to change the element type of Container.features from FeatureAnnotation to FeatureRecord in the type system and generate the types again. Also change the FeatureAnnotation reference in LoadCompressedBinaries (line 25) to FeatureRecord. Then, using the new type system, try to reload the previously stored binaries without saving them again first.

You can see I have played with different ways of loading just to see if anything worked, but much of it seems to result in exactly the same calls in the lower layers. I didn’t get entirely the same results in this example as with the CAS we actually store. E.g. I got an EOF with the compressed filtered form there, whereas in this example I only get a class cast exception during verification. Note also that we keep both types in the new type system; we only want to change the element type of the FSArray in the Container.

Hope this will yield some useful insights and thanks a lot :)

Cheers
Mario




Attachment: LoadCompressedBinary.java
Description: Binary data

Attachment: SaveCompressedBinary.java
Description: Binary data

Attachment: SimpleTypeSystem_TS.xml
Description: XML document





On 13 Sep 2019, at 21:55 , Mario Juric <[email protected]> wrote:

Thanks Marshall,

I’ll get back to you with a small sample as soon as I get the time to do it. This will also give me a better understanding of the format.


Cheers,
Mario












On 13 Sep 2019, at 19:32 , Marshall Schor <[email protected]> wrote:

I'm wondering if you could post a very small test case showing this problem with
a small type system. 

With that, I could run in the debugger and see exactly what was happening, and
see whether or not some small fix would make this work.

The Deserializer for this already supports a certain type of mismatch between
type systems, but mainly one where one is a subset of the other - see the
javadoc for the class

org.apache.uima.cas.impl.BinaryCasSerDes6

But it evidently does not currently cover this particular case.

-Marshall

On 9/13/2019 10:48 AM, Mario Juric wrote:
Just a quick follow up.

I played around a bit with CasIOUtils, and it seems that it is possible to load and use the embedded type system, i.e. the old type system with X, but I found no way to replace it with the new type system and make the necessary mappings to Y. I tried to see if I could use the CasCopier in a separate step, but as expected it fails when it reaches the FSArray of X in the source CAS, because the destination type system requires elements of type Y. I could make my own modified version of the CasCopier that takes a mapping function for each pair of source and destination types that need to be mapped, but this is where it starts to get too complicated. I found it not to be worth it at this point, since we might then want to reprocess everything from scratch anyway.
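
To make the idea of a copier with per-type mapping functions concrete, here is a minimal plain-Java sketch. It does not use the UIMA API or the real CasCopier: the MappingCopier and Fs classes and the feature names are hypothetical stand-ins, and a real implementation would also have to deal with indexes, sofas and references between feature structures.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class MappingCopier {

    /** Hypothetical stand-in for a feature structure: a type name plus feature values. */
    public static final class Fs {
        public final String type;
        public final Map<String, Object> features;

        public Fs(String type, Map<String, Object> features) {
            this.type = type;
            this.features = features;
        }
    }

    // One mapping function per source type that needs converting.
    private final Map<String, Function<Fs, Fs>> mappers = new HashMap<>();

    public MappingCopier register(String sourceType, Function<Fs, Fs> mapper) {
        mappers.put(sourceType, mapper);
        return this;
    }

    /** Copies one element, converting it if a mapper is registered for its type. */
    public Fs copy(Fs source) {
        Function<Fs, Fs> mapper = mappers.get(source.type);
        if (mapper != null) {
            return mapper.apply(source);
        }
        // No mapper registered: plain copy with the same type and features.
        return new Fs(source.type, new LinkedHashMap<>(source.features));
    }

    /** Copies a list of elements, e.g. the contents of an FSArray feature. */
    public List<Fs> copyArray(List<Fs> elements) {
        List<Fs> copied = new ArrayList<>();
        for (Fs element : elements) {
            copied.add(copy(element));
        }
        return copied;
    }

    public static void main(String[] args) {
        MappingCopier copier = new MappingCopier().register("X", x -> {
            Map<String, Object> kept = new LinkedHashMap<>();
            kept.put("name", x.features.get("name")); // the only feature Y keeps in this example
            return new Fs("Y", kept);
        });

        Map<String, Object> xFeatures = new LinkedHashMap<>();
        xFeatures.put("name", "length");
        xFeatures.put("score", 0.9);

        List<Fs> result = copier.copyArray(List.of(new Fs("X", xFeatures)));
        System.out.println(result.get(0).type + " " + result.get(0).features);
    }
}
```

Elements without a registered mapper are copied unchanged, so only the X to Y pair needs special handling.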

Cheers,
Mario













On 12 Sep 2019, at 10:41 , Mario Juric <[email protected]> wrote:

Hi,

We use form 6 compressed binaries to persist the CAS. We now want to make a change to the type system that is not directly compatible, although in principle the new type system is really a subset from a data perspective. We therefore want to migrate existing binaries to the new type system, but we don’t know how. The change is as follows:

In the existing type system we have a type A with an FSArray feature of element type X, and we want to change X to Y, where the features of Y are a strict subset of the features of X. This means we basically want to replace X with Y for the FSArray and ditch a few attributes of X when loading the CAS into the new type system.

Had the CAS been stored in JSON, this would be trivial: just map the attributes that they have in common. But when I try to load the CAS binary into the new target type system it chokes with an EOF, so I don’t know whether this is possible at all with a form 6 compressed CAS binary.
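
As a rough illustration of what mapping the common attributes means, here is a plain-Java sketch that keeps only the features the target type declares. It is not UIMA code: the FeatureProjection class and the feature names are made up for the example, and each element is modelled as a simple map of feature name to value.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class FeatureProjection {

    /** Copies only the features that the target type declares; all others are dropped. */
    public static Map<String, Object> project(Map<String, Object> source,
                                              Set<String> targetFeatures) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : source.entrySet()) {
            if (targetFeatures.contains(entry.getKey())) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // An element of the old type X (hypothetical feature names).
        Map<String, Object> x = new LinkedHashMap<>();
        x.put("name", "length");
        x.put("value", 4.2);
        x.put("begin", 0);
        x.put("end", 7);

        // The new type Y declares only a subset of X's features.
        Set<String> yFeatures = Set.of("name", "value");

        System.out.println(project(x, yFeatures)); // prints {name=length, value=4.2}
    }
}
```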

I poked around a bit in the reference, the API and the mailing list archive, but I was not able to find anything useful. I can of course keep parallel attributes for both X and Y and then have a separate step that makes an explicit conversion/copy, but I would prefer to avoid this. I would appreciate any input on the problem, thanks :)

Cheers,
Mario















