I can reproduce the problem and see what is happening.  The deserialization
code compares the two type systems and allows for some mismatches (things
present in one and not in the other), but it doesn't allow for a feature
whose range (value type) is XXXX in one type system and YYYY in the other.
See CasTypeSystemMapper lines 299 - 315.
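
To make the distinction concrete, here is a minimal sketch of the two kinds of
mismatch. It is not the CasTypeSystemMapper code: the class name, the method
name, and the use of plain maps (feature name to range type name) as stand-ins
for type systems are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in: a "type system" is just feature name -> range type name. */
final class RangeMismatchSketch {

    /**
     * Returns one description per feature that exists in both maps
     * but has a different range type in each.
     */
    static List<String> findRangeMismatches(Map<String, String> source,
                                            Map<String, String> target) {
        List<String> mismatches = new ArrayList<>();
        for (Map.Entry<String, String> entry : source.entrySet()) {
            String targetRange = target.get(entry.getKey());
            if (targetRange == null) {
                // Feature missing in the target: the lenient case that can be skipped.
                continue;
            }
            if (!targetRange.equals(entry.getValue())) {
                // Same feature, different range: the case that is not accommodated.
                mismatches.add(entry.getKey() + ": " + entry.getValue() + " -> " + targetRange);
            }
        }
        return mismatches;
    }
}
```

A feature that is simply absent on one side is skipped, while a feature present
on both sides with different ranges is reported.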

It may not be easy to fix.  Basically, the deserialization routines are set up
with a lenient kind of accommodation for different type systems, where they can
"skip" over types and features that are missing. 

This particular transformation needs to run a value conversion - from
FeatureAnnotation to FeatureRecord. 

I'm thinking of various approaches, and putting these out for others to expand
upon, etc.

1) Along the lines of Richard's remark, fix the xmi serialization to work with
all binary data, perhaps by base-64 encoding problematic (or specified by
feature name, or all) values, or - if it turns out to just be some "bug" -
fixing the bug.

2) Allow the user to specify some kind of call-back function, in the
deserializer, to be invoked when the range of the feature doesn't match.  This
would take some kind of representation of the feature value in type system 1,
and the type of the feature value in type system 2, and would need to produce
the value in type system 2.  This may be quite problematic/awkward to carry out
in all the generalized edge cases, for instance if there are "forward"
references to things not yet deserialized, etc.
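
As a rough illustration of approach 2, the call-back could look something like
the sketch below. Nothing here exists in UIMA: RangeConverter, the registry
class, and the representation of a feature structure as a map of feature names
to values are all hypothetical, and the sketch ignores the forward-reference
problem entirely.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical registry of user-supplied range converters (not a UIMA API). */
final class RangeConversionSketch {

    /** Hypothetical call-back: turns a value of the source range into the target range. */
    @FunctionalInterface
    interface RangeConverter {
        Map<String, Object> convert(Map<String, Object> sourceValue, String targetTypeName);
    }

    // Converters keyed by "sourceType->targetType".
    private final Map<String, RangeConverter> converters = new LinkedHashMap<>();

    void register(String sourceType, String targetType, RangeConverter converter) {
        converters.put(sourceType + "->" + targetType, converter);
    }

    /** Called where the deserializer would otherwise fail on a range mismatch. */
    Map<String, Object> convertValue(String sourceType, String targetType,
                                     Map<String, Object> value) {
        if (sourceType.equals(targetType)) {
            return value; // ranges agree, nothing to convert
        }
        RangeConverter converter = converters.get(sourceType + "->" + targetType);
        if (converter == null) {
            throw new IllegalStateException(
                "No converter registered for " + sourceType + " -> " + targetType);
        }
        return converter.convert(value, targetType);
    }

    /** Example converter: keep only the features the target type still has. */
    static RangeConverter keepOnly(String... keptFeatures) {
        return (sourceValue, targetTypeName) -> {
            Map<String, Object> result = new LinkedHashMap<>();
            for (String name : keptFeatures) {
                if (sourceValue.containsKey(name)) {
                    result.put(name, sourceValue.get(name));
                }
            }
            return result;
        };
    }
}
```

For the case in this thread, a user would register something like
keepOnly(...) for the FeatureAnnotation to FeatureRecord pair, listing
whichever features the two types share.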

At this point, I think #1 could be quite feasible.  To investigate further, it
would help to have a small test case where the xmi serialization currently is
not readable (due to - as you think - character coding issues).
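
For reference, the base-64 idea in #1 might be sketched as below. This assumes
the problem really is characters that XML 1.0 cannot carry, which has not been
confirmed yet. The class and method names are made up, and how the serializer
would mark a value as encoded (for example with an extra attribute) is left
open.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper: base-64 encode only values that XML 1.0 cannot represent. */
final class XmlSafeValueSketch {

    /** True if the code point is allowed by the XML 1.0 "Char" production. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** True if the value contains at least one character XML 1.0 does not allow. */
    static boolean needsEncoding(String value) {
        return !value.codePoints().allMatch(XmlSafeValueSketch::isXml10Char);
    }

    /** Encodes the UTF-8 bytes of the value; unpaired surrogates are not preserved. */
    static String encode(String value) {
        return Base64.getEncoder().encodeToString(value.getBytes(StandardCharsets.UTF_8));
    }

    /** Reverses encode(). */
    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }
}
```

Encoding only the values that need it would keep ordinary XMI output readable;
encoding by feature name, or encoding everything, would be simpler to reason
about but change more of the output.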

-Marshall

On 9/16/2019 8:11 AM, Mario Juric wrote:
>
> Best Regards,
>
> Mario Juric
> Principal Engineer
> *UNSILO.ai* <http://unsilo.ai/>
> mobile:  +45 3082 4100
>
>     skype: mario.juric.dk <http://mario.juric.dk>
>
> Hi Marshall,
>
> I have a small test case with 3 files, excluding any JCasGen generated types
> and the UIMAfit types file.
>
> First you will have to generate the types and run SaveCompressedBinary to
> produce the 3 binary forms I have been experimenting with. You should then be
> able to run LoadCompressedBinaries as expected.
>
> Next you need to change the element type of Container.features from
> FeatureAnnotation to FeatureRecord in the type system and generate the type
> system again. Also change the FeatureAnnotation reference in
> LoadCompressedBinaries l. 25 to FeatureRecord and then try to reload the
> previously stored binaries again without saving them first using the new type
> system.
>
> You can see I have played with different ways of loading just to see if
> anything worked, but much of it seems to result in exactly the same calls in
> the lower layers. I didn’t get entirely the same results with the CAS we
> actually store as in this example. E.g. I experienced some EOF with the
> compressed filtered whereas I only get a class cast exception during
> verification in this example. Note also that we keep both types in the new
> type system, but we want to change the element type of the FSArray in the
> Container.
>
> Hope this will yield some useful insights and thanks a lot :)
>
> Cheers
> Mario
>
>> On 13 Sep 2019, at 21:55 , Mario Juric <[email protected] 
>> <mailto:[email protected]>>
>> wrote:
>>
>> Thanks Marshall,
>>
>> I’ll get back to you with a small sample as soon as I get the time to do it.
>> This will also give me a better understanding of the format.
>>
>>
>> Cheers,
>> Mario
>>
>>> On 13 Sep 2019, at 19:32 , Marshall Schor <[email protected]
>>> <mailto:[email protected]>> wrote:
>>>
>>> I'm wondering if you could post a very small test case showing this problem
>>> with a small type system.
>>>
>>> With that, I could run in the debugger and see exactly what was happening,
>>> and see whether or not some small fix would make this work.
>>>
>>> The Deserializer for this already supports a certain type of mismatch
>>> between type systems, but mainly one where one is a subset of the other -
>>> see the javadoc for the class
>>>
>>> org.apache.uima.cas.impl.BinaryCasSerDes6
>>>
>>> But it must not currently cover this particular case.
>>>
>>> -Marshall
>>>
>>> On 9/13/2019 10:48 AM, Mario Juric wrote:
>>>> Just a quick follow up.
>>>>
>>>> I played a bit around with the CasIOUtils, and it seems that it is possible
>>>> to load and use the embedded type system, i.e. the old type system with X,
>>>> but I found no way to replace it with the new type system and make the
>>>> necessary mappings to Y. I tried to see if I could use the CasCopier in a
>>>> separate step, but as expected it fails when it reaches the FSArray of X
>>>> in the source CAS because the destination type system requires elements of
>>>> type Y. I could make my own modified version of the CasCopier that could
>>>> take some mapping functions for each pair of source and destination types
>>>> that need to be mapped, but this is where it starts to get too complicated,
>>>> so I found it not to be worth it at this point, since we might then want to
>>>> reprocess everything from scratch anyway.
>>>>
>>>> Cheers,
>>>> Mario
>>>>
>>>>> On 12 Sep 2019, at 10:41 , Mario Juric <[email protected]
>>>>> <mailto:[email protected]>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> We use form 6 compressed binaries to persist the CAS. We now want to make
>>>>> a change to the type system that is not directly compatible, although in
>>>>> principle the new type system is really a subset from a data perspective,
>>>>> so we want to migrate existing binaries to the new type system, but we
>>>>> don’t know how. The change is as follows:
>>>>>
>>>>> In the existing type system we have a type A with a FSArray feature of
>>>>> element type X, and we want to change X to Y where Y contains a genuine
>>>>> feature subset of X. This means we basically want to replace X with Y for
>>>>> the FSArray and ditch a few attributes of X when loading the CAS into the
>>>>> new type system.
>>>>>
>>>>> Had the CAS been stored in JSON this would be trivial by just mapping the
>>>>> attributes that they have in common, but when I try to load the CAS binary
>>>>> into the new target type system it chokes with an EOF, so I don’t know if
>>>>> that is at all possible with a form 6 compressed CAS binary?
>>>>>
>>>>> I poked around a bit in the reference, API and mailing list archive but I
>>>>> was not able to find anything useful. I can of course keep parallel
>>>>> attributes for both X and Y and then have a separate step that makes an
>>>>> explicit conversion/copy, but I prefer to avoid this. I would appreciate
>>>>> any input to the problem, thanks :)
>>>>>
>>>>> Cheers,
>>>>> Mario
>>>>>
