Hi Mario,
> On 21. Oct 2020, at 21:26, Mario Juric <[email protected]> wrote:
>
> We never had problems migrating from one type system as long as the types were
> either extended or something was deleted. The problem we had was when an
> attribute changed type, e.g. a change from a simple FSArray to a wrapper type
> with the custom java object and a FSArray. We tried something similar last
> year where a type A had an FSArray attribute with elements of another type B
> that previously inherited from Annotation, and we changed that to inherit
> from TOP instead, while all of the attributes of B, that we had declared,
> remained unchanged. Not surprisingly the deserialiser couldn’t load the old
> CAS leniently with this change, and we never figured out how to do a
> conversion, if that is at all possible, since A can only take one form, i.e.
> we haven’t figured out how to have two versions of A simultaneously in order
> to make a conversion. Maybe there are some lower level CAS possibilities that
> we are not aware of yet. The problem should be the same when changing the
> type of an attribute from FSArray to a wrapper type with custom java objects.
Ok, I think I get the picture now. I was imagining creating a new type that
would replace the old one and basically copying the data over into the new
structure. You are thinking of basically modifying a type "in-place".
I think this is doable in the following way:
1) create a CAS "oldCas" with your existing type system
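// CasFactory and TypeSystemDescriptionFactory are uimaFIT classes
// (package org.apache.uima.fit.factory)
CAS oldCas = CasFactory.createCas(
    TypeSystemDescriptionFactory.createTypeSystemDescriptionFromPath("old_typesystem.xml"));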
2) create a CAS "newCas" with your new type system
CAS newCas = CasFactory.createCas(
    TypeSystemDescriptionFactory.createTypeSystemDescriptionFromPath("new_typesystem.xml"));
3) implement a method taking two CASes and copying the data from one to the
other while massaging the relevant feature structures according to the changes
in the type system
void copyAndUpgradeCas(CAS oldCas, CAS newCas) {
    // Recursively collect all accessible feature structures in oldCas.
    // For each feature structure, create a copy in newCas:
    //   - if the feature structure is of a type which changed, copy the data
    //     according to the changes;
    //   - otherwise, copy it 1-to-1 (or at least the primitive values).
    // Record which old FS was mapped to which new FS, so that this mapping
    // can be used to connect FS references in a second pass.
    // In a second pass, copy/convert the FS references (i.e. non-primitive
    // features).
    // Optionally repeat the process for other views in the CAS.
}
(Basically, step 3 is in a sense a CasCopier - just a custom one where you
apply a data transformation instead of just copying the data.)
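
To make the two-pass idea in step 3 a bit more concrete, here is a minimal
sketch in plain Java. Note that it deliberately does not use the UIMA API:
"Node" is a hypothetical stand-in for a feature structure, and the
"typeMapping" parameter is a made-up placeholder for whatever conversion your
type system change actually requires (here it only renames types). It is
meant to show the bookkeeping (old-to-new map first, references second), not
a finished converter.

```java
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of the two-pass copy. NOT the UIMA API.
class TwoPassCopy {

    // Hypothetical stand-in for a feature structure: a type name,
    // primitive features and reference (non-primitive) features.
    static final class Node {
        final String type;
        final Map<String, String> primitives = new LinkedHashMap<>();
        final Map<String, Node> references = new LinkedHashMap<>();

        Node(String type) {
            this.type = type;
        }
    }

    // Copies everything reachable from the roots and returns the
    // old-to-new mapping. typeMapping (an assumption of this sketch)
    // maps old type names to new ones; unmapped types are kept as-is.
    static Map<Node, Node> copyAndUpgrade(List<Node> roots,
                                          Map<String, String> typeMapping) {
        // Identity-based map: two distinct nodes with equal content
        // must still be treated as two separate feature structures.
        Map<Node, Node> oldToNew = new IdentityHashMap<>();

        // Pass 1: collect reachable nodes, create copies, copy primitives.
        for (Node root : roots) {
            copyPrimitives(root, oldToNew, typeMapping);
        }

        // Pass 2: now that every old node has a counterpart,
        // re-create the references between the new nodes.
        for (Map.Entry<Node, Node> entry : oldToNew.entrySet()) {
            for (Map.Entry<String, Node> ref : entry.getKey().references.entrySet()) {
                Node newTarget = oldToNew.get(ref.getValue()); // null stays null
                entry.getValue().references.put(ref.getKey(), newTarget);
            }
        }
        return oldToNew;
    }

    private static void copyPrimitives(Node old, Map<Node, Node> oldToNew,
                                       Map<String, String> typeMapping) {
        if (old == null || oldToNew.containsKey(old)) {
            return; // nothing to copy, or already visited (handles cycles)
        }
        Node copy = new Node(typeMapping.getOrDefault(old.type, old.type));
        copy.primitives.putAll(old.primitives);
        oldToNew.put(old, copy);
        for (Node target : old.references.values()) {
            copyPrimitives(target, oldToNew, typeMapping);
        }
    }
}
```

With real CASes, the "Node" objects would be feature structures from oldCas
and newCas, and the type-specific massaging would go where the sketch only
looks up a new type name. The identity map is what lets the second pass wire
up references even when the feature structures point at each other in
cycles.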
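For this to work, it is important that you use the CAS API and stay away from
the JCas API! JCas cover classes are generated for one particular version of
a type, so the old and the new form of A could not coexist there.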
If you had XMI data instead of binary CASes, I would have suggested DKPro
Cassis as a route to explore. With this library, you can load XMI CAS objects
into Python, and Python objects are notoriously flexible and malleable - much
more so than CAS / JCas objects. I haven't dug into it, but I could imagine
that a CAS and type system loaded using DKPro Cassis could be monkey-patched
in-place into a new structure. Then again, I haven't tried using Cassis for
this purpose, whereas I am quite confident that the Java-based approach I
outlined above is doable.
Cheerio,
-- Richard