Adam Lally wrote:
On 12/30/06, Thilo Goetz <[EMAIL PROTECTED]> wrote:
<snip>
> (1) The XMI serialization could contain objects that are not in any
> view, nor referenced from anywhere.  Our implementation doesn't
> support that.  We'd lose such objects on a deserialization followed by
> serialization, and even if we didn't lose them, not providing any APIs
> to access them seems poor.

I don't see that we would lose such objects.  You just define an index
over all objects and there you are.  That also gives you an API to
access them.  And no, we should not provide such an index by default.
Many, if not most, applications won't need it.


Are you using "define an index" in the same sense as we currently do,
where someone has to call addToIndexes for something to be indexed?

Yes.

Or is this a new kind of index that contains everything no matter
what?  I think it would have to be the latter in order to support this
using the current XMI serialization format (which has no concept of
index membership separate from view membership).  Is that what you
intended?

No. I'm not sure there's a contradiction between what I'm proposing and what's in the spec proposal. When I run an Apache UIMA application, I decide what I want to see in my CAS. Any other application deserializing XMI files may do the same. In Apache UIMA, we can just call addToIndexes() on each FS we deserialize. It's the user's decision what they want to see and what they're not interested in. Indexes are a basic concept in Apache UIMA, and just as we require FSs to be indexed to be visible to the next annotator, we can expect FSs to be indexed to be visible to (de)serialization.

Surely all the spec requires is that all FSs _can_ be deserialized.

> (2) If we decided to add some kind of global indexes to our
> implementation (as was being recently discussed), that has no
> representation in the XMI serialization.  This seems like a problem to
> me.  How can we add things to our implementation that are supposed to
> be persistent across CAS serialization without opening up a discussion
> of what the serialization format looks like?

I didn't look at the details of the XMI proposal because frankly, I'm
not very interested in XML serialization.  The conceptual part of the
report does not contradict that approach, at least the way I read it.  I
probably missed something.  Where does it say you can't have global
indexes (or the OASIS equivalent thereof)?

The OASIS spec proposal only defines views.  In our implementation we
define indexes and say that we have an index repository per view and
that the members of the view are indexed in that index repository.  If
a "global index" means an index containing objects that are in no
view, then this approach no longer works.

What approach no longer works? The spec proposal or our implementation? The spec proposal says that a view is a set of FSs. It doesn't say (to my knowledge) that each FS must be contained in at least one view; nor does it say that it can be contained in only one view.

If the XMI part of the spec requires that the views partition the space of FSs (i.e., each FS is contained in one and only one view), then that's a constraint I would like to discuss and propose to change.
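
To make the partition question concrete, here is a minimal sketch of a toy model. It is not UIMA or OASIS API: FSs are reduced to string ids, a view is a named set of ids, and the class and method names (ViewPartitionCheck, isPartition, unviewed) are hypothetical. It only illustrates the difference between "a view is a set of FSs" and "the views partition the FSs".

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ViewPartitionCheck {

    // Toy model (assumption): an FS is a string id, a view is a named set of ids.
    // Returns true only if every FS belongs to exactly one view.
    static boolean isPartition(Set<String> allFs, Map<String, Set<String>> views) {
        Map<String, Integer> membership = new HashMap<>();
        for (Set<String> members : views.values()) {
            for (String fs : members) {
                membership.merge(fs, 1, Integer::sum);
            }
        }
        for (String fs : allFs) {
            if (membership.getOrDefault(fs, 0) != 1) {
                return false;
            }
        }
        return true;
    }

    // Returns the FSs that are members of no view at all.
    static Set<String> unviewed(Set<String> allFs, Map<String, Set<String>> views) {
        Set<String> result = new HashSet<>(allFs);
        for (Set<String> members : views.values()) {
            result.removeAll(members);
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> allFs = Set.of("fs1", "fs2", "fs3");
        // fs2 is in two views and fs3 is in none, so the views are not a partition.
        Map<String, Set<String>> views = Map.of(
                "text", Set.of("fs1", "fs2"),
                "translation", Set.of("fs2"));
        System.out.println("partition: " + isPartition(allFs, views));
        System.out.println("in no view: " + unviewed(allFs, views));
    }
}
```

In this model, a format that can only record view membership has nowhere to put fs3, which is the case under discussion.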

I really think the XMI serialization is a key intersection point
between what we're doing in our CAS implementation and what OASIS is
charged with doing.  Each group looks at the XMI serialization in a
different way - for us developers, we may just want to throw in new
attributes or whatever to make our latest, greatest implementation
idea work.  The people in the OASIS group (at least some of them) look
at it as a realization of basic UIMA concepts.  That may sometimes
become an annoyance for us, since we might like to shape the XMI
serialization however we want, but I think ultimately it's a good
thing for UIMA.

Sure, but it leaves us with a lot of liberties. In the implementation, we can choose what kind of interfaces to the XMI form we offer to our users, and what extensions and conveniences we offer on top.


In the case of global indexes the issue with XMI serialization is that
there's no way to mark an object as being indexed without marking it
as being a member of a view.  Now, I suppose we could just define a
special view named "global" and be done with it.  But then, our APIs
would (should) be different than if we have special global indexes
that "belong to no view".

-Adam

Or, we could change the spec proposal.

--Thilo
