I agree with both of these concepts: only GC'ing things which are not
in the index and also not reachable from something that is in the index,
and making GC'ing (mostly) automatic, based on thresholds etc., when a
component exits back to the framework.  This would be fine for now; if
use cases come up that need more programmatic control over this, we
could add something.
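
To make the two ideas concrete, here is a rough sketch (not actual CAS
code).  Everything in it is an assumption made for illustration: FSs
are modeled as plain int ids handed out in creation order, "refs" is a
hypothetical map from an FS to the FSs it points at, and the names
ScopedGcSketch, shouldCollect and findGarbage are made up.  A threshold
of 0 stands for "no GC".

```java
import java.util.ArrayDeque;
import java.util.BitSet;
import java.util.Deque;
import java.util.List;
import java.util.Map;

/** Hypothetical model: each FS is an int id; refs maps an id to the ids it references. */
final class ScopedGcSketch {

    /** True when used heap reaches the user-set fraction of capacity; 0 disables GC. */
    static boolean shouldCollect(int usedCells, int capacityCells, double threshold) {
        return threshold > 0.0 && usedCells >= threshold * capacityCells;
    }

    /**
     * Returns the ids that are garbage when a component exits: created by the
     * component (id >= firstNewId) and not reachable, directly or indirectly,
     * from any indexed FS.
     */
    static BitSet findGarbage(int firstNewId, int nextId,
                              List<Integer> indexed, Map<Integer, int[]> refs) {
        // Mark everything reachable from the index roots.
        BitSet reachable = new BitSet(nextId);
        Deque<Integer> work = new ArrayDeque<>(indexed);
        while (!work.isEmpty()) {
            int id = work.pop();
            if (reachable.get(id)) {
                continue;
            }
            reachable.set(id);
            for (int target : refs.getOrDefault(id, new int[0])) {
                if (!reachable.get(target)) {
                    work.push(target);
                }
            }
        }
        // Candidates are only the FSs created since the component started.
        BitSet garbage = new BitSet(nextId);
        garbage.set(firstNewId, nextId);
        garbage.andNot(reachable);
        return garbage;
    }
}
```

Note that an unreachable FS created before the component started is
left alone here; it would only be collected on exit from whatever
enclosing component created it.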

Maybe the next thing to focus on is the "contract" for what a GC run
may change.  For a component (primitive or aggregate), the proposed
contract is that the GC does not change the "id"s of FSs that existed
before the component ran.  This is a tradeoff: handles already in use
stay valid, at the cost of a less "aggressive" GC.
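
One possible reading of that contract, as a sketch only: compact just
the region of ids created by the component, and leave every earlier id
where it is.  The model is deliberately simplified and hypothetical (one
int id per FS, a BitSet saying which ids were found to be garbage, and
the made-up names StableIdSweepSketch and relocate); a real
implementation would also have to rewrite references to the FSs that
moved.

```java
import java.util.BitSet;

/** Hypothetical sketch: compaction that never moves ids below a watermark. */
final class StableIdSweepSketch {

    /**
     * Returns newIdOf[oldId] for every id in [0, nextId).
     * Ids below firstNewId keep their value; collected ids map to -1;
     * surviving new ids are packed downward starting at firstNewId.
     */
    static int[] relocate(int firstNewId, int nextId, BitSet garbage) {
        int[] newIdOf = new int[nextId];
        int next = firstNewId;
        for (int id = 0; id < nextId; id++) {
            if (id < firstNewId) {
                newIdOf[id] = id;      // contract: pre-existing ids never change
            } else if (garbage.get(id)) {
                newIdOf[id] = -1;      // collected
            } else {
                newIdOf[id] = next++;  // survivor, slid down over freed slots
            }
        }
        return newIdOf;
    }
}
```

The cost shows up in the first branch: any space held by garbage below
the watermark is never reclaimed by this pass.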


Thilo Goetz wrote:
> Adam Lally wrote:
>> On Wed, Mar 11, 2009 at 8:53 AM, Marshall Schor <m...@schor.com> wrote:
>>> I agree in general about not making things more complicated at least to
>>> the user.  I can imagine education working for
>>>  1) things like string interning
>>>  2) things like deleting features from type systems where they're not
>>> being used, and where the annotator producing them will respect this.
>>> What this approach seems to miss are the following kinds of things:
>>> 1) cases where some set of annotators produce feature structures, which,
>>> after some point, are no longer needed, and are "deleted" but
>>> nevertheless continue to consume space.
>>> 2) cases where some set of annotators produce feature structures having
>>> lots of fields, where, after some point, the fields are no longer needed.
>>> If these are not significant use-cases in practice, then I'm happy to
>>> think-about / work-on other things :-).
>> I'd like to propose discussing the different ideas here one at a time.
>>  We had enough trouble coming to any agreement on GC the last time
>> that we discussed it, without also throwing string interning and
>> feature deleting into the mix.
>> So focusing on GC first (unless you think one of the others is more 
>> important):
>> My inclination is to assure that GC deletes only garbage, and that
>> there's no possibility that anything GC'ed could have been referenced
>> by anybody.  The other proposals that don't have this guarantee are
>> scary to me.
>> A way to accomplish this guarantee would be that when the process
>> method of an AnalysisEngine (could be either primitive or aggregate)
>> completes, we can mark as garbage any FS's that were created since the
>> beginning of that process method, but which are not referenced
>> directly or indirectly from anything in the indexes.  Does this
>> concept seem reasonable?
> +1. I like the idea because it is sort of local on the one
> hand, but still allows one to delete FSs from indexes
> later in the processing and have them garbage collected
> (on exiting the containing aggregate).
>> The next question is under what conditions would a GC execute.
>> Requiring an explicit call seems counter to what other garbage
>> collecting runtime environments do, and like Thilo I'm confused about
>> who would call this and when.  I think it would be better to define
>> the parameters that control GC in the PerformanceTuningSettings that
>> we already have, and make them dependent on how much CAS heap space is
>> used relative to a GC threshold that the user has set in the
>> PerformanceTuningSettings.
> +1, and the default could be "no GC", so it would be
> perfectly backwards compatible.  I'm thinking of the
> kinds of scenarios that I often work with, where
> basically all the annotations are later written to
> an index, and any attempt at GC would be futile and
> just consume time to no benefit.
>>  -Adam

