Hi,

On 9/10/07, kbennett <[EMAIL PROTECTED]> wrote:
> It seems to me that options going into the parser are logically different
> from metadata coming out of the parser, and that to maximize the code's
> cohesion (see http://en.wikipedia.org/wiki/Cohesion_%28computer_science%29),
> it would be preferable to express them as two different objects.

There are really two kinds of options that could affect the way a parser
works. The first kind are generic options, like the maximum amount of
memory or time to use or the location of any temporary files, that have no
direct relation to the specific document being parsed. The other kind are
parsing hints related to the parsed document, like the name (and extension)
of the file that contains the document or any MIME headers associated with
the document (for example from an HTTP request or an email body part).

The first kind of options I'd handle separately, as JavaBean properties (or
some such) of the parser instances. The second kind is actually more or
less accurate metadata about the document in question, so IMHO it would
make perfect sense to pass that information as a part of the metadata
argument.

> Also, if the metadata is the only output of the parser (as it appears to be
> in the use case), why not have the parser create the metadata object itself,
> and return it as the return value? This would seem like a more natural
> interface.

As mentioned above, I think the metadata object could (and should) be used
to pass various parsing "hints" to the parser, and that the parser can then
extend, verify, or correct the given metadata. This approach also allows
one to have a sequence of parsers that incrementally extract more and more
information from the input document.

BR,

Jukka Zitting
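
PS. A rough sketch of the idea above, in case it helps. All the names here (`MetadataHintsSketch`, `Metadata`, `Parser`, `ExtensionParser`, `MagicParser`, `probeLimit`, and the `resourceName` / `contentType` keys) are made up for illustration and are not a proposal for the actual API. It assumes the input stream supports mark/reset so that several parsers can look at the same bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class MetadataHintsSketch {

    /** Hypothetical metadata container: carries hints in, findings out. */
    static class Metadata {
        private final Map<String, String> values = new LinkedHashMap<>();
        String get(String name) { return values.get(name); }
        void set(String name, String value) { values.put(name, value); }
    }

    /** Hypothetical parser interface: the metadata argument is both input and output. */
    interface Parser {
        void parse(InputStream stream, Metadata metadata) throws IOException;
    }

    /** Extends the metadata: guesses a content type from the file name hint. */
    static class ExtensionParser implements Parser {
        public void parse(InputStream stream, Metadata metadata) {
            String name = metadata.get("resourceName");
            if (metadata.get("contentType") == null
                    && name != null && name.endsWith(".txt")) {
                metadata.set("contentType", "text/plain");
            }
        }
    }

    /** Verifies or corrects the content type by peeking at the first bytes. */
    static class MagicParser implements Parser {
        // Generic, document-independent option, exposed as a JavaBean property.
        private int probeLimit = 8;
        public int getProbeLimit() { return probeLimit; }
        public void setProbeLimit(int probeLimit) { this.probeLimit = probeLimit; }

        public void parse(InputStream stream, Metadata metadata) throws IOException {
            stream.mark(probeLimit);               // assumes mark/reset is supported
            byte[] head = new byte[probeLimit];
            int n = Math.max(stream.read(head), 0); // read may return -1 at end of stream
            stream.reset();                         // leave the stream for the next parser
            String prefix = new String(head, 0, n, StandardCharsets.US_ASCII);
            if (prefix.startsWith("%PDF")) {
                metadata.set("contentType", "application/pdf");
            }
        }
    }

    /** Runs a sequence of parsers over one document, sharing one metadata object. */
    static Metadata run(String fileName, byte[] content) {
        Metadata metadata = new Metadata();
        metadata.set("resourceName", fileName);     // document-specific hint
        InputStream stream = new ByteArrayInputStream(content);
        Parser[] chain = { new ExtensionParser(), new MagicParser() };
        try {
            for (Parser parser : chain) {
                parser.parse(stream, metadata);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return metadata;
    }

    public static void main(String[] args) {
        // The file name hint says "text", the content says "PDF".
        Metadata m = run("report.txt", "%PDF-1.4".getBytes(StandardCharsets.US_ASCII));
        System.out.println(m.get("contentType"));   // prints "application/pdf"
    }
}
```

The first parser extends the metadata from the file name hint, and the second one corrects it after looking at the content. If the parser created and returned the metadata object itself, neither the hint nor the chaining would have a natural place to go.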
