There is more to consider here. Lucene now supports "payloads",
additional metadata on terms that can be leveraged with custom
queries. I've not yet tinkered with them myself, but my understanding
is that they would be useful for (and were in fact designed in part
for) representing structured documents. It would behoove us to investigate
how payloads might be leveraged for your needs here, such that a
single field could represent an entire document, with payloads
representing the hierarchical structure. This will require creating
specialized Analyzer and Query subclasses to take advantage of
payloads. The Lucene community itself is just now starting to
exploit this new feature, so there isn't a lot out there on it yet,
but I think it holds great promise for these purposes.
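
A rough sketch of the idea, in plain Java rather than the actual Lucene
payload API: every term in a single field carries a byte[] payload, and
here that payload is assumed to hold the path of the enclosing element.
The names PayloadSketch, Token, addElement and search are hypothetical,
made up for illustration; a real implementation would live in custom
Analyzer and Query subclasses.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Conceptual sketch only: no Lucene classes are used here.
final class PayloadSketch {

    // One term occurrence in the single field: the term text, its position,
    // and an opaque payload (assumed here to be the element path in UTF-8).
    static final class Token {
        final String term;
        final int position;
        final byte[] payload;

        Token(String term, int position, byte[] payload) {
            this.term = term;
            this.position = position;
            this.payload = payload;
        }
    }

    // Stand-in for the analyzer step: split the element's text on whitespace
    // and tag every resulting token with the path of its enclosing element.
    static void addElement(List<Token> field, String path, String text) {
        byte[] payload = path.getBytes(StandardCharsets.UTF_8);
        for (String word : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!word.isEmpty()) {
                field.add(new Token(word, field.size(), payload));
            }
        }
    }

    // Stand-in for the query step: return the positions where the term occurs
    // inside an element whose path starts with the given prefix.
    static List<Integer> search(List<Token> field, String term, String pathPrefix) {
        List<Integer> hits = new ArrayList<>();
        for (Token t : field) {
            String path = new String(t.payload, StandardCharsets.UTF_8);
            if (t.term.equals(term) && path.startsWith(pathPrefix)) {
                hits.add(t.position);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Token> field = new ArrayList<>();
        addElement(field, "/book/title", "Lucene in Action");
        addElement(field, "/book/chapter/title", "Indexing");
        addElement(field, "/book/chapter/para", "Lucene stores payloads per term position");

        // Only the occurrence inside a chapter matches: prints [4]
        System.out.println(search(field, "lucene", "/book/chapter"));
    }
}
```

Storing the full path as a string is only the simplest encoding to read;
a compact encoding (depth plus element id, say) would keep payloads small.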
Erik

Hello Erik,
Could you elaborate on how payloads could be used to represent a
structured doc?
Thanks, Brian