And this is also an approach Yonik drafted here for user/tagging
design: http://wiki.apache.org/solr/UserTagDesign
Erik
On Dec 4, 2009, at 1:35 PM, Steven A Rowe wrote:
Hi Grant,
On 12/02/2009 at 2:30 PM, Grant Ingersoll wrote:
I've been noodling around with the notion of a
"layered" field where variants of a primary token are stored at
"sub positions" of the primary token (instead of in separate copy
fields)
The Indri search engine (now part of Lemur) uses a similar idea:
fields are implemented as potentially overlapping extents over the
(single) stream of document tokens. (Howard Turtle, who is now the
CNLP director and has been involved in Indri development, told me
about this feature. He says it allows for natural representation of
fields projected onto hierarchical data, e.g. XML.) I wasn't able
to find much documentation about this online when I looked just now,
but here's a high-level overview of the Indri "repository" (aka
index) structure:
http://www.lemurproject.org/docs/index.php/Indri_Repository_Structure
(See the "Field Information Files" section near the bottom.)
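To make the extent idea concrete, here is a minimal Python sketch built on a toy in-memory representation. The `Extent` class, the sample document, and the helper functions are all hypothetical illustrations. They are not Indri's actual repository format or API. Each field occurrence is just a token range over the single document token stream. Nested or overlapping fields, as in XML, therefore need no copied tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """Hypothetical field occurrence: covers the half-open token range [begin, end)."""
    field: str
    begin: int
    end: int


# Hypothetical document: ONE token stream, with fields layered on top as extents.
# Roughly stands for: <title>lucene in action</title>
#                     <body><author>erik hatcher</author> wrote lucene in action</body>
TOKENS = ["lucene", "in", "action", "erik", "hatcher", "wrote", "lucene", "in", "action"]
EXTENTS = [
    Extent("title", 0, 3),
    Extent("body", 3, 9),
    Extent("author", 3, 5),  # nested inside "body": extents may overlap
]


def field_tokens(field, tokens=TOKENS, extents=EXTENTS):
    """Return the tokens covered by every extent of `field`, in extent order."""
    return [tokens[i] for e in extents if e.field == field for i in range(e.begin, e.end)]


def fields_at(position, extents=EXTENTS):
    """Return the sorted names of all fields whose extents contain `position`."""
    return sorted(e.field for e in extents if e.begin <= position < e.end)


def term_positions_in_field(term, field, tokens=TOKENS, extents=EXTENTS):
    """Return the sorted, de-duplicated positions where `term` occurs inside `field`."""
    return sorted({
        i
        for e in extents if e.field == field
        for i in range(e.begin, e.end) if tokens[i] == term
    })


if __name__ == "__main__":
    print(field_tokens("author"))                     # ['erik', 'hatcher']
    print(fields_at(3))                               # ['author', 'body']
    print(term_positions_in_field("lucene", "body"))  # [6]
```

Position 3 belongs to both "body" and "author" at once, yet the token "erik" is stored only once in the stream. That is the contrast with indexing each field's text separately via copy fields.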
Steve