On Dec 1, 2009, at 1:42 AM, Chris Hostetter wrote:

> 
> It feels like something we've overlooked in this discussion is whether we 
> need to worry about any FieldType API changes needed to make these new 
> "PolyField" classes aware of when they are multivalued.
> 
> The API suggestions Grant made give the FieldType the ability to return a 
> Field[] from a single field value input -- but they don't provide any 
> information about whether that field value is one of many values we're 
> indexing for this field name.
> 
> Imagine that i want to make an index of people i know.  Each person also 
> has multiple locations where they can frequently be found (home, work, 
> gym, girlfriend's house, favorite coffee shop, etc.).  My common case is 
> to search for people, not locations, so it doesn't make sense to flatten 
> out and have a doc for each person+location, i just want a single doc per 
> person, but that means i need a "locations" field that's multivalued.
> 
> If i'm using a simple "LatLonFieldType" that splits my comma separated 
> coordinate string into a "locations__LAT" and a "locations__LON" field 
> then i assume it needs to do something special in the multiValued case to 
> make sure later "near" searches don't get confused and think that the lat 
> from my "work" and the lon from my "home" are actually a third location.
> 
> how do we solve this?

I'm not sure you need to worry about it.  I'd argue a multivalued location 
field isn't natural anyway.  You would do the following instead, which is how 
any address book I've ever seen works:
<field name="home" type="LatLonFT"/>
<field name="work" type="LatLonFT"/>

So, maybe the FT can explicitly prohibit multivalued?  But, I suppose you 
could do the position thing, too.  This could be achieved through a new 
SpanQuery pretty easily: a SpanPositionQuery that takes in a term and a 
specific position.  Trivial to write, I think, just not sure if it is generally 
useful.  Although, I must say I've been noodling around with the notion of a 
"layered" field where variants of a primary token are stored at "sub 
positions" of the primary token (instead of in separate copy fields), and then 
one could write a query that says, for instance, search all of the "secondary" 
terms.  So, for instance, if you think of each position as containing a stack 
of terms, then you could say use the terms at position two in the stack.  I'm 
not quite sure what this means just yet, but my thinking is that I could get a 
really compact index at the cost of a slightly more complex query.  It also 
means I could do some interesting things at query time that simply cannot be 
done across fields at the moment, for instance, create a phrase type query that 
uses different layers where appropriate.
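
To make the multivalued concern and the "position thing" concrete, here is a 
minimal sketch in plain Java.  It is a toy model, not Lucene or Solr code: it 
assumes the i-th lat and the i-th lon of a document sit at the same position, 
and the class and method names (PositionAlignedMatch, matchesNaive, 
matchesAligned) are made up for illustration.  It does not implement the 
proposed SpanPositionQuery; it only shows why matching the two sub-fields 
independently can find a phantom location, and how requiring a shared position 
avoids that.

```java
// Toy model only: each array index stands in for a term position in a
// multivalued sub-field. All names here are hypothetical.
final class PositionAlignedMatch {

    // True if v lies inside the closed interval [min, max].
    static boolean inRange(double v, double min, double max) {
        return v >= min && v <= max;
    }

    // Naive match: some lat is in range and some lon is in range,
    // with no check that they belong to the same location.
    static boolean matchesNaive(double[] lats, double[] lons,
                                double minLat, double maxLat,
                                double minLon, double maxLon) {
        boolean latHit = false;
        boolean lonHit = false;
        for (double lat : lats) {
            latHit |= inRange(lat, minLat, maxLat);
        }
        for (double lon : lons) {
            lonHit |= inRange(lon, minLon, maxLon);
        }
        return latHit && lonHit;
    }

    // Position-aligned match: the lat and lon must both be in range
    // at the same index, i.e. they come from the same original value.
    static boolean matchesAligned(double[] lats, double[] lons,
                                  double minLat, double maxLat,
                                  double minLon, double maxLon) {
        int n = Math.min(lats.length, lons.length);
        for (int i = 0; i < n; i++) {
            if (inRange(lats[i], minLat, maxLat)
                    && inRange(lons[i], minLon, maxLon)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // One person with two locations: home (40.0, -75.0), work (42.0, -71.0).
        double[] lats = {40.0, 42.0};
        double[] lons = {-75.0, -71.0};

        // Box around (42.0, -75.0): work's lat combined with home's lon.
        System.out.println(matchesNaive(lats, lons, 41.5, 42.5, -75.5, -74.5));   // true (phantom)
        System.out.println(matchesAligned(lats, lons, 41.5, 42.5, -75.5, -74.5)); // false

        // Box around the real work location still matches.
        System.out.println(matchesAligned(lats, lons, 41.5, 42.5, -71.5, -70.5)); // true
    }
}
```

In a real index the alignment would have to come from the positions recorded 
at index time, which is what a position-aware span query would check.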

-Grant
