Hey Hoss,
________________________________________
From: Chris Hostetter [hossman_luc...@fucit.org]
Sent: Monday, November 30, 2009 5:42 PM
To: solr-dev@lucene.apache.org
Subject: Re: SOLR-1131 - Multiple Fields per Field Type

It feels like something we've overlooked in this discussion is whether we
need to worry about any FieldType API changes needed to make these new
"PolyField" classes aware of when they are multivalued.

The API suggestions Grant made give the FieldType the ability to return a
Field[] from a single field value input -- but they don't provide any
information about whether that field value is one of many values we're
indexing for this field name.

Imagine that i want to make an index of people i know.  Each person also
has multiple locations where they can frequently be found (home, work,
gym, girlfriend's house, favorite coffee shop, etc.).  My common case is
to search for people, not locations, so it doesn't make sense to flatten
out and have a doc for each person+location, i just want a single doc per
person, but that means i need a "locations" field that's multivalued.

If i'm using a simple "LatLonFieldType" that splits my comma separated
coordinate string into a "locations__LAT" and a "locations__LON" field
then i assume it needs to do something special in the multiValued case to
make sure later "near" searches don't get confused and think that the lat
from my "work" and the lon from my "home" are actually a third location.
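
To make the mix-up concrete, here is a minimal Python sketch. It is not
Solr code: the coordinates, the tolerance, and the function names are all
made up for illustration. It models the two sub-fields as independent
lists and shows a naive "near" check accepting a point that was never
indexed.

```python
# Hypothetical data: two locations for one person, as (lat, lon) pairs.
locations = [(40.7, -74.0), (34.0, -118.2)]  # "home", "work"

# What a naive split produces: two independent multivalued fields,
# standing in for locations__LAT and locations__LON.
lats = [lat for lat, _ in locations]
lons = [lon for _, lon in locations]

def naive_near(lat, lon, tol=0.5):
    """Match if ANY indexed lat and ANY indexed lon are within tol."""
    return (any(abs(lat - x) <= tol for x in lats)
            and any(abs(lon - y) <= tol for y in lons))

def paired_near(lat, lon, tol=0.5):
    """Match only if a single indexed (lat, lon) pair is within tol."""
    return any(abs(lat - x) <= tol and abs(lon - y) <= tol
               for x, y in locations)

# Lat taken from "work", lon taken from "home": not a real location.
phantom = (34.0, -74.0)
print(naive_near(*phantom), paired_near(*phantom))
```

The naive check accepts the phantom point because each coordinate matches
some value in its own field; the paired check rejects it.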

how do we solve this?

I suppose we could just rely on matching termPosition information, but that
means the FieldType needs a way to specify the Analyzer for all of the
field names it creates on the fly (another argument for reusing
dynamicFields i guess)

* or, alternatively, fieldTypes with configured pattern params *

to specify matching increments -- but that seems
somewhat brittle: what about complex PolyFieldTypes that want to create
a variable number of Fields based on the input?
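
As a rough illustration of the position-matching idea, the sketch below
assumes every input value contributes exactly one entry to each sub-field,
so the list index can stand in for the term position. That one-entry
assumption, and all of the names, are hypothetical; this is plain Python,
not the Lucene position API.

```python
def index_positions(values):
    """Write each (lat, lon) value into two parallel sub-fields.

    The list index plays the role of the term position, so value i
    lands at position i in both sub-fields.
    """
    lat_field, lon_field = [], []
    for lat, lon in values:
        lat_field.append(lat)
        lon_field.append(lon)
    return lat_field, lon_field

def positional_near(lat_field, lon_field, lat, lon, tol=0.5):
    """Match only when lat and lon hit at the SAME position."""
    lat_hits = {i for i, x in enumerate(lat_field) if abs(lat - x) <= tol}
    lon_hits = {i for i, y in enumerate(lon_field) if abs(lon - y) <= tol}
    return bool(lat_hits & lon_hits)

lat_field, lon_field = index_positions([(40.7, -74.0), (34.0, -118.2)])
print(positional_near(lat_field, lon_field, 40.7, -74.0))  # a real location
print(positional_near(lat_field, lon_field, 34.0, -74.0))  # mixed lat/lon
```

This only works while the positions in the two sub-fields stay in step,
which is the part that would need matching increments.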

* This would seem to argue for smart FieldTypes that understand how their
information is persisted (not just pattern parameters), though that may be
harder to codify in XML than in an actual programming language. Increments
might be the only variant, but there may be more *

ie: as i recall, if you want to index coordinates of polygon bounding
boxes using Cartesian grid fields, you need more field names for big
polygons than you do for small polygons -- so what if someone wants a
multivalued PolyField and indexes very big and very small polygons? ...
termPositions doesn't seem like it really cuts it here.
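
A small sketch of why positions drift in that case, under a deliberately
simplistic model: each value appends one token to every sub-field it uses,
and a token's position is just its index in that sub-field. The field
names (cell_1, cell_2, ...) and tokens are invented for illustration and
are not taken from any actual Cartesian tier implementation.

```python
from collections import defaultdict

def index_values(values):
    """values: one dict per input value, mapping sub-field name -> token.

    Each token is appended to its sub-field, so its position is simply
    how many earlier values also wrote to that sub-field.
    """
    postings = defaultdict(list)
    for value in values:
        for field, token in value.items():
            postings[field].append(token)
    return dict(postings)

# A small polygon needs one sub-field; a big one needs three.
small = {"cell_1": "s1"}
big = {"cell_1": "b1", "cell_2": "b2", "cell_3": "b3"}

postings = index_values([small, big])
for field, tokens in postings.items():
    print(field, tokens)
```

Here the big polygon's token sits at position 1 in cell_1 but at position
0 in cell_2 and cell_3, so "same position" no longer means "same value"
unless the FieldType pads the gaps itself.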

* good food for thought -- I'll sleep on it tonight and see what I can think of 
to add to the discussion...*

Cheers,
Chris
