SOLR-1131 - Multiple Fields per Field Type

Grant Ingersoll Tue, 24 Nov 2009 06:16:39 -0800

Hey Devs,

(Sorry for the length, but this patch has a lot of touch points and I think it 
needs discussion.  I think it really can open up some new possibilities in 
Solr, too)


I'm working on https://issues.apache.org/jira/browse/SOLR-1131.  I was able to 
take Ryan's patch and make some progress on the indexing side.  Now I'm working 
on the search side and I have some questions on how people think this should 
work.  I've also updated the Schema, SchemaField and FieldType so that one can 
quickly identify what I am calling a "polyfield" via a variety of mechanisms 
(IndexSchema.isPolyField(fieldName), FieldType.isPolyField(), 
SchemaField.isPolyField()).

As background, SOLR-1131 allows a FieldType to specify that multiple Fields may 
be created per field type such that the user of Solr is agnostic of the 
underlying implementation.  This is useful for spatial search, amongst other 
things.  

As a simple proof of concept, imagine that I define a new FieldType called 
PlusMinusIntField that extends IntField.  This FieldType takes in an int 
value and outputs two Fields: one with the original value and one with the 
negative of the value.

The schema decl might look like:
<fieldType name="plusMinus" class="solr.PlusMinusIntField"/>

And the field declaration is just like any other field declaration:
<field name="plusMinus" type="plusMinus"/>

(Not sure what it would mean to have it be a dynamic field just yet, but it 
should just work)

The code for this might look like:

  @Override
  public Field[] createFields(SchemaField field, String externalVal, float boost) {
    int theInt = Integer.parseInt(externalVal);
    int negInt = -theInt;
    return createFields(field, boost, String.valueOf(theInt), String.valueOf(negInt));
  }

The signature for the invoked createFields is:

  protected Field[] createFields(SchemaField field, float boost, String... externalVals)

In this case, createFields produces two fields, one for each value.  Right now, 
I am autogenerating the name for each new field as field.getName() + "__" + i, 
where i is the index of the value when iterating over the externalVals array.
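
To make the naming concrete, here is a rough sketch of just the name 
generation, using plain strings in place of Lucene Fields.  The class and 
method names (PolyFieldNaming, subFields) are made up for illustration and are 
not part of the patch; boost and SchemaField handling are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in: models only the sub-field naming, not real Lucene Field objects.
final class PolyFieldNaming {
  // The separator discussed in this mail; assumed fixed here.
  static final String SEPARATOR = "__";

  /** Returns one "name=value" entry per external value, named baseName + "__" + index. */
  static List<String> subFields(String baseName, String... externalVals) {
    List<String> result = new ArrayList<>();
    for (int i = 0; i < externalVals.length; i++) {
      result.add(baseName + SEPARATOR + i + "=" + externalVals[i]);
    }
    return result;
  }

  public static void main(String[] args) {
    // Mirrors the plus/minus example: one value in, two sub-fields out.
    int theInt = Integer.parseInt("5");
    System.out.println(
        subFields("plusMinus", String.valueOf(theInt), String.valueOf(-theInt)));
    // prints [plusMinus__0=5, plusMinus__1=-5]
  }
}
```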

I think that all works pretty well, but it has some ramifications.  The only 
question I have is whether it is worthwhile to consider allowing the user to 
specify the separator in naming the new fields, or whether __ is sufficiently 
magical and won't likely cause conflicts.

OK, the search side is where it gets tricky.  The whole point of this 
exercise is that the details are hidden from the user in the generic case.  
Thus, a query of plusMinus:5 should automatically expand to (plusMinus__0:5 OR 
plusMinus__1:-5).  Of course, an expert user should still be able to query a 
specific field, too, as that can be useful.  The problem is, how does the user 
know the magical name and how do they know the semantics of that field?  
Perhaps an alternative is for the FieldType to specify the list of appendages 
for the name, such that the FieldType above would output plusMinus__pos and 
plusMinus__neg and then the user can use the Luke admin handler to figure it 
out plus some common sense?  Thoughts?
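
As a rough illustration of the expansion described above, the sketch below 
builds the expanded query as a plain string, for either numeric suffixes or 
FieldType-supplied ones such as pos/neg.  PolyFieldQueryExpander and expand are 
hypothetical names, not anything in the patch, and a real implementation would 
presumably build Lucene Query objects in the query parser rather than 
concatenate strings.

```java
import java.util.StringJoiner;

// Hypothetical helper: shows the shape of the expansion only, as a display string.
final class PolyFieldQueryExpander {
  /**
   * Expands a query on a poly field into an OR over its sub-fields.
   * suffixes[i] names the i-th sub-field and values[i] is the term to match in it.
   */
  static String expand(String field, String[] suffixes, String[] values) {
    if (suffixes.length != values.length) {
      throw new IllegalArgumentException("need exactly one value per sub-field");
    }
    StringJoiner clauses = new StringJoiner(" OR ", "(", ")");
    for (int i = 0; i < suffixes.length; i++) {
      clauses.add(field + "__" + suffixes[i] + ":" + values[i]);
    }
    return clauses.toString();
  }

  public static void main(String[] args) {
    String[] values = {"5", "-5"};
    // Autogenerated index suffixes.
    System.out.println(expand("plusMinus", new String[] {"0", "1"}, values));
    // prints (plusMinus__0:5 OR plusMinus__1:-5)

    // Suffixes supplied by the FieldType instead.
    System.out.println(expand("plusMinus", new String[] {"pos", "neg"}, values));
    // prints (plusMinus__pos:5 OR plusMinus__neg:-5)
  }
}
```

With named suffixes, an expert user querying plusMinus__neg directly at least 
gets a hint about what the sub-field holds.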

FWIW, I think I see how to handle the generic case in the Query Parser.  I hope 
to have a patch up today.  Once we have this, a lot of the spatial stuff just 
flows from it.

I don't know whether these new FieldTypes should support sorting or not.

I haven't even thought yet about what it means to facet on one of these 
FieldTypes.

-Grant
