Hey Devs,

(Sorry for the length, but this patch has a lot of touch points and I think it needs discussion. I think it can really open up some new possibilities in Solr, too.)
I'm working on https://issues.apache.org/jira/browse/SOLR-1131. I was able to take Ryan's patch and make some progress on the indexing side. Now I'm working on the search side, and I have some questions about how people think this should work. I've also updated the Schema, SchemaField and FieldType so that one can quickly identify what I am calling a "polyfield" via a variety of mechanisms (IndexSchema.isPolyField(fieldName), FieldType.isPolyField(), SchemaField.isPolyField()).

As background, SOLR-1131 allows a FieldType to specify that multiple Fields are created for a single field value, without the user of Solr needing to know about the underlying implementation. This is useful for spatial search, amongst other things.

As a simple proof of concept, imagine that I define a new FieldType called PlusMinusIntField that extends IntField. This FieldType takes in an int value and outputs two Fields: one with the original value and one with the negative of the value. The schema declaration might look like:

    <fieldType name="plusMinus" class="solr.PlusMinusIntField"/>

And the field declaration is just like any other field declaration:

    <field name="plusMinus" type="plusMinus"/>

(Not sure yet what it would mean for it to be a dynamic field, but it should just work.)

The code for this might look like:

    @Override
    public Field[] createFields(SchemaField field, String externalVal, float boost) {
      // index the value itself and its negation as two separate Fields
      int theInt = Integer.parseInt(externalVal);
      int negInt = -theInt;
      return createFields(field, boost, String.valueOf(theInt), String.valueOf(negInt));
    }

The signature for the invoked createFields is:

    protected Field[] createFields(SchemaField field, float boost, String... externalVals)

In this case, createFields produces two fields, one for each value. Right now, I am autogenerating the name for each new field as field.getName() + "__" + i, where i is the index of the value in the externalVals array (so plusMinus__0 and plusMinus__1 here). I think that all works pretty well, but it has some ramifications.
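To make the naming scheme concrete, here is a minimal, dependency-free Java sketch of the indexing side. It is not the code from the patch: the class PolyFieldNamingSketch, its SubField stand-in for a Lucene Field, and its helper methods are hypothetical, and boost handling is left out. It only shows the field.getName() + "__" + i naming applied to the plusMinus example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: models generated fields as name/value pairs instead of
// Lucene Fields, so it runs without Solr or Lucene on the classpath.
public class PolyFieldNamingSketch {

    // Stand-in for a Lucene Field: only the generated name and the value to index.
    static final class SubField {
        final String name;
        final String value;

        SubField(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    // The "__" separator used in the patch for generated field names.
    static final String SEPARATOR = "__";

    // Mirrors the varargs createFields(field, boost, externalVals...) idea:
    // one sub-field per external value, named baseName + "__" + i.
    static List<SubField> createFields(String baseName, String... externalVals) {
        List<SubField> fields = new ArrayList<>(externalVals.length);
        for (int i = 0; i < externalVals.length; i++) {
            fields.add(new SubField(baseName + SEPARATOR + i, externalVals[i]));
        }
        return fields;
    }

    // The PlusMinusIntField example: index the value and its negation.
    static List<SubField> plusMinusFields(String baseName, String externalVal) {
        int theInt = Integer.parseInt(externalVal);
        return createFields(baseName, String.valueOf(theInt), String.valueOf(-theInt));
    }

    public static void main(String[] args) {
        for (SubField f : plusMinusFields("plusMinus", "5")) {
            System.out.println(f.name + " = " + f.value);
        }
        // prints:
        // plusMinus__0 = 5
        // plusMinus__1 = -5
    }
}
```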
The only question I have on the indexing side is whether it is worthwhile to let the user specify the separator used in naming the new fields, or whether __ is sufficiently magical that it won't likely cause conflicts.

OK, the search side is where it gets tricky. The whole point of this exercise is that the details are hidden from the user in the generic case. Thus, a query of plusMinus:5 should automatically expand to (plusMinus__0:5 OR plusMinus__1:-5). Of course, an expert user should still be able to query a specific sub-field, too, as that can be useful. The problem is: how does the user know the magical name, and how do they know the semantics of that field? Perhaps an alternative is for the FieldType to specify the list of suffixes for the names, such that the FieldType above would output plusMinus__pos and plusMinus__neg, and then the user could figure out the rest with the Luke admin handler plus some common sense? Thoughts?

FWIW, I think I see how to handle the generic case in the Query Parser. I hope to have a patch up today. Once we have this, a lot of the spatial stuff just flows from it.

I don't know yet whether these new FieldTypes should support sorting, and I haven't thought about what it means to facet on one of these FieldTypes.

-Grant
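A minimal sketch of the query-side expansion described above, again in dependency-free Java. The class and method names are hypothetical, and it builds a display string in the notation used above rather than the Lucene Query objects a real Query Parser hook would produce. The separator and suffix list are parameters, so the same code covers both the index-based names (plusMinus__0) and the named-suffix alternative (plusMinus__pos).

```java
import java.util.List;
import java.util.StringJoiner;
import java.util.function.Function;

// Hypothetical sketch: expands a query on a polyfield's base name into an OR
// over its generated sub-fields. Produces a display string, not a Lucene Query.
public class PolyFieldQueryExpansionSketch {

    // Expands baseName:rawValue into (base<sep>suffix0:v0 OR base<sep>suffix1:v1 ...).
    // The suffixes and the value mapping would come from the (hypothetical) FieldType.
    static String expand(String baseName, String rawValue, String separator,
                         List<String> suffixes, Function<String, List<String>> toSubValues) {
        List<String> subValues = toSubValues.apply(rawValue);
        if (subValues.size() != suffixes.size()) {
            throw new IllegalArgumentException("one sub-value is required per suffix");
        }
        StringJoiner clauses = new StringJoiner(" OR ", "(", ")");
        for (int i = 0; i < suffixes.size(); i++) {
            clauses.add(baseName + separator + suffixes.get(i) + ":" + subValues.get(i));
        }
        return clauses.toString();
    }

    // PlusMinusIntField's value mapping: the value itself and its negation.
    static List<String> plusMinus(String raw) {
        int v = Integer.parseInt(raw);
        return List.of(String.valueOf(v), String.valueOf(-v));
    }

    public static void main(String[] args) {
        // Index-based suffixes, as in the current patch.
        System.out.println(expand("plusMinus", "5", "__", List.of("0", "1"),
                PolyFieldQueryExpansionSketch::plusMinus));
        // prints: (plusMinus__0:5 OR plusMinus__1:-5)

        // Named suffixes, as in the alternative proposal.
        System.out.println(expand("plusMinus", "5", "__", List.of("pos", "neg"),
                PolyFieldQueryExpansionSketch::plusMinus));
        // prints: (plusMinus__pos:5 OR plusMinus__neg:-5)
    }
}
```

Keeping the suffix list on the FieldType side means the expert case (querying plusMinus__neg directly) and the generic case share one source of truth for the sub-field names.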