Chris M. Hostetter created SOLR-17052:
-----------------------------------------

             Summary: SchemaCodecFactory/IndexSchema/FieldType relationships 
are kludgy and should be inverted
                 Key: SOLR-17052
                 URL: https://issues.apache.org/jira/browse/SOLR-17052
             Project: Solr
          Issue Type: Improvement
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Chris M. Hostetter


While getting familiar with the {{SolrCore + CodecFactory + SchemaCodecFactory 
+ FieldType}} related code relevant to SOLR-17045, SOLR-17046, & SOLR-17047, it 
occurred to me that there are a lot of inefficiencies and kludginess in how 
{{FieldType}} based "codec overrides" are used (and validated) by 
{{SchemaCodecFactory}} (and {{SolrCore.initCodec}}):
 * {{SolrCore.initCodec}} needs to be aware of all the possible ways a 
{{FieldType}} instance might support codec overrides
 ** ... so it can fail if any are specified unless the {{CodecFactory 
instanceOf SolrCoreAware}}
 *** ... even though that still doesn't ensure the factory supports those field 
type overrides
 ** This validation currently just looks at {{getPostingsFormatForField}} & 
{{getDocValuesFormatForField}}
 *** ... it's ignorant about {{DenseVectorField}}'s assumptions about being 
able to override aspects of the {{KnnVectorsFormat}}
 *** ... and AFAICT, what validation is done doesn't help if the Schema API is 
used to add new field types (w/ {{postingsFormat}} or {{docValuesFormat}} 
overrides)
 * In all of the {{SchemaCodecFactory}} "per-field" methods 
({{getPostingsFormatForField}}, {{getDocValuesFormatForField}}, & 
{{getKnnVectorsFormatForField}}) ...
 ** ... every call to these methods resolves a {{SchemaField}} instance – even 
though only the (Solr) {{FieldType}} is needed
 *** Asking the {{IndexSchema}} for the {{SchemaField}} of a fieldName has more 
overhead than just asking for the {{FieldType}}
 *** None of the things these methods care about can be configured on a 
per-fieldName basis anyway.
 ** For {{PostingsFormat}} and {{DocValuesFormat}}, every call to these 
methods repeats the SPI lookup on the "format name" configured on the 
{{FieldType}} instance
 ** For {{KnnVectorsFormat}} every call to this method constructs a new 
{{SolrDelegatingKnnVectorsFormat}} – even though the same instance could be 
re-used for every field of the same {{FieldType}} instance.
 * In {{FieldType}} ...
 ** ... there is no validation anywhere that the {{postingsFormat}} or 
{{docValuesFormat}} are valid
 *** ... bogus values only cause a problem when the {{SchemaCodecFactory}} 
tries to resolve them (when indexing)
 * In {{DenseVectorField}} ...
 ** ... {{checkSchemaField}} validates (and logs warnings) based on the 
{{vectorEncoding}} and {{dimensions}} ...
 *** ... Even though these validations aren't "field" specific – they are 
"type" specific, and could be validated in {{DenseVectorField.init()}}
 ** BUT! ... there is no validation anywhere that the {{knnAlgorithm}} is 
supported, or that the HNSW options make sense for it
 *** These are only validated by the {{Codec.getKnnVectorsFormatForField(...)}} 
impl provided by {{SchemaCodecFactory}} ...
 **** ... and they are redundantly validated on every call
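
To make the "inverted" idea concrete, here is a minimal sketch in plain Java with no Solr or Lucene dependencies. Every name in it ({{CodecOverrideSketch}}, {{Format}}, {{FormatRegistry}}, {{SketchFieldType}}, {{SketchSchema}}) is a hypothetical stand-in, not an existing class, and it only models the postings-format case. The assumption being illustrated: if the field type validates and resolves its override once at init time, then bogus names fail at schema load rather than at indexing, and the per-field hook becomes a cheap fieldName-to-type lookup that hands back the same instance for every field of that type.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch only; none of these are real Solr or Lucene classes. */
final class CodecOverrideSketch {

  /** Stand-in for a resolved per-field format (postings, doc values, ...). */
  static final class Format {
    final String name;
    Format(String name) { this.name = name; }
  }

  /** Stand-in for the by-name SPI lookup; counts how often it is invoked. */
  static final class FormatRegistry {
    private final Set<String> known = Set.of("Default", "Direct");
    private int lookups = 0;

    Format forName(String name) {
      lookups++;
      if (!known.contains(name)) {
        throw new IllegalArgumentException("Unknown format name: " + name);
      }
      return new Format(name);
    }

    int lookupCount() { return lookups; }
  }

  /** Stand-in for a field type that owns its (already resolved) override. */
  static final class SketchFieldType {
    final String typeName;
    final Format postingsFormat; // null means "no override configured"

    SketchFieldType(String typeName, String postingsFormatName, FormatRegistry registry) {
      this.typeName = typeName;
      // Validate and resolve once, at init: a bogus name fails here,
      // not later when the first document is indexed.
      this.postingsFormat =
          postingsFormatName == null ? null : registry.forName(postingsFormatName);
    }
  }

  /** Stand-in for the schema: resolves a field name straight to its type. */
  static final class SketchSchema {
    private final Map<String, SketchFieldType> typeByField = new HashMap<>();

    void addField(String fieldName, SketchFieldType type) {
      typeByField.put(fieldName, type);
    }

    SketchFieldType getFieldType(String fieldName) {
      return typeByField.get(fieldName); // null if the field is unknown
    }
  }

  /** Per-field hook: no SPI lookup and no new objects on each call. */
  static Format getPostingsFormatForField(
      SketchSchema schema, String fieldName, Format defaultFormat) {
    SketchFieldType type = schema.getFieldType(fieldName);
    if (type == null || type.postingsFormat == null) {
      return defaultFormat;
    }
    return type.postingsFormat;
  }

  public static void main(String[] args) {
    FormatRegistry registry = new FormatRegistry();
    SketchFieldType direct = new SketchFieldType("text_direct", "Direct", registry);
    SketchSchema schema = new SketchSchema();
    schema.addField("title", direct);
    schema.addField("body", direct);

    Format fallback = new Format("Default");
    Format a = getPostingsFormatForField(schema, "title", fallback);
    Format b = getPostingsFormatForField(schema, "body", fallback);
    System.out.println("same instance reused: " + (a == b));
    System.out.println("SPI-style lookups performed: " + registry.lookupCount());
  }
}
```

The same shape would presumably apply to the {{docValuesFormat}} and {{knnAlgorithm}} / HNSW options: resolve and validate in the type's init, then let the codec factory ask the type for the result.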




