Chris M. Hostetter created SOLR-17052: -----------------------------------------
Summary: SchemaCodecFactory/IndexSchema/FieldType relationships are kludgy and should be inverted
Key: SOLR-17052
URL: https://issues.apache.org/jira/browse/SOLR-17052
Project: Solr
Issue Type: Improvement
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Chris M. Hostetter

While getting familiar with the {{SolrCore + CodecFactory + SchemaCodecFactory + FieldType}} related code relevant to SOLR-17045, SOLR-17046, & SOLR-17047, it occurred to me that there are a lot of inefficiencies and kludginess in how {{FieldType}} based "codec overrides" are used (and validated) by {{SchemaCodecFactory}} (and {{SolrCore.initCodec}}):

* {{SolrCore.initCodec}} needs to be aware of all the possible ways a {{FieldType}} instance might support codec overrides
** ... so it can fail if any are specified unless the {{CodecFactory instanceOf SolrCoreAware}}
*** ... even though that still doesn't ensure the factory supports those field type overrides
** This validation currently just looks at {{getPostingsFormatForField}} & {{getDocValuesFormatForField}}
*** ... it's ignorant about {{DenseVectorField}}'s assumptions about being able to override aspects of the {{KnnVectorsFormat}}
*** ... and AFAICT, what validation is done doesn't help if the Schema API is used to add new field types (w/ {{postingsFormat}} or {{docValuesFormat}} overrides)
* In all of the {{SchemaCodecFactory}} "per-field" methods ({{getPostingsFormatForField}}, {{getDocValuesFormatForField}}, & {{getKnnVectorsFormatForField}}) ...
** ... every call to these methods resolves a {{SchemaField}} instance – even though only the (Solr) {{FieldType}} is needed
*** Asking the {{IndexSchema}} for the {{SchemaField}} of a fieldName has more overhead than just asking for the {{FieldType}}
*** None of the things these methods care about can be configured on a per-fieldName basis anyway.
** For {{PostingsFormat}} and {{DocValuesFormat}}, every call to these methods repeats the SPI lookup on the "format name" configured on the {{FieldType}} instance
** For {{KnnVectorsFormat}}, every call to this method constructs a new {{SolrDelegatingKnnVectorsFormat}} – even though the same instance could be re-used for every field of the same {{FieldType}} instance.
* In {{FieldType}} ...
** ... there is no validation anywhere that the {{postingsFormat}} or {{docValuesFormat}} are valid
*** ... bogus values only cause a problem when the {{SchemaCodecFactory}} tries to resolve them (when indexing)
* In {{DenseVectorField}} ...
** ... {{checkSchemaField}} validates (and logs warnings) based on the {{vectorEncoding}} and {{dimensions}} ...
*** ... even though these validations aren't "field" specific – they are "type" specific, and could be validated in {{DenseVectorField.init()}}
** BUT! ... there is no validation anywhere that the {{knnAlgorithm}} is supported, or that the HNSW options make sense for it
*** These are only validated by the {{Codec.getKnnVectorsFormatForField(...)}} impl provided by {{SchemaCodecFactory}} ...
**** ... and they are redundantly validated on every call
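
To make the points above about repeated SPI lookups and late validation concrete, here is a minimal, self-contained Java sketch of resolving a format once per field type instead of once per call. It deliberately uses no Solr or Lucene classes: {{PerTypeFormatCache}}, {{register}}, and {{formatForType}} are hypothetical names invented for illustration, and the {{Function<String, F>}} passed to the constructor is only a stand-in for whatever lookup-by-format-name the real code performs. It is not a proposed patch.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/**
 * Hypothetical sketch: resolves a named format once per field type, up front,
 * so per-field calls at indexing time are plain map reads.
 */
final class PerTypeFormatCache<F> {
  // Stand-in for an SPI lookup by format name; expected to throw on unknown names.
  private final Function<String, F> lookupByName;
  private final Map<String, F> resolvedByType = new ConcurrentHashMap<>();

  PerTypeFormatCache(Function<String, F> lookupByName) {
    this.lookupByName = Objects.requireNonNull(lookupByName);
  }

  /**
   * Intended to be called once per field type (e.g. when the type is initialized).
   * A bogus format name fails here, not later during indexing.
   */
  void register(String typeName, String formatName) {
    F format = Objects.requireNonNull(
        lookupByName.apply(formatName), "no format named " + formatName);
    resolvedByType.put(typeName, format);
  }

  /** Per-field call: no lookup by name and no re-validation, just a cached read. */
  F formatForType(String typeName, F defaultFormat) {
    return resolvedByType.getOrDefault(typeName, defaultFormat);
  }
}

final class PerTypeFormatCacheDemo {
  public static void main(String[] args) {
    AtomicInteger lookups = new AtomicInteger();
    PerTypeFormatCache<String> cache = new PerTypeFormatCache<>(name -> {
      lookups.incrementAndGet();
      if (!name.equals("Direct")) {
        throw new IllegalArgumentException("unknown format: " + name);
      }
      return "format:" + name;
    });

    cache.register("text_direct", "Direct"); // the only lookup by name
    for (int i = 0; i < 1000; i++) {
      cache.formatForType("text_direct", "format:default");
    }
    System.out.println("lookups performed: " + lookups.get());
  }
}
```

The design choice being illustrated is only the inversion of responsibility: the type registers (and thereby validates) its override once, and the per-field path never repeats that work. How that would map onto {{FieldType}}, {{IndexSchema}}, and {{SchemaCodecFactory}} is left open here.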