Ishan Sri created LUCENE-8830:
---------------------------------

             Summary: DefaultIndexingChain.getOrAddField method ignores 
omitNorms from FieldType
                 Key: LUCENE-8830
                 URL: https://issues.apache.org/jira/browse/LUCENE-8830
             Project: Lucene - Core
          Issue Type: Bug
          Components: core/index
    Affects Versions: 6.6.1
            Reporter: Ishan Sri


Norms are being computed and written even when *omitNorms is set to true* in 
the field types. I traced the issue and found that the method *getOrAddField* 
creates a *FieldInfo* object the first time it sees a field in a segment. By 
default this object has omitNorms set to false. The method copies the 
*indexOptions* specified in the fieldType onto this newly created object but 
doesn't do the same for *omitNorms*. As a result the flag from the fieldType is 
effectively ignored, which creates issues down the line. 
 
Here's the code snippet for the method with the *fieldInfos.getOrAdd* call 
 
 
{code:java}
private PerField getOrAddField(String name, IndexableFieldType fieldType, boolean invert) {

  // Make sure we have a PerField allocated
  final int hashPos = name.hashCode() & hashMask;
  PerField fp = fieldHash[hashPos];
  while (fp != null && !fp.fieldInfo.name.equals(name)) {
    fp = fp.next;
  }

  if (fp == null) {
    // First time we are seeing this field in this segment

    FieldInfo fi = fieldInfos.getOrAdd(name);

    // Messy: must set this here because e.g. FreqProxTermsWriterPerField looks at the
    // initial IndexOptions to decide what arrays it must create). Then, we also must
    // set it in PerField.invert to allow for later downgrading of the index options:

    fi.setIndexOptions(fieldType.indexOptions());

    fp = new PerField(fi, invert);
 ...   {code}
 
 
 
The *getOrAdd* method below instantiates a new object with omitNorms set to 
false as the 4th parameter.
 
{code:java}
/** Create a new field, or return existing one. */
public FieldInfo getOrAdd(String name) {
  FieldInfo fi = fieldInfo(name);

  if (fi == null) {
    // This field wasn't yet added to this in-RAM
    // segment's FieldInfo, so now we get a global
    // number for this field. If the field was seen
    // before then we'll get the same name and number,
    // else we'll allocate a new one:

    final int fieldNumber = globalFieldNumbers.addOrGet(name, -1, DocValuesType.NONE, 0, 0);

    fi = new FieldInfo(name, fieldNumber, false, false, false, IndexOptions.NONE,
        DocValuesType.NONE, -1, new HashMap<>(), 0, 0);

    assert !byName.containsKey(fi.name);
    globalFieldNumbers.verifyConsistent(Integer.valueOf(fi.number), fi.name, DocValuesType.NONE);
    byName.put(fi.name, fi);
  }

  return fi;
}{code}
 
This causes norms to always be computed, which not only produces incorrect 
scores but also increases disk usage when there are many documents with 
multiple fields that have this flag set to true but ignored.
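
The behaviour described above can be modelled in isolation. The following is only a minimal sketch, not Lucene code: SimpleFieldType, SimpleFieldInfo and both addField... methods are hypothetical stand-ins invented for illustration, and the second method shows one possible remedy (copying the flag next to the index options), which is an assumption rather than a confirmed fix.

```java
import java.util.HashMap;
import java.util.Map;

public class OmitNormsSketch {

    // Hypothetical stand-in for a field type: only the two settings discussed in this report.
    static final class SimpleFieldType {
        final String indexOptions;
        final boolean omitNorms;

        SimpleFieldType(String indexOptions, boolean omitNorms) {
            this.indexOptions = indexOptions;
            this.omitNorms = omitNorms;
        }
    }

    // Hypothetical stand-in for per-field metadata; defaults mirror the getOrAdd snippet above.
    static final class SimpleFieldInfo {
        final String name;
        String indexOptions = "NONE";
        boolean omitNorms = false;

        SimpleFieldInfo(String name) {
            this.name = name;
        }
    }

    private final Map<String, SimpleFieldInfo> byName = new HashMap<>();

    // Create the field metadata with default settings, or return the existing entry.
    SimpleFieldInfo getOrAdd(String name) {
        return byName.computeIfAbsent(name, SimpleFieldInfo::new);
    }

    // Models the reported behaviour: only the index options are copied from the field type.
    SimpleFieldInfo addFieldAsReported(String name, SimpleFieldType type) {
        SimpleFieldInfo fi = getOrAdd(name);
        fi.indexOptions = type.indexOptions;
        return fi;
    }

    // One possible remedy (an assumption): copy omitNorms at the same point.
    SimpleFieldInfo addFieldCopyingOmitNorms(String name, SimpleFieldType type) {
        SimpleFieldInfo fi = getOrAdd(name);
        fi.indexOptions = type.indexOptions;
        fi.omitNorms = type.omitNorms;
        return fi;
    }

    public static void main(String[] args) {
        SimpleFieldType noNorms = new SimpleFieldType("DOCS", true);

        SimpleFieldInfo reported = new OmitNormsSketch().addFieldAsReported("title", noNorms);
        SimpleFieldInfo copied = new OmitNormsSketch().addFieldCopyingOmitNorms("title", noNorms);

        System.out.println("as reported: omitNorms=" + reported.omitNorms);
        System.out.println("copying the flag: omitNorms=" + copied.omitNorms);
    }
}
```

In this model the first path leaves omitNorms at its default even though the field type asked for norms to be omitted, which is the symptom reported here. Whether the real indexing chain applies the flag at some later stage is not covered by the sketch.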



