I have considered this problem and tried to solve it using two methods. With these methods, we can also boost a document by the relative positions of the query terms.
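As a rough illustration of what boosting by term position means, here is a minimal sketch in plain Java with no Lucene dependency. The class name, the method names and the decay formula are all assumptions made for this example; they are not part of Lucene or of the modified code in this message. It uses the "ox" documents from the quoted question.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: a position-dependent boost, independent of Lucene.
class PositionBoostSketch {

    // Hypothetical decay: position 0 gets 2.0 and later positions
    // approach 1.0. Any decreasing function would serve the same purpose.
    static float positionBoost(int pos) {
        return 1.0f + 1.0f / (1 + pos);
    }

    // Multiply together the boosts of every position at which the term occurs.
    static float boostFor(List<String> tokens, String term) {
        float boost = 1.0f;
        for (int pos = 0; pos < tokens.size(); pos++) {
            if (tokens.get(pos).equals(term)) {
                boost *= positionBoost(pos);
            }
        }
        return boost;
    }

    public static void main(String[] args) {
        List<String> doc1 = Arrays.asList("box", "fox", "ox");
        List<String> doc2 = Arrays.asList("ox", "box", "fox");
        System.out.println("doc1 boost: " + boostFor(doc1, "ox"));
        System.out.println("doc2 boost: " + boostFor(doc2, "ox"));
    }
}
```

With this decay, Doc 2 (term at position 0) gets a larger boost than Doc 1 (term at position 2), which is the ordering asked for. Note that multiplying per-occurrence boosts also rewards documents in which the term occurs many times.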
1. Add term positions when indexing and modify TermScorer.score():

    public float score() {
      assert doc != -1;

      int f = freqs[pointer];
      float raw =                                   // compute tf(f)*weight
        f < SCORE_CACHE_SIZE                        // check cache
        ? scoreCache[f]                             // cache hit
        : getSimilarity().tf(f)*weightValue;        // cache miss

      // modified by LiLi: multiply in a boost for each position of the term
      try {
        int[] positions = this.getPositions(f);
        float positionBoost = 1.0f;
        for (int pos : positions) {
          positionBoost *= this.getPositionBoost(pos);
        }
        raw *= positionBoost;
      } catch (IOException e) {
        // positions could not be read: fall back to the unboosted score
      }
      // end of modification

      return norms == null ? raw : raw * SIM_NORM_DECODER[norms[doc] & 0xFF]; // normalize for field
    }

    // Reads the f positions of the term in the current document.
    private int[] getPositions(int f) throws IOException {
      termPositions.skipTo(doc);
      int[] positions = new int[f];
      int docId = termPositions.doc();
      assert docId == doc;
      int tf = termPositions.freq();
      assert tf == f;
      for (int i = 0; i < tf; i++) {
        positions[i] = termPositions.nextPosition();
      }
      return positions;
    }

Then you must pass a

    TermPositions termPositions = reader.termPositions(term);

to the scorer. I modified the constructor of TermScorer to add this parameter.

2. Use payloads

I tried using a payload to record, in a bitset, at which of the first 128 positions a term occurred. This method uses less space than the first one. Then, using my Similarity:

    public float scorePayload(int docID, String fieldName,
        int start, int end, byte[] payload, int offset, int length) {
      if (payload != null) {
        float boost = 1.0F;
        // The payload starts with a 4-byte int, followed by the bitset.
        // Read relative to offset rather than assuming the payload starts at 0.
        int firstOccur = PayloadHelper.decodeInt(payload, offset);
        BitSet bitSet = MyAnalyzer.fromByteArray(payload, offset + 4, length - 4);
        for (int i = 0; i < bitSet.length(); i++) {
          if (bitSet.get(i)) {
            boost *= positionBoost[i];
          }
        }
        return boost;
      } else {
        return 1.0F;
      }
    }

2010/7/20 Papiya Misra <pmi...@pinkotc.com>:
> I need to make sure that documents with the search term occurring
> towards the beginning of the document are ranked higher.
>
> For example,
>
> Search term : ox
> Doc 1: box fox ox
> Doc 2: ox box fox
>
> Result: Doc2 will be ranked higher than Doc1.
>
> The solution I can think of is sorting by term position (after enabling
> term vectors).
> Is that the best way to go about it?
>
> Thanks
> Papiya
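For the payload approach (method 2), the bitset part can be sketched in plain Java without Lucene. This is only an illustration under stated assumptions: the class and method names are hypothetical, the 4-byte leading int of the real payload is left out, and the bit order within each byte is my own choice, which may differ from what MyAnalyzer.fromByteArray expects.

```java
import java.util.BitSet;

// Illustrative only: pack term positions into a fixed-size bitset payload
// and turn the decoded bitset back into a multiplicative boost.
class PositionBitsetSketch {
    static final int MAX_POSITIONS = 128;              // positions tracked
    static final int PAYLOAD_BYTES = MAX_POSITIONS / 8; // 16 bytes

    // One bit per position; positions at or beyond 128 are ignored.
    static byte[] encode(int[] positions) {
        byte[] bytes = new byte[PAYLOAD_BYTES];
        for (int pos : positions) {
            if (pos >= 0 && pos < MAX_POSITIONS) {
                bytes[pos / 8] |= (byte) (1 << (pos % 8));
            }
        }
        return bytes;
    }

    // Rebuild the bitset from length bytes starting at offset.
    static BitSet decode(byte[] bytes, int offset, int length) {
        BitSet bits = new BitSet(length * 8);
        for (int i = 0; i < length * 8; i++) {
            if ((bytes[offset + i / 8] & (1 << (i % 8))) != 0) {
                bits.set(i);
            }
        }
        return bits;
    }

    // Multiply the table entries of all set positions. The table
    // (positionBoost) is supplied by the caller and is hypothetical here.
    static float boost(BitSet bits, float[] positionBoost) {
        float boost = 1.0f;
        for (int i = bits.nextSetBit(0);
             i >= 0 && i < positionBoost.length;
             i = bits.nextSetBit(i + 1)) {
            boost *= positionBoost[i];
        }
        return boost;
    }
}
```

The payload size is fixed at 16 bytes per term and document regardless of how often the term occurs, at the cost of losing all position information beyond position 127.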