Re: Ranking based on term position

Li Li, Mon, 19 Jul 2010 19:43:13 -0700

I have considered this problem and tried to solve it with two methods.
With these methods we can also boost a document by the relative positions
of the query terms.


1. Index term positions, and modify TermScorer.score():

  public float score() {
    assert doc != -1;
    int f = freqs[pointer];
    float raw =                                   // compute tf(f)*weight
      f < SCORE_CACHE_SIZE                        // check cache
      ? scoreCache[f]                             // cache hit
      : getSimilarity().tf(f)*weightValue;        // cache miss
    // modified by LiLi: multiply in a boost for every position of the term
    try {
      int[] positions = this.getPositions(f);
      float positionBoost = 1.0f;
      for (int pos : positions) {
        positionBoost *= this.getPositionBoost(pos); // getPositionBoost() is not shown in this post
      }
      raw *= positionBoost;
    } catch (IOException e) {
      // positions could not be read: fall back to the unboosted score
    }
    // end of modification
    return norms == null ? raw : raw * SIM_NORM_DECODER[norms[doc] & 0xFF]; // normalize for field
  }


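  // Reads all f positions of the current term in the current document.
  // Note: skipTo() always advances past the current entry, so call this only once per doc.
  private int[] getPositions(int f) throws IOException {
    termPositions.skipTo(doc);
    int[] positions = new int[f];
    int docId = termPositions.doc();
    assert docId == doc;
    int tf = termPositions.freq();
    assert tf == f;
    for (int i = 0; i < tf; i++) {
      positions[i] = termPositions.nextPosition();
    }
    return positions;
  }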

You then have to pass a TermPositions, obtained with
TermPositions termPositions = reader.termPositions(term);
to the scorer. I added this parameter to the TermScorer constructor.
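The post does not show getPositionBoost(). As a hypothetical, Lucene-free sketch, assuming a simple curve that gives 2.0 at position 0 and decays towards 1.0, the boost loop from score() can be tried on the two documents from the quoted question:

```java
// Hypothetical sketch: getPositionBoost() here is NOT from the post or from Lucene.
final class PositionBoostSketch {

    // Assumed boost curve: 2.0 at position 0, decaying towards 1.0 for later positions.
    static float getPositionBoost(int pos) {
        return 1.0f + 1.0f / (1 + pos);
    }

    // Mirrors the loop in the modified score(): multiply the boosts of all positions.
    static float documentBoost(int[] positions) {
        float boost = 1.0f;
        for (int pos : positions) {
            boost *= getPositionBoost(pos);
        }
        return boost;
    }

    public static void main(String[] args) {
        // Positions of "ox" in the quoted example:
        // Doc 1 "box fox ox" -> position 2, Doc 2 "ox box fox" -> position 0.
        float doc1 = documentBoost(new int[] {2});
        float doc2 = documentBoost(new int[] {0});
        System.out.println("Doc 1 boost: " + doc1);
        System.out.println("Doc 2 boost: " + doc2);
    }
}
```

Because every boost in this curve is at least 1.0, a term that occurs more often also gets a larger product; a curve with values below 1.0 for later positions would avoid that, if it is not wanted.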

2. Use payloads.
   I used a payload that starts with a 4-byte int (the first position) and
is followed by a bitset recording which of the first 128 positions the term
occurs at. This method uses less space than the first one.
   Then I score it with my own Similarity:
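        public float scorePayload(int docID, String fieldName, int start, int end,
                        byte[] payload, int offset, int length) {
                if (payload != null) {
                        float boost = 1.0F;
                        // bytes 0-3 hold the first position (not used for the boost);
                        // read everything relative to offset, not from index 0
                        int firstOccur = PayloadHelper.decodeInt(payload, offset);
                        BitSet bitSet = MyAnalyzer.fromByteArray(payload, offset + 4, length - 4);
                        // positionBoost: per-position boost table (not shown in this post)
                        for (int i = bitSet.nextSetBit(0); i >= 0; i = bitSet.nextSetBit(i + 1)) {
                                boost *= positionBoost[i];
                        }
                        return boost;
                } else {
                        return 1.0F;
                }
        }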
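MyAnalyzer.fromByteArray() and the code that writes the payload are not shown. The following hypothetical, Lucene-free sketch assumes the payload is a 4-byte big-endian first position followed by a 16-byte bitset for positions 0 to 127, and scores it the same way as scorePayload() above. The names encode, score and defaultBoosts are illustrative only.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical sketch of the payload layout; MyAnalyzer's real code is not in the post.
final class PositionPayloadSketch {
    static final int TRACKED_POSITIONS = 128;
    static final int PAYLOAD_LENGTH = 4 + TRACKED_POSITIONS / 8; // 20 bytes

    // Encodes ascending, non-empty positions; positions >= 128 are not recorded in the bitset.
    static byte[] encode(int[] positions) {
        byte[] out = new byte[PAYLOAD_LENGTH];
        int first = positions[0];
        out[0] = (byte) (first >>> 24);
        out[1] = (byte) (first >>> 16);
        out[2] = (byte) (first >>> 8);
        out[3] = (byte) first;
        for (int pos : positions) {
            if (pos < TRACKED_POSITIONS) {
                out[4 + pos / 8] |= (byte) (1 << (pos % 8)); // same bit order as BitSet.valueOf
            }
        }
        return out;
    }

    // Big-endian, the same byte order as Lucene's PayloadHelper.encodeInt/decodeInt.
    static int decodeInt(byte[] bytes, int offset) {
        return ((bytes[offset] & 0xFF) << 24) | ((bytes[offset + 1] & 0xFF) << 16)
                | ((bytes[offset + 2] & 0xFF) << 8) | (bytes[offset + 3] & 0xFF);
    }

    // Assumed boost table: 2.0 for position 0, decaying towards 1.0.
    static float[] defaultBoosts() {
        float[] boosts = new float[TRACKED_POSITIONS];
        for (int i = 0; i < boosts.length; i++) {
            boosts[i] = 1.0f + 1.0f / (1 + i);
        }
        return boosts;
    }

    // Multiplies the boosts of all set bits, reading the bitset relative to offset.
    static float score(byte[] payload, int offset, int length, float[] positionBoost) {
        if (payload == null || length <= 4) {
            return 1.0f;
        }
        BitSet bits = BitSet.valueOf(Arrays.copyOfRange(payload, offset + 4, offset + length));
        float boost = 1.0f;
        for (int i = bits.nextSetBit(0); i >= 0 && i < positionBoost.length; i = bits.nextSetBit(i + 1)) {
            boost *= positionBoost[i];
        }
        return boost;
    }

    public static void main(String[] args) {
        float[] boosts = defaultBoosts();
        byte[] doc1 = encode(new int[] {2}); // "ox" in "box fox ox"
        byte[] doc2 = encode(new int[] {0}); // "ox" in "ox box fox"
        System.out.println("first position in Doc 1: " + decodeInt(doc1, 0));
        System.out.println("Doc 1 boost: " + score(doc1, 0, doc1.length, boosts));
        System.out.println("Doc 2 boost: " + score(doc2, 0, doc2.length, boosts));
    }
}
```

Positions from 128 on are not recorded, so they contribute no boost; that is the trade-off of a fixed-size bitset.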


2010/7/20 Papiya Misra <pmi...@pinkotc.com>:
> I need to make sure that documents with the search term occurring
> towards the beginning of the document are ranked higher.
>
> For example,
>
> Search term : ox
> Doc 1: box fox ox
> Doc 2: ox box fox
>
> Result: Doc2 will be ranked higher than Doc1.
>
> The solution I can think of is sorting by term position (after enabling
> term vectors). Is that the best way to go about it?
>
> Thanks
> Papiya
