Hi Jason,

Here's what I did:

1. Took your code and modified it into the test case shown in [1] below.
2. Set up your config, schema, etc. to match the EmbeddedSolrServer paths in the code (a Maven-like directory structure, with src/main/resources/solr/spell containing your configuration).
3. Ran the code.  My output is:
--------------------
Token: chanel OMP: false
Oct 8, 2008 1:19:56 PM org.apache.solr.core.SolrCore execute
INFO: [spell] webapp=null path=/select params={q=description%3Achanel&spellcheck=true&spellcheck.onlyMorePopular=false&spellcheck.extendedResults=true&spellcheck.count=1} hits=834 status=0 QTime=46
No Suggestions
--------------------
Token: chane OMP: false
Oct 8, 2008 1:19:56 PM org.apache.solr.core.SolrCore execute
INFO: [spell] webapp=null path=/select params={q=description%3Achane&spellcheck=true&spellcheck.onlyMorePopular=false&spellcheck.extendedResults=true&spellcheck.count=1} hits=1 status=0 QTime=1
No Suggestions
--------------------
Token: chane OMP: true
Oct 8, 2008 1:19:57 PM org.apache.solr.core.SolrCore execute
INFO: [spell] webapp=null path=/select params={q=description%3Achane&spellcheck=true&spellcheck.onlyMorePopular=true&spellcheck.extendedResults=true&spellcheck.count=1} hits=1 status=0 QTime=15
        Sugg[0]: [chanel]
        Sugg[0] Freqs: [834]
        Num Found 1
--------------------
Token: chanl OMP: false
Oct 8, 2008 1:19:57 PM org.apache.solr.core.SolrCore execute
INFO: [spell] webapp=null path=/select params={q=description%3Achanl&spellcheck=true&spellcheck.onlyMorePopular=false&spellcheck.extendedResults=true&spellcheck.count=1} hits=0 status=0 QTime=2
        Sugg[0]: [chanel]
        Sugg[0] Freqs: [834]
        Num Found 1
--------------------
Token: chanl OMP: false
Oct 8, 2008 1:19:57 PM org.apache.solr.core.SolrCore execute
INFO: [spell] webapp=null path=/select params={q=description%3Achanl&spellcheck=true&spellcheck.onlyMorePopular=false&spellcheck.extendedResults=true&spellcheck.count=5} hits=0 status=0 QTime=2
        Sugg[0]: [chanel, chant, chang, chani, chane]
        Sugg[0] Freqs: [834, 10, 8, 4, 1]
        Num Found 5
--------------------
Token: chanl OMP: false
Oct 8, 2008 1:19:57 PM org.apache.solr.core.SolrCore execute
INFO: [spell] webapp=null path=/select params={q=description%3Achanl&spellcheck=true&spellcheck.onlyMorePopular=false&spellcheck.extendedResults=true&spellcheck.count=10} hits=0 status=0 QTime=2
        Sugg[0]: [chanel, chant, chang, chani, chana, chane, charl, chand, chan, chair]
        Sugg[0] Freqs: [834, 10, 8, 4, 1, 1, 1, 1, 106, 1950]
        Num Found 10

------

1) Is this an accurate representation of what you are trying to convey?
2) Given this shared code, which I hope captures both the document side and the query side, is the issue then the one highlighted by the last result above, namely that "chan" sorts after "chand" even though "chan" has a higher frequency?
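
To make question 2 concrete, here is a small sketch in plain Java (no Solr dependency) that re-sorts the ten suggestions from the last result by frequency alone. The class and method names are made up for illustration, and a frequency-only ordering is an assumption about what you might expect, not a description of what SpellCheckComponent does internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FrequencyOrderSketch {

  // Returns the words re-ordered by descending frequency. Ties keep their
  // original relative order because List.sort is a stable sort.
  static List<String> byFrequencyDesc(List<String> words, List<Integer> freqs) {
    List<Integer> order = new ArrayList<>();
    for (int i = 0; i < words.size(); i++) {
      order.add(i);
    }
    order.sort((a, b) -> Integer.compare(freqs.get(b), freqs.get(a)));
    List<String> sorted = new ArrayList<>();
    for (int idx : order) {
      sorted.add(words.get(idx));
    }
    return sorted;
  }

  public static void main(String[] args) {
    // Suggestions and frequencies copied from the count=10 result above.
    List<String> words = Arrays.asList("chanel", "chant", "chang", "chani", "chana",
        "chane", "charl", "chand", "chan", "chair");
    List<Integer> freqs = Arrays.asList(834, 10, 8, 4, 1, 1, 1, 1, 106, 1950);
    System.out.println(byFrequencyDesc(words, freqs));
  }
}
```

With frequency as the only sort key, "chan" (106) would come third, right behind "chair" and "chanel", rather than ninth as in the output above.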

Thanks,
Grant


[1]
package com.grantingersoll.noodles;

import junit.framework.TestCase;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.SpellCheckResponse;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.params.SpellingParams;
import org.apache.solr.core.CoreContainer;
import org.apache.solr.core.CoreDescriptor;
import org.apache.solr.core.SolrCore;
import org.apache.solr.handler.component.SpellCheckComponent;

import java.util.ArrayList;
import java.util.List;
import java.util.Collection;
import java.util.HashSet;
import java.io.File;


/**
 *
 *
 **/
public class SpellCheckingTest extends TestCase {


  public void testSpelling() throws Exception {
    List<Pair<String, Integer>> terms = new ArrayList<Pair<String, Integer>>();
    terms.add(new Pair<String, Integer>("chanel", 834));
    terms.add(new Pair<String, Integer>("chant", 10));
    terms.add(new Pair<String, Integer>("chang", 8));
    terms.add(new Pair<String, Integer>("chani", 4));
    terms.add(new Pair<String, Integer>("chand", 1));
    terms.add(new Pair<String, Integer>("chana", 1));
    terms.add(new Pair<String, Integer>("charl", 1));
    terms.add(new Pair<String, Integer>("chane", 1));
    terms.add(new Pair<String, Integer>("chan", 106));
    terms.add(new Pair<String, Integer>("chair", 1950));
    int id = 0;
    CoreContainer container = new CoreContainer("src/main/resources/solr", new File("src/main/resources/solr/solr.xml"));
    //container.load();
    //SolrCore core = container.create(descriptor);
    final SolrServer client = new EmbeddedSolrServer(container, "spell");
    //client.setParser(new XMLResponseParser());
    Collection<SolrInputDocument> docs = new HashSet<SolrInputDocument>();
    for (Pair<String, Integer> term : terms) {
      final int freq = term.getSecond().intValue();
      for (int i = 0; i < freq; ++i) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", String.valueOf(++id));
        doc.addField("description", term.getFirst());
        docs.add(doc);
      }
    }
    client.add(docs);
    client.optimize();

    //buildSpellCheck(client);

    spellCheck(client, "chanel", false, 1);
    spellCheck(client, "chane", false, 1);
    spellCheck(client, "chane", true, 1);
    spellCheck(client, "chanl", false, 1);
    spellCheck(client, "chanl", false, 5);
    spellCheck(client, "chanl", false, 10);

  }

  private void spellCheck(SolrServer client, String token, boolean omp, int numSuggs) throws SolrServerException {
    System.out.println("--------------------");
    System.out.println("Token: " + token + " OMP: " + omp);
    SolrQuery query;
    QueryResponse rsp;
    SpellCheckResponse spRsp;
    query = new SolrQuery("description:" + token);
    query.set(SpellCheckComponent.COMPONENT_NAME, "true");
    query.set(SpellingParams.SPELLCHECK_ONLY_MORE_POPULAR, String.valueOf(omp));
    query.set(SpellingParams.SPELLCHECK_EXTENDED_RESULTS, "true");
    query.set(SpellingParams.SPELLCHECK_COUNT, String.valueOf(numSuggs));
    //query.setQueryType("dismax");
    rsp = client.query(query);
    spRsp = rsp.getSpellCheckResponse();

    //System.out.println("Response: " + rsp);
    List<SpellCheckResponse.Suggestion> suggestions = spRsp.getSuggestions();
    //System.out.println("Spelling: " + suggestions);
    printSuggestions(suggestions);
  }

  private void printSuggestions(List<SpellCheckResponse.Suggestion> suggestions) {
    int i = 0;
    if (!suggestions.isEmpty()) {
      for (SpellCheckResponse.Suggestion sugg : suggestions) {
        System.out.println("\tSugg[" + i + "]: " + sugg.getSuggestions());
        System.out.println("\tSugg[" + i + "] Freqs: " + sugg.getSuggestionFrequencies());
        System.out.println("\tNum Found " + sugg.getNumFound());
        ++i;  // advance the index so each suggestion gets its own label
      }
    } else {
      System.out.println("No Suggestions");
    }
  }


}

class Pair<S, T> {

  S first;

  T second;

  public Pair(S _first, T _second) {
    this.first = _first;
    this.second = _second;
  }

  public S getFirst() {
    return this.first;
  }

  public T getSecond() {
    return this.second;

  }

}


On Oct 8, 2008, at 10:22 AM, Jason Rennie wrote:

Hi Grant,

Here are solr config files (attached) and java code (included below) to recreate the test case.

Jason

        List<Pair<String, Integer>> terms = new ArrayList<Pair<String, Integer>>();
        terms.add(new Pair<String, Integer>("chanel", 834));
        terms.add(new Pair<String, Integer>("chant", 10));
        terms.add(new Pair<String, Integer>("chang", 8));
        terms.add(new Pair<String, Integer>("chani", 4));
        terms.add(new Pair<String, Integer>("chand", 1));
        terms.add(new Pair<String, Integer>("chana", 1));
        terms.add(new Pair<String, Integer>("charl", 1));
        terms.add(new Pair<String, Integer>("chane", 1));
        terms.add(new Pair<String, Integer>("chan", 106));
        terms.add(new Pair<String, Integer>("chair", 1950));
        int id = 0;
        final CommonsHttpSolrServer client = new CommonsHttpSolrServer("http://solr:8080/solr/");
        client.setParser(new XMLResponseParser());
        for (Pair<String, Integer> term : terms) {
            final int freq = term.getSecond().intValue();
            for (int i = 0; i < freq; ++i) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", String.valueOf(++id));
                doc.addField("description", term.getFirst());
                client.add(doc);
            }
        }
        client.optimize();

Here's a Pair class:

public class Pair<S, T> {

    S first;

    T second;

    public Pair(S _first, T _second) {
        this.first = _first;
        this.second = _second;
    }

    public S getFirst() {
        return this.first;
    }

    public T getSecond() {
        return this.second;

    }

}

<solrconfig.xml><schema.xml>

--------------------------
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ







