Hi Jason,
Here's what I did:
1. Took your code and modified it into the test case shown in [1] below.
2. Set up your config, schema, etc. to match the EmbeddedSolrServer
paths in the code (a Maven-like directory structure, with
src/main/resources/solr/spell containing your configuration).
3. Ran the code. My output is:
--------------------
Token: chanel OMP: false
Oct 8, 2008 1:19:56 PM org.apache.solr.core.SolrCore execute
INFO: [spell] webapp=null path=/select params={q=description%3Achanel&spellcheck=true&spellcheck.onlyMorePopular=false&spellcheck.extendedResults=true&spellcheck.count=1} hits=834 status=0 QTime=46
No Suggestions
--------------------
Token: chane OMP: false
Oct 8, 2008 1:19:56 PM org.apache.solr.core.SolrCore execute
INFO: [spell] webapp=null path=/select params={q=description%3Achane&spellcheck=true&spellcheck.onlyMorePopular=false&spellcheck.extendedResults=true&spellcheck.count=1} hits=1 status=0 QTime=1
No Suggestions
--------------------
Token: chane OMP: true
Oct 8, 2008 1:19:57 PM org.apache.solr.core.SolrCore execute
INFO: [spell] webapp=null path=/select params={q=description%3Achane&spellcheck=true&spellcheck.onlyMorePopular=true&spellcheck.extendedResults=true&spellcheck.count=1} hits=1 status=0 QTime=15
Sugg[0]: [chanel]
Sugg[0] Freqs: [834]
Num Found 1
--------------------
Token: chanl OMP: false
Oct 8, 2008 1:19:57 PM org.apache.solr.core.SolrCore execute
INFO: [spell] webapp=null path=/select params={q=description%3Achanl&spellcheck=true&spellcheck.onlyMorePopular=false&spellcheck.extendedResults=true&spellcheck.count=1} hits=0 status=0 QTime=2
Sugg[0]: [chanel]
Sugg[0] Freqs: [834]
Num Found 1
--------------------
Token: chanl OMP: false
Oct 8, 2008 1:19:57 PM org.apache.solr.core.SolrCore execute
INFO: [spell] webapp=null path=/select params={q=description%3Achanl&spellcheck=true&spellcheck.onlyMorePopular=false&spellcheck.extendedResults=true&spellcheck.count=5} hits=0 status=0 QTime=2
Sugg[0]: [chanel, chant, chang, chani, chane]
Sugg[0] Freqs: [834, 10, 8, 4, 1]
Num Found 5
--------------------
Token: chanl OMP: false
Oct 8, 2008 1:19:57 PM org.apache.solr.core.SolrCore execute
INFO: [spell] webapp=null path=/select params={q=description%3Achanl&spellcheck=true&spellcheck.onlyMorePopular=false&spellcheck.extendedResults=true&spellcheck.count=10} hits=0 status=0 QTime=2
Sugg[0]: [chanel, chant, chang, chani, chana, chane, charl, chand, chan, chair]
Sugg[0] Freqs: [834, 10, 8, 4, 1, 1, 1, 1, 106, 1950]
Num Found 10
------
1) Is this an accurate representation of what you are trying to convey?
2) Given this shared code, which I hope captures both the document
side and the query side, is the issue then the one highlighted by the
last result above, namely that "chan" sorts after "chand" even though
"chan" has a higher frequency?
Thanks,
Grant
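To make question 2 concrete, here is a minimal sketch that scans the last result above (spellcheck.count=10) for places where a suggestion has a higher frequency than the one ranked just before it. The class name FrequencyOrderCheck and the method findInversions are hypothetical names made up for this illustration; the suggestion and frequency lists are copied from the output above. It only reports where the list is not in descending frequency order and says nothing about why the spell checker ranks this way.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Solr or of the test case in [1].
public class FrequencyOrderCheck {

    /**
     * Returns every index i where freqs[i] > freqs[i - 1], i.e. where a
     * more frequent suggestion is ranked below a less frequent one.
     */
    public static List<Integer> findInversions(List<Integer> freqs) {
        List<Integer> inversions = new ArrayList<Integer>();
        for (int i = 1; i < freqs.size(); i++) {
            if (freqs.get(i) > freqs.get(i - 1)) {
                inversions.add(i);
            }
        }
        return inversions;
    }

    public static void main(String[] args) {
        // Copied from the last result in the output above.
        List<String> suggestions = Arrays.asList(
            "chanel", "chant", "chang", "chani", "chana",
            "chane", "charl", "chand", "chan", "chair");
        List<Integer> freqs = Arrays.asList(
            834, 10, 8, 4, 1, 1, 1, 1, 106, 1950);

        for (int i : findInversions(freqs)) {
            System.out.println(suggestions.get(i) + " (" + freqs.get(i)
                + ") is ranked after " + suggestions.get(i - 1)
                + " (" + freqs.get(i - 1) + ")");
        }
    }
}
```

On this data it flags two positions: "chan" after "chand", and "chair" after "chan".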
[1]
package com.grantingersoll.noodles;
import junit.framework.TestCase;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.SpellCheckResponse;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.params.SpellingParams;
import org.apache.solr.core.CoreContainer;
import org.apache.solr.core.CoreDescriptor;
import org.apache.solr.core.SolrCore;
import org.apache.solr.handler.component.SpellCheckComponent;
import java.util.ArrayList;
import java.util.List;
import java.util.Collection;
import java.util.HashSet;
import java.io.File;
/**
 * Indexes each term once per unit of frequency, then runs spell check
 * queries against the embedded "spell" core and prints the suggestions.
 **/
public class SpellCheckingTest extends TestCase {

  public void testSpelling() throws Exception {
    // Term -> number of documents that will contain it.
    List<Pair<String, Integer>> terms = new ArrayList<Pair<String, Integer>>();
    terms.add(new Pair<String, Integer>("chanel", 834));
    terms.add(new Pair<String, Integer>("chant", 10));
    terms.add(new Pair<String, Integer>("chang", 8));
    terms.add(new Pair<String, Integer>("chani", 4));
    terms.add(new Pair<String, Integer>("chand", 1));
    terms.add(new Pair<String, Integer>("chana", 1));
    terms.add(new Pair<String, Integer>("charl", 1));
    terms.add(new Pair<String, Integer>("chane", 1));
    terms.add(new Pair<String, Integer>("chan", 106));
    terms.add(new Pair<String, Integer>("chair", 1950));
    int id = 0;
    CoreContainer container = new CoreContainer("src/main/resources/solr",
        new File("src/main/resources/solr/solr.xml"));
    //container.load();
    //SolrCore core = container.create(descriptor);
    final SolrServer client = new EmbeddedSolrServer(container, "spell");
    //client.setParser(new XMLResponseParser());
    // Build one document per occurrence so each term gets the intended frequency.
    Collection<SolrInputDocument> docs = new HashSet<SolrInputDocument>();
    for (Pair<String, Integer> term : terms) {
      final int freq = term.getSecond().intValue();
      for (int i = 0; i < freq; ++i) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", String.valueOf(++id));
        doc.addField("description", term.getFirst());
        docs.add(doc);
      }
    }
    client.add(docs);
    client.optimize();
    //buildSpellCheck(client);
    spellCheck(client, "chanel", false, 1);
    spellCheck(client, "chane", false, 1);
    spellCheck(client, "chane", true, 1);
    spellCheck(client, "chanl", false, 1);
    spellCheck(client, "chanl", false, 5);
    spellCheck(client, "chanl", false, 10);
  }

  // omp = spellcheck.onlyMorePopular, numSuggs = spellcheck.count
  private void spellCheck(SolrServer client, String token, boolean omp,
      int numSuggs) throws SolrServerException {
    System.out.println("--------------------");
    System.out.println("Token: " + token + " OMP: " + omp);
    SolrQuery query;
    QueryResponse rsp;
    SpellCheckResponse spRsp;
    query = new SolrQuery("description:" + token);
    query.set(SpellCheckComponent.COMPONENT_NAME, "true");
    query.set(SpellingParams.SPELLCHECK_ONLY_MORE_POPULAR, String.valueOf(omp));
    query.set(SpellingParams.SPELLCHECK_EXTENDED_RESULTS, "true");
    query.set(SpellingParams.SPELLCHECK_COUNT, String.valueOf(numSuggs));
    //query.setQueryType("dismax");
    rsp = client.query(query);
    spRsp = rsp.getSpellCheckResponse();
    //System.out.println("Response: " + rsp);
    List<SpellCheckResponse.Suggestion> suggestions = spRsp.getSuggestions();
    //System.out.println("Spelling: " + suggestions);
    printSuggestions(suggestions);
  }

  private void printSuggestions(List<SpellCheckResponse.Suggestion> suggestions) {
    int i = 0; // index of the query token the suggestions belong to
    if (!suggestions.isEmpty()) {
      for (SpellCheckResponse.Suggestion sugg : suggestions) {
        System.out.println("\tSugg[" + i + "]: " + sugg.getSuggestions());
        System.out.println("\tSugg[" + i + "] Freqs: "
            + sugg.getSuggestionFrequencies());
        System.out.println("\tNum Found " + sugg.getNumFound());
        i++;
      }
    } else {
      System.out.println("No Suggestions");
    }
  }
}
class Pair<S, T> {
  S first;
  T second;

  public Pair(S _first, T _second) {
    this.first = _first;
    this.second = _second;
  }

  public S getFirst() {
    return this.first;
  }

  public T getSecond() {
    return this.second;
  }
}
On Oct 8, 2008, at 10:22 AM, Jason Rennie wrote:
Hi Grant,
Here are solr config files (attached) and java code (included below)
to recreate the test case.
Jason
List<Pair<String, Integer>> terms = new ArrayList<Pair<String, Integer>>();
terms.add(new Pair<String, Integer>("chanel", 834));
terms.add(new Pair<String, Integer>("chant", 10));
terms.add(new Pair<String, Integer>("chang", 8));
terms.add(new Pair<String, Integer>("chani", 4));
terms.add(new Pair<String, Integer>("chand", 1));
terms.add(new Pair<String, Integer>("chana", 1));
terms.add(new Pair<String, Integer>("charl", 1));
terms.add(new Pair<String, Integer>("chane", 1));
terms.add(new Pair<String, Integer>("chan", 106));
terms.add(new Pair<String, Integer>("chair", 1950));
int id = 0;
final CommonsHttpSolrServer client =
    new CommonsHttpSolrServer("http://solr:8080/solr/");
client.setParser(new XMLResponseParser());
for (Pair<String, Integer> term : terms) {
  final int freq = term.getSecond().intValue();
  for (int i = 0; i < freq; ++i) {
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", String.valueOf(++id));
    doc.addField("description", term.getFirst());
    client.add(doc);
  }
}
client.optimize();
Here's a Pair class:
public class Pair<S, T> {
  S first;
  T second;

  public Pair(S _first, T _second) {
    this.first = _first;
    this.second = _second;
  }

  public S getFirst() {
    return this.first;
  }

  public T getSecond() {
    return this.second;
  }
}
<solrconfig.xml><schema.xml>
--------------------------
Grant Ingersoll