When I use french AND antiques I get documents like this:
score: 1.0, boost: 1.0, cont: French Antiques
score: 0.23080501, boost: 1.0, cont: FRENCH SEPTIC
score: 0.23080501, boost: 1.0, cont: French French Septic
score: 0.20400475, boost: 1.0, id: 25460, cont: French Associates
As in the
Are you really, really sure that your *analyzer* isn't automatically
lower-casing your *query* and turning french AND antiques into french and
antiques, then, as Chris says, treating and as a stop word?
The fact that your parser transforms antiques into antiqu leads me to
suspect that there's a
3 docs with one field each in index:
-
french beast stone
crazy rolling stone
rolling stone done in by coconut
3 searches, default op set as AND
-
search(coconut stone);
search(coconut OR stone);
search(coconut AND stone);
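Under simple term matching, the results those three searches should produce can be sketched in plain Java (no Lucene involved; `search` and `DOCS` here are illustrative stand-ins, not the real API):

```java
import java.util.*;

// Sketch of what the three searches above should match, treating each
// document as a set of whitespace-separated terms.
public class DefaultOperatorDemo {
    static final List<String> DOCS = List.of(
            "french beast stone",
            "crazy rolling stone",
            "rolling stone done in by coconut");

    // Returns 0-based ids of docs matching the query terms.
    static List<Integer> search(List<String> terms, boolean requireAll) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < DOCS.size(); i++) {
            Set<String> docTerms = new HashSet<>(Arrays.asList(DOCS.get(i).split("\\s+")));
            boolean match = requireAll
                    ? docTerms.containsAll(terms)
                    : terms.stream().anyMatch(docTerms::contains);
            if (match) hits.add(i);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> q = List.of("coconut", "stone");
        // With the default operator set to AND, "coconut stone" behaves like
        // "coconut AND stone": only doc 2 contains both terms.
        System.out.println(search(q, true));   // [2]
        // "coconut OR stone": every document contains at least one term.
        System.out.println(search(q, false));  // [0, 1, 2]
    }
}
```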
I am new to Lucene so I'll admit I am confused by a few things. I'm using
an index which was built with the StandardAnalyzer. I have verified this by
using an IndexReader to read the docs back out ... Antiques is not Antiq in
the index. So according to this note in the Lucene docs I would
Well, I'm puzzled as well, in my simple examples I just ran, the AND
operator behaves just fine, but that was using StandardAnalyzer. So it's
almost certain we're not talking about the same thing G...
So, I guess I have a couple of suggestions:
1) Try your query without the stemmingAnalyzer. Try
Ok guys ... you're going to want to wield a big stick at me. The problem
was my HitCollector: I wasn't actually passing it to my searcher. Somewhere
in my testing I had commented out that code, and it was making it look like
I wasn't getting hits.
: One more question about IndexWriters (maybe I don't deserve an answer here
: :-) ) I assume that the Analyzer used is applied and written to the
: index per field. So if I wanted one for Snowball or Stemming I'd have to
: write multiple indexes? I'm a bit confused as to how the Stemmed
That question was badly worded. I was trying to ask whether, when I write an
index using the StandardAnalyzer, the docs are transformed using that
analyzer and then written to the index post-transformation, so stop words or
things like apostrophes would be removed.
Scott's Lawn and Garden Care
: index using the StandardAnalyzer, the docs are transformed using that
: analyzer then written to the index post transformation. So stop words or
: things like apostrophes would be removed.
if the analyzer used behaves that way, then yes -- those things will be
removed from the indexed terms.
: Scott's
Sorry for the confusion all.
The code I am talking about is from the Lucene 2.0 API:
Document doc = hits.doc(i);
String path = doc.get("path");
lucene-2.0.0/src/demo/org/apache/lucene/demo/SearchFiles.java (line 147)
I am not sure where they are getting the path. How are they inserting it
into the
You probably want to take a closer look at the StandardAnalyzer. It uses
StandardTokenizer and StandardFilter. From the javadoc:
StandardTokenizer
A grammar-based tokenizer constructed with JavaCC.
This should be a good tokenizer for most European-language documents:
- Splits words at
Truly I am new to Lucene. That's the missing part ... I'm looking at the
stored values and not the indexed terms.
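That distinction can be sketched in plain Java. This is an illustration, not Lucene's API: a stored field keeps the original text verbatim, while the inverted index holds the analyzed terms, so reading the stored value back shows "Antiques" even if the indexed term is the stem "antiqu". The crude suffix-chopping in `analyze` is a stand-in for a real stemmer:

```java
import java.util.*;
import java.util.stream.*;

// Illustration of stored value vs. indexed terms.
public class StoredVsIndexed {
    // Stand-in for a stemming analyzer: lower-case, then chop a trailing
    // "es" or "s". Purely illustrative, not a real stemming algorithm.
    static List<String> analyze(String text) {
        return Arrays.stream(text.split("\\s+"))
                .map(String::toLowerCase)
                .map(t -> t.replaceAll("(es|s)$", ""))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String stored = "French Antiques";      // what reading a stored field returns
        List<String> indexed = analyze(stored); // what queries actually match against
        System.out.println(stored);             // French Antiques
        System.out.println(indexed);            // [french, antiqu]
    }
}
```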
Mark
On 9/17/06, Chris Hostetter [EMAIL PROTECTED] wrote:
1) maybe you didn't really use StandardAnalyzer when the index was built?
2) keep in mind there is a difference between
all,
We're just wondering if anyone has seen any exceptions when using the
IndexWriter.addDocument(...) or IndexReader.deleteDocuments(Term term)
methods apart from catastrophic IOExceptions (disk full/failed etc.).
Is it possible for instance that we may be able to create a document
that
On 9/18/06, Jed Wesley-Smith [EMAIL PROTECTED] wrote:
We're just wondering if anyone has seen any exceptions when using the
IndexWriter.addDocument(...) or IndexReader.deleteDocuments(Term term)
methods apart from catastrophic IOExceptions (disk full/failed etc.).
And out-of-memory exceptions.