Yes karl, when I explore the index by Luke I can see the terms
for example I have a field namely, patientResult, it contains value Ca.
Oxalate:many and also other values such as Ca. Oxalate:few etc.
the problem is that when I run this query: patientResult:(Ca. Oxalate:few)
the result is
84329 Ca.
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1
Hello,
I have an index with an 'actor' field, for each actor there exists a single
field value entry, e.g.
stored/compressed,indexed,tokenized,termVector,termVectorOffsets,termVectorPosition
movie_actors
movie_actors:Mayrata O'Wisiedo (as
Well then that is particularly spooky!!
And, hopefully, possible/easy to reproduce. Thanks.
Mike
testn [EMAIL PROTECTED] wrote:
I use RAMDirectory and the error often shows the low number. Last time it
happened with message 7<=7. Next time it happens, I will try to capture
the stacktrace.
Hi Shailendra,
Could you pls send the same class file to my gmail a/c too ?
Regards
vini
Shailendra Sharma wrote:
Ah, Good way !
On 8/4/07, Paul Elschot [EMAIL PROTECTED] wrote:
On Friday 03 August 2007 20:35, Shailendra Sharma wrote:
Paul,
If I understand Cedric right, he wants
Actually I don't think I'm having trouble-- as I mentioned,
my text is *not* stored, so to do highlighting I retrieve the
text from the database, apply the appropriate analyzer,
and do the highlighting. It seems to be working exactly as
it should. My problem was that in a few cases, the document
A couple of questions about term frequencies and stemming:
- What's the best way to get the most common unstemmed form of a
Porter-stemmed word from the index? For example given the stem
'walk', find that 'walking' is the most common full word in the index.
- Is there a way to get a list
Donna,
Now I understand what you are saying (seems that I had PBCAK as well ;-)
As for your last question: ...under what conditions would the highlighter
return nothing? Only if no terms matched?
I remember that I found that highlighter can return null or empty string in
different situations. I
Highlighter deliberately returns null so the calling app can tell when the text
wasn't successfully highlighted.
Situations when this can happen are:
1) The text is out of synch with the index (the scenario you encountered)
2) The choice of analyzer used to tokenize the text differs from that
Here you go
- Error during the indexing : docs out of order (0 <= 0 )
org.apache.lucene.index.CorruptIndexException: docs out of order (0 <= 0 )
at
org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:368)
at
On Aug 16, 2007, at 10:17 AM, Alf Eaton wrote:
A couple of questions about term frequencies and stemming:
- What's the best way to get the most common unstemmed form of a
Porter-stemmed word from the index? For example given the stem
'walk', find that 'walking' is the most common full
On 16 Aug 2007, at 17:06, Grant Ingersoll wrote:
On Aug 16, 2007, at 10:17 AM, Alf Eaton wrote:
A couple of questions about term frequencies and stemming:
- What's the best way to get the most common unstemmed form of a
Porter-stemmed word from the index? For example given the stem
Hi,
What I meant was that highlighter can return either null or empty string. So
one should check for the null first and then also for "". At least that is
my observation...
Lukas
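
A minimal sketch of the check described above, in plain Java. The class
HighlightFallback and the method fragmentOrFallback are hypothetical names
used only for illustration, not part of Lucene; the fragment argument stands
in for whatever the highlighter call returned, and falling back to the
original text is an assumed policy.

```java
// Hypothetical helper, not part of Lucene: chooses the text to display
// when a highlighter call may have returned null or an empty string.
final class HighlightFallback {

    static String fragmentOrFallback(String fragment, String originalText) {
        // Check for null first, then for the empty string.
        if (fragment == null || fragment.length() == 0) {
            return originalText;
        }
        return fragment;
    }

    public static void main(String[] args) {
        System.out.println(fragmentOrFallback(null, "plain text"));
        System.out.println(fragmentOrFallback("", "plain text"));
        System.out.println(fragmentOrFallback("<B>plain</B> text", "plain text"));
    }
}
```

Both the null case and the empty-string case fall through to the original
text, so the calling code never has to display nothing.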
On 8/16/07, mark harwood [EMAIL PROTECTED] wrote:
Highlighter deliberately returns null so the calling app can tell
I'm getting an ArrayIndexOutOfBoundsException in
MultiLevelSkipListReader$SkipBuffer. This happens sporadically, on a
fairly small index (18 MB, about 30,000 documents). The index is
subject to a lot of adds and deletes, some of them concurrently. It
happens after about 4 days of heavy usage. I
I wonder if this is related to
https://issues.apache.org/jira/browse/LUCENE-951
If it's easy enough for you to reproduce, could you try the trunk
version of Lucene and see if it's fixed?
-Yonik
On 8/16/07, Scott Montgomerie [EMAIL PROTECTED] wrote:
I'm getting an ArrayIndexOutOfBoundsException
On 16 Aug 2007, at 15:17, Alf Eaton wrote:
- Is there a way to get a list of all the terms in the index (or
maybe just the top n) ordered by descending frequency of usage? I
imagine it's related to docFreq, but can't see how to get a list of
terms in all documents.
Thanks to
Hello all.
I am trying to get at the raw difference that Lucene uses -- the result of
the fail-fast Levenshtein distance algorithm. I believe that it is
calculated in FuzzyTermEnum.java (FuzzyTermEnum.cs).
In the application I have built upon Lucene, I would like to expose
similarity as the
Can you post your code? Make sure that when you use wildcard in your custom
query parser, it will generate either WildcardQuery or PrefixQuery
correctly.
is_maximum wrote:
Yes karl, when I explore the index by Luke I can see the terms
for example I have a field namely, patientResult, it
OK. Is it possible to capture this as small test case?
Maybe also call IndexWriter.setInfoStream(System.out) and capture details on
what segments are being merged?
Can you shed some light on how the application is using Lucene? Are you doing
deletes as well as adds? Opening readers against
Apologies if this is in the FAQ or elsewhere available but I could not
find this.
Can I provide a list of words that should *not* be stemmed by the
SnowballFilter? My analyzer looks like this:
analyzer = new StandardAnalyzer(stopwords) {
public TokenStream tokenStream(String fieldName,
Not that I know of. I suspect you'll have to write a filter that returns
the stemmed or unstemmed based on membership in your list
of words not to stem.
Best
Erick
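
A rough sketch of that idea in plain Java, outside of any analyzer. The class
SelectiveStemmer is a hypothetical name, and the toy stemmer that strips a
trailing "ing" is only a stand-in for the Snowball stemmer; in a real analyzer
the same membership test would presumably sit inside a custom TokenFilter.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical helper: returns the term unchanged when it is in the
// do-not-stem list, otherwise delegates to the supplied stemmer.
final class SelectiveStemmer {
    private final Set<String> doNotStem;
    private final UnaryOperator<String> stemmer;

    SelectiveStemmer(Set<String> doNotStem, UnaryOperator<String> stemmer) {
        this.doNotStem = doNotStem;
        this.stemmer = stemmer;
    }

    String apply(String term) {
        // Membership in the list decides between unstemmed and stemmed form.
        return doNotStem.contains(term) ? term : stemmer.apply(term);
    }

    public static void main(String[] args) {
        Set<String> keep = new HashSet<>(Arrays.asList("bing"));
        // Toy stemmer (assumption): strips a trailing "ing", nothing else.
        UnaryOperator<String> toy =
            t -> t.endsWith("ing") ? t.substring(0, t.length() - 3) : t;
        SelectiveStemmer s = new SelectiveStemmer(keep, toy);
        System.out.println(s.apply("walking"));
        System.out.println(s.apply("bing"));
    }
}
```

Note that the lookup is against the term as it arrives, so the list has to
match whatever earlier filters (lowercasing, for example) have already done
to the token.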
On 8/16/07, Donna L Gresh [EMAIL PROTECTED] wrote:
Apologies if this is in the FAQ or elsewhere available but I could not
find
: After you close that IndexWriter, can you list the files in your
: directory (that's a RAMDirectory right?)? Something like this:
The OP said this was a fairly small RAMDirectory index right? would it be
worthwhile to just write the whole thing to disk and post it online so
people could see
16 aug 2007 kl. 20.34 skrev Donna L Gresh:
Apologies if this is in the FAQ or elsewhere available but I could not
find this.
Can I provide a list of words that should *not* be stemmed by the
SnowballFilter?
If it is a static list, simply add it as an exception in the snowball
code and
There are two files:
1. segments_2 [-1, -1, -3, 0, 0, 1, 20, 112, 39, 17, -80, 0, 0, 0, 0, 0, 0,
0, 0]
2. segments.gen [-1, -1, -1, -2, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0,
0, 2]
but this one is from when the index is done properly.
hossman wrote:
: After you close that IndexWriter, can
OK, that's clean (no leftover files). So this cause does not seem to
be the same cause as LUCENE-140.
Can you capture the exact docs you are adding (all indexed fields) and
try to replay them to see if the same exception is reproducible?
Have you seen this happen on a different machine? (Just
Hi,
While researching support for wildcards in a PhraseQuery, I see various
references to SpanRegexQuery which is not part of the 2.2 distribution. I
checked the Lucene site to see if it's some add-on jar, but couldn't find
anything so I'm wondering where can I obtain the .class/jar file(s) for
It should already be on your disk with the distribution. Try
<your base lucene directory>/contrib/regex.
Lots of things are rooted in contrib, and I've never had to
find any other jars from the Lucene site, they've all
been in contrib
Hope this helps
Erick
On 8/16/07, dontspamterry [EMAIL
Hi Christian,
Is there any way you can post a complete, self-contained example
preferably as a JUnit test? I think it would be useful to know more
about how you are indexing (i.e. what Analyzer, etc.)
The offsets should be taken from whatever is set on the Token
during Analysis. I,
I just tried it with the latest nightly build, the problem still happens.
I think it must have to do with a corrupted index somehow. I've also
noticed, as a separate issue, that after this period of time (4-5 days),
certain documents aren't indexed correctly. For example, I will do a query: