Re: query question

2007-08-16 Thread Mohammad Norouzi
Yes Karl, when I explore the index with Luke I can see the terms. For example, I have a field named patientResult that contains the value Ca. Oxalate:many, as well as other values such as Ca. Oxalate:few. The problem is that when I run this query: patientResult:(Ca. Oxalate:few) the result is 84329 Ca.

getting term offset information for fields with multiple value entries

2007-08-16 Thread duiduder
Hello, I have an index with an 'actor' field; for each actor there exists a single field value entry, e.g. stored/compressed,indexed,tokenized,termVector,termVectorOffsets,termVectorPosition movie_actors movie_actors:Mayrata O'Wisiedo (as

Re: out of order

2007-08-16 Thread Michael McCandless
Well then that is particularly spooky!! And, hopefully, possible/easy to reproduce. Thanks. Mike testn [EMAIL PROTECTED] wrote: I use RAMDirectory and the error often shows a low number. Last time it happened with the message 7 <= 7. Next time it happens, I will try to capture the stack trace.

Re: Can I do boosting based on term postions?

2007-08-16 Thread vini
Hi Shailendra, Could you please send the same class file to my Gmail account too? Regards vini Shailendra Sharma wrote: Ah, good way! On 8/4/07, Paul Elschot [EMAIL PROTECTED] wrote: On Friday 03 August 2007 20:35, Shailendra Sharma wrote: Paul, If I understand Cedric right, he wants

Re: Question about highlighting returning nothing

2007-08-16 Thread Donna L Gresh
Actually I don't think I'm having trouble-- as I mentioned, my text is *not* stored, so to do highlighting I retrieve the text from the database, apply the appropriate analyzer, and do the highlighting. It seems to be working exactly as it should. My problem was that in a few cases, the document

Stemmed terms/common terms

2007-08-16 Thread Alf Eaton
A couple of questions about term frequencies and stemming: - What's the best way to get the most common unstemmed form of a Porter-stemmed word from the index? For example given the stem 'walk', find that 'walking' is the most common full word in the index. - Is there a way to get a list
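Once the Porter filter has run, the index only holds the stems, so the most common full word has to come from counts of the unstemmed words gathered some other way (for instance from a parallel unstemmed field, an assumption on our part). Given such counts, picking the most frequent surface form per stem is a simple grouping step. A minimal sketch, where `toyStem` is a hypothetical stand-in for a real Porter stemmer:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

final class CommonSurfaceForm {

    /** For each stem, returns the unstemmed word with the highest count (ties: arbitrary). */
    static Map<String, String> mostCommonForms(Map<String, Integer> wordFreqs,
                                               UnaryOperator<String> stemmer) {
        Map<String, String> bestForm = new HashMap<>();
        Map<String, Integer> bestFreq = new HashMap<>();
        for (Map.Entry<String, Integer> e : wordFreqs.entrySet()) {
            String stem = stemmer.apply(e.getKey());
            Integer current = bestFreq.get(stem);
            if (current == null || e.getValue() > current) {
                bestFreq.put(stem, e.getValue());
                bestForm.put(stem, e.getKey());
            }
        }
        return bestForm;
    }

    /** Hypothetical toy stemmer that strips a few English suffixes. NOT the Porter algorithm. */
    static String toyStem(String w) {
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            if (w.endsWith(suffix) && w.length() > suffix.length() + 2) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    public static void main(String[] args) {
        Map<String, Integer> freqs = new HashMap<>();
        freqs.put("walk", 3);
        freqs.put("walking", 10);
        freqs.put("walked", 4);
        freqs.put("walks", 2);
        System.out.println(mostCommonForms(freqs, CommonSurfaceForm::toyStem).get("walk")); // walking
    }
}
```

With a real stemmer the same map-of-maxima approach applies unchanged; only the `stemmer` argument differs.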

Re: Question about highlighting returning nothing

2007-08-16 Thread Lukas Vlcek
Donna, Now I understand what you are saying (seems that I had PBCAK as well ;-) As for your last question: ...under what conditions would the highlighter return nothing? Only if no terms matched? I remember that I found that highlighter can return null or empty string in different situations. I

Re: Question about highlighting returning nothing

2007-08-16 Thread mark harwood
Highlighter deliberately returns null so the calling app can tell when the text wasn't successfully highlighted. Situations when this can happen are: 1) The text is out of synch with the index (the scenario you encountered) 2) The choice of analyzer used to tokenize the text differs from that

Re: out of order

2007-08-16 Thread testn
Here you go. Error during indexing: docs out of order (0 <= 0 ) org.apache.lucene.index.CorruptIndexException: docs out of order (0 <= 0 ) at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:368) at
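The exception is raised while merging postings: within one term's postings list, document numbers are expected to strictly increase, and the message reports a document number that is not greater than the previous one (so "0 <= 0" suggests document 0 showed up twice in a row for some term). The sketch below is a simplified illustration of that invariant on a plain int array, not Lucene's actual merge code:

```java
final class PostingsOrderCheck {

    /** Returns the index of the first out-of-order doc number, or -1 if the list is valid. */
    static int firstOutOfOrder(int[] docs) {
        int lastDoc = -1;
        for (int i = 0; i < docs.length; i++) {
            int doc = docs[i];
            // Negative numbers, or a doc not strictly greater than its predecessor, break the invariant.
            if (doc < 0 || (i > 0 && doc <= lastDoc)) {
                return i;
            }
            lastDoc = doc;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstOutOfOrder(new int[] {0, 3, 7, 9})); // -1: valid
        System.out.println(firstOutOfOrder(new int[] {0, 0, 5}));    // 1: "0 <= 0"
    }
}
```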

Re: Stemmed terms/common terms

2007-08-16 Thread Grant Ingersoll
On Aug 16, 2007, at 10:17 AM, Alf Eaton wrote: A couple of questions about term frequencies and stemming: - What's the best way to get the most common unstemmed form of a Porter-stemmed word from the index? For example given the stem 'walk', find that 'walking' is the most common full

Re: Stemmed terms/common terms

2007-08-16 Thread Alf Eaton
On 16 Aug 2007, at 17:06, Grant Ingersoll wrote: On Aug 16, 2007, at 10:17 AM, Alf Eaton wrote: A couple of questions about term frequencies and stemming: - What's the best way to get the most common unstemmed form of a Porter-stemmed word from the index? For example given the stem

Re: Question about highlighting returning nothing

2007-08-16 Thread Lukas Vlcek
Hi, What I meant was that the highlighter can return either null or an empty string. So one should check for null first and then also for "". At least that is my observation... Lukas On 8/16/07, mark harwood [EMAIL PROTECTED] wrote: Highlighter deliberately returns null so the calling app can tell
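Following the advice in this thread, the calling code can treat both null and "" as "nothing was highlighted" and fall back to a plain excerpt. A minimal sketch: the highlighter is modelled as a `Function<String, String>` so it runs without Lucene (in real code it would wrap a call such as the contrib `Highlighter.getBestFragment(...)`), and `fallbackLength` is a hypothetical parameter:

```java
import java.util.function.Function;

final class SafeHighlight {

    static String highlightOrFallback(String text,
                                      Function<String, String> highlighter,
                                      int fallbackLength) {
        String fragment = highlighter.apply(text);
        if (fragment == null || fragment.isEmpty()) {
            // Nothing highlighted: no terms matched, text out of sync with the index,
            // analyzer mismatch, etc. Show the start of the raw text instead.
            return text.length() <= fallbackLength
                    ? text
                    : text.substring(0, fallbackLength) + "...";
        }
        return fragment;
    }

    public static void main(String[] args) {
        String doc = "searching with lucene";
        System.out.println(highlightOrFallback(doc, t -> null, 9));  // searching...
        System.out.println(highlightOrFallback(doc, t -> "", 100));  // searching with lucene
        System.out.println(highlightOrFallback(doc, t -> t.replace("lucene", "<B>lucene</B>"), 9));
    }
}
```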

[Fwd: Exception in MultiLevelSkipListReader$SkipBuffer.readByte]

2007-08-16 Thread Scott Montgomerie
I'm getting an ArrayIndexOutOfBoundsException in MultiLevelSkipListReader$SkipBuffer. This happens sporadically, on a fairly small index (18 MB, about 30,000 documents). The index is subject to a lot of adds and deletes, some of them concurrently. It happens after about 4 days of heavy usage. I

Re: [Fwd: Exception in MultiLevelSkipListReader$SkipBuffer.readByte]

2007-08-16 Thread Yonik Seeley
I wonder if this is related to https://issues.apache.org/jira/browse/LUCENE-951 If it's easy enough for you to reproduce, could you try the trunk version of Lucene and see if it's fixed? -Yonik On 8/16/07, Scott Montgomerie [EMAIL PROTECTED] wrote: I'm getting an ArrayIndexOutOfBoundsException

Re: Stemmed terms/common terms

2007-08-16 Thread Alf Eaton
On 16 Aug 2007, at 15:17, Alf Eaton wrote: - Is there a way to get a list of all the terms in the index (or maybe just the top n) ordered by descending frequency of usage? I imagine it's related to docFreq, but can't see how to get a list of terms in all documents. Thanks to
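In Lucene 2.x the per-term counts can be read by walking `IndexReader.terms()` and calling `TermEnum.term()` and `TermEnum.docFreq()` for each entry; note that docFreq counts documents containing the term, not total occurrences. Keeping only the top n is then a bounded min-heap problem. The sketch below assumes the counts have already been copied into a map, so it runs without Lucene:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

final class TopTerms {

    /** Returns up to n terms ordered by descending document frequency. */
    static List<String> topByDocFreq(Map<String, Integer> docFreqs, int n) {
        // Min-heap on frequency: the head is the weakest of the current top n.
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a.getValue(), b.getValue()));
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            heap.offer(e);
            if (heap.size() > n) {
                heap.poll(); // drop the least frequent term seen so far
            }
        }
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey()); // ascending order
        }
        Collections.reverse(result); // most frequent first
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> df = new HashMap<>();
        df.put("lucene", 120);
        df.put("index", 80);
        df.put("the", 500);
        df.put("query", 60);
        System.out.println(topByDocFreq(df, 2)); // [the, lucene]
    }
}
```

The heap keeps memory at O(n) regardless of how many terms the index holds, which matters when the term dictionary is large.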

Possible to expose similarity as a property in hits collection?

2007-08-16 Thread Michael Barbarelli
Hello all. I am trying to get at the raw difference that Lucene uses: the result of the fail-fast Levenshtein distance algorithm. I believe that it is calculated in FuzzyTermEnum.java (FuzzyTermEnum.cs). In the application I have built upon Lucene, I would like to expose similarity as the
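For reference, a plain Levenshtein edit distance plus a length-normalised similarity can be sketched as below. This computes the full dynamic-programming table rather than failing fast, and the normalisation `1 - d / min(len(a), len(b))` is an assumption loosely modelled on FuzzyTermEnum; the exact formula (including its prefix handling) should be checked against the Lucene source:

```java
final class EditDistance {

    /** Classic Levenshtein distance using two rolling rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from "" to b[0..j)
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Assumed normalisation; can go negative for very different strings. */
    static float similarity(String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        if (shorter == 0) {
            return a.equals(b) ? 1.0f : 0.0f;
        }
        return 1.0f - (float) levenshtein(a, b) / shorter;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting")); // 3
        System.out.println(similarity("kitten", "sitting"));  // 0.5
    }
}
```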

Re: query question

2007-08-16 Thread testn
Can you post your code? Make sure that when you use a wildcard in your custom query parser, it generates either a WildcardQuery or a PrefixQuery correctly. is_maximum wrote: Yes Karl, when I explore the index by Luke I can see the terms for example I have a field namely, patientResult, it
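One way a custom parser might decide which query type a single term needs: a lone trailing '*' can become a PrefixQuery on the text before it, while any other '*' or '?' calls for a WildcardQuery. The helper below is hypothetical (it only classifies the string and builds no Lucene objects), and treating a bare "*" as a wildcard is an assumption:

```java
final class WildcardKind {

    enum Kind { TERM, PREFIX, WILDCARD }

    static Kind classify(String term) {
        int star = term.indexOf('*');
        int question = term.indexOf('?');
        if (star < 0 && question < 0) {
            return Kind.TERM; // no wildcard characters at all
        }
        // The first '*' is also the last character, so it is the only one.
        if (question < 0 && star > 0 && star == term.length() - 1) {
            return Kind.PREFIX; // e.g. "oxal*" -> prefix "oxal"
        }
        return Kind.WILDCARD; // e.g. "ox?late", "*late", "ca*ate"
    }

    public static void main(String[] args) {
        System.out.println(classify("oxalate")); // TERM
        System.out.println(classify("oxal*"));   // PREFIX
        System.out.println(classify("ox?late")); // WILDCARD
    }
}
```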

Re: out of order

2007-08-16 Thread Michael McCandless
OK. Is it possible to capture this as small test case? Maybe also call IndexWriter.setInfoStream(System.out) and capture details on what segments are being merged? Can you shed some light on how the application is using Lucene? Are you doing deletes as well as adds? Opening readers against

tell snowballfilter not to stem certain words?

2007-08-16 Thread Donna L Gresh
Apologies if this is in the FAQ or elsewhere available but I could not find this. Can I provide a list of words that should *not* be stemmed by the SnowballFilter? My analyzer looks like this: analyzer = new StandardAnalyzer(stopwords) { public TokenStream tokenStream(String fieldName,

Re: tell snowballfilter not to stem certain words?

2007-08-16 Thread Erick Erickson
Not that I know of. I suspect you'll have to write a filter that returns the stemmed or unstemmed based on membership in your list of words not to stem. Best Erick On 8/16/07, Donna L Gresh [EMAIL PROTECTED] wrote: Apologies if this is in the FAQ or elsewhere available but I could not find
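The filter Erick describes boils down to one membership test per token. A minimal sketch, assuming tokens are already lowercased by an earlier filter: a `List<String>` stands in for the token stream and a `UnaryOperator<String>` for the Snowball stemmer, so it runs without Lucene; a real version would put the same test inside a custom TokenFilter placed where the SnowballFilter sits now.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.UnaryOperator;

final class ProtectedStemming {

    static List<String> stemAll(List<String> tokens, Set<String> protectedWords,
                                UnaryOperator<String> stemmer) {
        List<String> out = new ArrayList<>(tokens.size());
        for (String token : tokens) {
            // Protected words pass through unchanged; everything else is stemmed.
            out.add(protectedWords.contains(token) ? token : stemmer.apply(token));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical toy stemmer: drop a trailing "s".
        UnaryOperator<String> toy = w -> w.endsWith("s") ? w.substring(0, w.length() - 1) : w;
        List<String> tokens = java.util.Arrays.asList("running", "news", "cats");
        System.out.println(stemAll(tokens, java.util.Collections.singleton("news"), toy));
        // [running, news, cat]
    }
}
```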

Re: out of order

2007-08-16 Thread Chris Hostetter
: After you close that IndexWriter, can you list the files in your : directory (that's a RAMDirectory right?)? Something like this: The OP said this was a fairly small RAMDirectory index, right? Would it be worthwhile to just write the whole thing to disk and post it online so people could see

Re: tell snowballfilter not to stem certain words?

2007-08-16 Thread karl wettin
On 16 Aug 2007, at 20:34, Donna L Gresh wrote: Apologies if this is in the FAQ or elsewhere available but I could not find this. Can I provide a list of words that should *not* be stemmed by the SnowballFilter? If it is a static list, simply add it as an exception in the snowball code and

Re: out of order

2007-08-16 Thread testn
There are two files: 1. segments_2 [-1, -1, -3, 0, 0, 1, 20, 112, 39, 17, -80, 0, 0, 0, 0, 0, 0, 0, 0] 2. segments.gen [-1, -1, -1, -2, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 2] but this is from a run where the indexing completed properly. hossman wrote: : After you close that IndexWriter, can
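The 20 bytes of segments.gen can be decoded by hand. Assuming the Lucene 2.x layout (a big-endian int format marker of -2 followed by the current generation written twice as big-endian longs, apparently so a reader can detect a partially written file), the dump reads as format -2, generation 2, generation 2, which is consistent with the segments_2 file listed alongside it. A small sketch of that decoding:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class SegmentsGenDump {

    /** Returns {format, gen0, gen1}, assuming the layout int + long + long (big-endian). */
    static long[] decode(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int format = in.readInt();
            long gen0 = in.readLong();
            long gen1 = in.readLong();
            return new long[] {format, gen0, gen1};
        } catch (IOException e) {
            throw new UncheckedIOException(e); // e.g. fewer than 20 bytes
        }
    }

    public static void main(String[] args) {
        byte[] segmentsGen = {-1, -1, -1, -2, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 2};
        long[] v = decode(segmentsGen);
        System.out.println("format=" + v[0] + " gen=" + v[1] + " gen=" + v[2]);
        // format=-2 gen=2 gen=2
    }
}
```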

Re: out of order

2007-08-16 Thread Michael McCandless
OK, that's clean (no leftover files). So this does not seem to have the same cause as LUCENE-140. Can you capture the exact docs you are adding (all indexed fields) and try to replay them to see if the same exception is reproducible? Have you seen this happen on a different machine? (Just

Location of SpanRegexQuery

2007-08-16 Thread dontspamterry
Hi, While researching support for wildcards in a PhraseQuery, I see various references to SpanRegexQuery which is not part of the 2.2 distribution. I checked the Lucene site to see if it's some add-on jar, but couldn't find anything so I'm wondering where can I obtain the .class/jar file(s) for

Re: Location of SpanRegexQuery

2007-08-16 Thread Erick Erickson
It should already be on your disk with the distribution. Try <your base Lucene directory>/contrib/regex. Lots of things are rooted in contrib, and I've never had to find any other jars from the Lucene site; they've all been in contrib. Hope this helps, Erick On 8/16/07, dontspamterry [EMAIL

Re: getting term offset information for fields with multiple value entries

2007-08-16 Thread Grant Ingersoll
Hi Christian, Is there any way you can post a complete, self-contained example, preferably as a JUnit test? I think it would be useful to know more about how you are indexing (i.e. what Analyzer, etc.). The offsets should be taken from whatever is set on the Token during analysis. I,

Re: [Fwd: Exception in MultiLevelSkipListReader$SkipBuffer.readByte]

2007-08-16 Thread Scott Montgomerie
I just tried it with the latest nightly build, the problem still happens. I think it must have to do with a corrupted index somehow. I've also noticed, as a separate issue, that after this period of time (4-5 days), certain documents aren't indexed correctly. For example, I will do a query: