indexing xml messages

2009-11-02 Thread vsevel
Hi, the following JUnit test fails on 3 out of the 6 searches: @Test public void indexXML() throws Exception { Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT); RAMDirectory dir = new RAMDirectory(); IndexWriter writer = new IndexWriter(dir, analyz

Re: ComplexPhraseQueryParser highlight problem

2009-11-02 Thread Mark Miller
Looks like it's because the query coming in is a ComplexPhraseQuery and the Highlighter doesn't currently know how to handle that type. It would need to be rewritten first, barring the special handling it needs, but unfortunately that will break multi-term query highlighting unless you use boolean r

Re: ComplexPhraseQueryParser highlight problem

2009-11-02 Thread AHMET ARSLAN
I think there is a problem with the attachment, so I am re-sending it. > Thank you for your interest, Mark. > I am sending Java code (using Lucene 2.9.0) that simply demonstrates the problem. When the same query string is parsed by Lucene's default QueryParser, highlighting works. > I am

Re: ComplexPhraseQueryParser highlight problem

2009-11-02 Thread AHMET ARSLAN
> Yes - please share your test programs and I can investigate (ApacheCon this week, so I'm not sure when). Thank you for your interest, Mark. I am sending Java code (using Lucene 2.9.0) that simply demonstrates the problem. When the same query string is parsed by Lucene's default QueryPars

How do you map a query for fieldx to fieldy

2009-11-02 Thread Paul Taylor
For backwards compatibility I have to change queries for the track field to the recording field. I did this by overriding QueryParser.newTermQuery() as follows: protected Query newTermQuery(Term term) { if ( term.field() == "track" ) { return super.newTermQuery(new Term("recording"
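The renaming step itself can be sketched without Lucene. This is a minimal sketch using a hypothetical helper, `LegacyFieldMapper.remap`. Inside a `newTermQuery` override, the returned name would be passed to `new Term(...)`. The sketch looks names up with `equals()`-based map lookup rather than `==`, because `==` only succeeds when both strings are the same (interned) instance.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of Lucene): maps legacy field names to current ones.
public class LegacyFieldMapper {
    private static final Map<String, String> RENAMED = new HashMap<>();
    static {
        RENAMED.put("track", "recording"); // old field name -> new field name
    }

    // Returns the current name for a field, or the field unchanged if it was never renamed.
    // HashMap lookup uses equals(), so it does not depend on string interning.
    public static String remap(String field) {
        return RENAMED.getOrDefault(field, field);
    }

    public static void main(String[] args) {
        System.out.println(remap("track"));  // recording
        System.out.println(remap("artist")); // artist
    }
}
```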

Re: Different score for the same documents

2009-11-02 Thread Erick Erickson
That's exactly the question. If all 16 documents have exactly the same score, then the internal tie-breaking is your answer. They would also all have strictly increasing doc IDs. But I'd check to see the scores before accepting this explanation because I find it unlikely that all 16 docs have iden

Re: Different score for the same documents

2009-11-02 Thread kenji tsuruoka
Thank you, Erick. What you mentioned is right. The two identical documents were ranked 3rd and 18th. So do you mean that (at least) the documents between the 3rd and the 18th in the Lucene results all have the same score? Cheers, K On Nov 2, 2009, at 9:59 PM, Erick Erickson wrote: What were their scor

Re: ComplexPhraseQueryParser highlight problem

2009-11-02 Thread Mark Miller
Yes - please share your test programs and I can investigate (ApacheCon this week, so I'm not sure when). And it's best to keep communications on the list: that allows others with similar issues (now or in the future) to benefit from whatever goes on. You will also reach a wider pool of people that

Re: Different score for the same documents

2009-11-02 Thread Erick Erickson
What were their scores? I'm assuming that by "rank" you mean the order in which the documents were returned, not the raw Lucene score. Lucene uses the insertion order to break ties. That is, two documents with the same score will appear in the order of their (internal) Lucene doc ID. So is it
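The ordering described here (higher score first, ties broken by ascending internal doc ID) can be illustrated with a plain Java comparator. This sketch covers only the ordering rule. `Hit` is a hypothetical stand-in for a search result, not Lucene's collector code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class TieBreakDemo {
    // Hypothetical stand-in for a search hit: internal doc ID plus score.
    static final class Hit {
        final int docId;
        final float score;
        Hit(int docId, float score) { this.docId = docId; this.score = score; }
        @Override public String toString() { return "doc" + docId + "(" + score + ")"; }
    }

    // Higher score first; equal scores fall back to the lower doc ID first.
    static final Comparator<Hit> ORDER =
        Comparator.comparingDouble((Hit h) -> h.score).reversed()
                  .thenComparingInt(h -> h.docId);

    static List<Hit> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(ORDER);
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(new Hit(17, 0.5f), new Hit(3, 0.5f), new Hit(8, 0.9f));
        System.out.println(rank(hits)); // [doc8(0.9), doc3(0.5), doc17(0.5)]
    }
}
```

Two identical documents with equal scores therefore still receive different ranks: the one indexed first (lower doc ID) comes first.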

Re: Get match exact location

2009-11-02 Thread Erick Erickson
Well, you have to do some extra work for page matching. If you search the user list you'll find significant discussions of paging. The short form is that you have to either index the entire document and record the start and/or end offset for each page (I'd put the results in a separate field in the
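A sketch of the first approach follows. If the start offset of every page is recorded at index time, the page that contains a match offset can be found by binary search over those offsets. This assumes the page start offsets are already available as a sorted array; how they are stored (for example, in a separate field) is left open. `PageLocator` and `pageForOffset` are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical helper: maps a character offset in the full document text to a page number.
public class PageLocator {
    // pageStarts[i] is the character offset where page i+1 begins.
    // Assumed sorted ascending, with pageStarts[0] == 0.
    private final int[] pageStarts;

    public PageLocator(int[] pageStarts) {
        this.pageStarts = pageStarts.clone();
    }

    // Returns the 1-based page number containing the given character offset.
    public int pageForOffset(int offset) {
        if (offset < 0) throw new IllegalArgumentException("offset must be >= 0");
        int idx = Arrays.binarySearch(pageStarts, offset);
        if (idx >= 0) {
            return idx + 1;        // offset is exactly the first character of a page
        }
        // -idx - 1 is the index of the first page start greater than offset;
        // the match lies on the page just before it, which is that index in 1-based numbering.
        return -idx - 1;
    }

    public static void main(String[] args) {
        PageLocator locator = new PageLocator(new int[] {0, 1200, 2750});
        System.out.println(locator.pageForOffset(0));    // 1
        System.out.println(locator.pageForOffset(1500)); // 2
        System.out.println(locator.pageForOffset(9000)); // 3
    }
}
```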

Different score for the same documents

2009-11-02 Thread kenji tsuruoka
Dear Lucene users, Hi. I have tried to index and search MEDLINE abstracts with Lucene, and there were some problems in the search results: Lucene assigned different ranks to exactly identical documents. I didn't know the input documents for the index contained duplicate documents

Re: Index files not deleted after optimization

2009-11-02 Thread Michael McCandless
Something must still have these file handles open at the time the optimization completed. E.g., do you have a reader open on this index? Mike On Mon, Nov 2, 2009 at 6:54 AM, Ganesh wrote: > Hello all, > I am using Lucene 2.4.1 and my app is running inside Tomcat. > In Windows, after database o

lucene-contrib maven artifact

2009-11-02 Thread AHMET ARSLAN
Hello everyone, when I add this dependency to my pom.xml, an error occurs: org.apache.lucene:lucene-contrib:jar:2.9.0 is missing. Dependency: groupId org.apache.lucene, artifactId lucene-contrib, version 2.9.0, type jar, scope compile. I am trying to use org.apache.lucene.sear

Index files not deleted after optimization

2009-11-02 Thread Ganesh
Hello all, I am using Lucene 2.4.1 and my app is running inside Tomcat. In Windows, after database optimization, the old db files are not getting deleted. I enabled the info stream and found the entries below. I used Process Explorer from Sysinternals to view the lock file, but old db files are

Re: LockObtainFailedException

2009-11-02 Thread Michael McCandless
It's best to arrange for Tomcat's shutdown to close any open writers. But you can also use IndexWriter.unlock(Directory) to forcefully remove the lock. Be very careful, though: if you accidentally remove a lock out from under a live IndexWriter, that will quickly lead to index corruption. Mike On M

Get match exact location

2009-11-02 Thread Vicente David Guardiola Buitrago
Hello, everyone, I'm using Lucene to index some PDF documents and it's working great. But I'm wondering if it's possible to get some extra information about the matches returned by Lucene. I need to know the exact page where Lucene has found every match, and that would be possible if I could also get the

Re: LockObtainFailedException

2009-11-02 Thread Anshum
Is that part of some regular process, such as the Tomcat shutdown? If it is, could you pass a shutdown signal to the search daemon/service and have it close the already opened writers? Also, if it's the service that causes the exception, add a writer.close statement to the finally block (add t
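The finally-block advice can be sketched without Lucene. `FakeWriter` below is a hypothetical stand-in for an index writer. It takes a simulated lock when opened and releases it in `close()`. Because `close()` runs in `finally`, the lock is released even when indexing throws. With a real IndexWriter, the same try/finally shape would wrap `writer.close()`.

```java
import java.io.Closeable;
import java.util.HashSet;
import java.util.Set;

public class FinallyCloseDemo {
    // Simulated lock registry; stands in for the write.lock file in the index directory.
    static final Set<String> LOCKS = new HashSet<>();

    // Hypothetical stand-in for an index writer: acquires the lock on open, releases it on close.
    static final class FakeWriter implements Closeable {
        FakeWriter() {
            if (!LOCKS.add("write.lock")) {
                throw new IllegalStateException("lock already held");
            }
        }
        void addDocument(String doc) {
            if (doc == null) throw new IllegalArgumentException("null document");
        }
        @Override public void close() {
            LOCKS.remove("write.lock");
        }
    }

    static void index(String... docs) {
        FakeWriter writer = new FakeWriter();
        try {
            for (String doc : docs) {
                writer.addDocument(doc);
            }
        } finally {
            writer.close(); // always runs, so the lock is released even on failure
        }
    }

    public static void main(String[] args) {
        try {
            index("a", null); // fails on the second document
        } catch (IllegalArgumentException e) {
            System.out.println("indexing failed: " + e.getMessage());
        }
        System.out.println("lock held after failure? " + LOCKS.contains("write.lock")); // false
        index("b"); // succeeds because the earlier lock was released
    }
}
```

Without the `finally`, the failed call would leave the lock in place, and the next `new FakeWriter()` would throw. This mirrors the stale write.lock symptom discussed in this thread.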

RE: LockObtainFailedException

2009-11-02 Thread Chris Bamford
Hi Anshum, Yes, there is a reply, but it is Solr-specific :-) I understand that I can catch the exception, but then what can I do about it when it occurs? In my case I am pretty sure that the 'write.lock' file is stale (most probably left over from the last time Tomcat shut down), so I want to forc

Re: LockObtainFailedException

2009-11-02 Thread Anshum
Hi Chris, Isn't there a reply on the older thread? In case there isn't: this is generally observed when an IndexWriter is not closed properly, i.e. simply never closed. The lock is created when the IndexWriter is opened, to maintain the sanity of the index, and it is removed when the writer is closed with writer.close(). I

LockObtainFailedException

2009-11-02 Thread Chris Bamford
Hi, I was researching LockObtainFailedExceptions and came across this thread. I don't use Solr, just regular Lucene deployed via Tomcat, but I have started getting these exceptions, and this coincides with our recent upgrade from 2.0.0 to 2.4.0. I have found that just removing the lock file seems to

Re: scoring adjacent terms without proximity search

2009-11-02 Thread Joel Halbert
For the time being, I opted to use the following query to solve this problem, since it meets my requirements: +(cheese sandwich) "cheese sandwich"~slop This includes documents with one or more of the terms, but prefers those with an edit distance <= the slop.
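For a different term list or slop, the same query can be assembled as a string and handed to QueryParser. `AdjacencyQuery.build` is a hypothetical helper. It assumes plain terms with no characters that need QueryParser escaping, and it assumes the parser's default OR operator, so the required group matches any of the terms.

```java
// Hypothetical helper: builds  +(t1 t2 ...) "t1 t2 ..."~slop  as a QueryParser string.
public class AdjacencyQuery {
    // The required group matches documents containing at least one term (default OR operator assumed);
    // the optional phrase clause adds score when the terms occur within `slop` positions of each other.
    public static String build(int slop, String... terms) {
        if (terms.length == 0) throw new IllegalArgumentException("need at least one term");
        if (slop < 0) throw new IllegalArgumentException("slop must be >= 0");
        String joined = String.join(" ", terms);
        return "+(" + joined + ") \"" + joined + "\"~" + slop;
    }

    public static void main(String[] args) {
        System.out.println(build(2, "cheese", "sandwich"));
        // +(cheese sandwich) "cheese sandwich"~2
    }
}
```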