In memory indexes in clucene

2010-03-04 Thread suman . holani
Hi, I was looking into Lucene in-memory indexes using RAMDirectory. It also provides something called MMapDirectory. I want the indexes to persist, so I want to go for FSDirectory. But to enhance the searching capability, I need to put the indexes into RAM. Now, the problem is how can I synchronise
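A common way to get both persistence and in-memory search speed is to keep the index on disk and copy it into a RAMDirectory at startup. The sketch below is against the Java Lucene 3.0 API rather than CLucene (whose API differs); the index path is made up, and note the RAM copy is a snapshot that must be rebuilt whenever the disk index changes, which is exactly the synchronisation question the poster raises.

```java
import java.io.File;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;

public class RamCopySearcher {
    public static void main(String[] args) throws Exception {
        // The authoritative, persistent index lives on disk.
        Directory disk = FSDirectory.open(new File("/path/to/index"));

        // Copy it wholesale into RAM for fast searching. This is a
        // snapshot: later writes to 'disk' are not visible here until
        // a fresh RAMDirectory is built from the disk index.
        Directory ram = new RAMDirectory(disk);

        IndexSearcher searcher = new IndexSearcher(ram);
        // ... run queries against 'searcher' ...
        searcher.close();
        ram.close();
        disk.close();
    }
}
```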

Re: Lucene Indexing out of memory

2010-03-04 Thread Ian Lea
Have you run it through a memory profiler yet? Seems the obvious next step. If that doesn't help, cut it down to the simplest possible self-contained program that demonstrates the problem and post it here. -- Ian. On Thu, Mar 4, 2010 at 6:04 AM, ajay_gupta ajay...@gmail.com wrote: Erick,

Re: Lucene Indexing out of memory

2010-03-04 Thread Michael McCandless
I agree, memory profiler or heap dump or small test case is the next step... the code looks fine. This is always a single thread adding docs? Are you really certain that the iterator only iterates over 2500 docs? What analyzer are you using? Mike On Thu, Mar 4, 2010 at 4:50 AM, Ian Lea

how to use DuplicateFilter to get unique documents based on a fieldName

2010-03-04 Thread ani...@ekkitab
Hi there, Could someone help me with the usage of DuplicateFilter? Here is my problem: I have created a search index on book Id, title, and author from a database of books which fall under various categories. Some books fall under more than one category. Now, when I issue a search, I get back

SpanQueries in Luke

2010-03-04 Thread Rene Hackl-Sommer
Hi, I would like to submit SpanQueries in Luke. AFAIK this isn't doable out of the box. What would be the way to go? Replace the built-in QueryParser by e.g. the xml-query-parser from the contrib section? Thanks, Rene

Re: In memory indexes in clucene

2010-03-04 Thread Erick Erickson
You'd probably get much more pertinent answers asking on the CLucene list, see: http://sourceforge.net/apps/mediawiki/clucene/index.php?title=Support Erick On Thu, Mar 4, 2010 at 3:42 AM, suman.hol...@zapak.co.in wrote: Hi, I

RE: Phrase search on NOT_ANALYZED content

2010-03-04 Thread Murdoch, Paul
I'm using NOT_ANALYZED because I have a list of text items to index where some of the items are single words and some of the items are two words or more with punctuation. My problem is that sometimes one of the words in an item with two or more words matches one of the single text items. That

Re: Phrase search on NOT_ANALYZED content

2010-03-04 Thread Erick Erickson
I'm still struggling with your overall goal here, but... It sounds like what you're looking for is an exact match in some cases but not others? In which case you could think about indexing the info: field in a second field and adding a clause against *that* field for your phrase case.

RE: Phrase search on NOT_ANALYZED content

2010-03-04 Thread Murdoch, Paul
Yep. PerFieldAnalyzerWrapper seems to have solved my problem. Thanks, Paul -Original Message- From: java-user-return-45289-paul.b.murdoch=saic@lucene.apache.org [mailto:java-user-return-45289-paul.b.murdoch=saic@lucene.apache.org ] On Behalf Of Erick Erickson Sent: Thursday,
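For anyone landing here with the same problem, the fix looks roughly like this on Lucene 3.0 (the field names are hypothetical): wrap the default analyzer in a PerFieldAnalyzerWrapper and assign KeywordAnalyzer to the field that holds whole items, so each item is indexed as a single token and a single word can no longer match a multi-word item.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.util.Version;

public class ExactItemAnalyzer {
    public static Analyzer create() {
        // Most fields get normal tokenization...
        PerFieldAnalyzerWrapper wrapper =
            new PerFieldAnalyzerWrapper(new StandardAnalyzer(Version.LUCENE_30));
        // ...but "item" is kept as one token, so "red brick house" is a
        // single term and a query for just "brick" cannot match it.
        wrapper.addAnalyzer("item", new KeywordAnalyzer());
        return wrapper;
    }
}
```

The same wrapper has to be used at both index and query time for the two sides to agree.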

Re: how to use DuplicateFilter to get unique documents based on a fieldName

2010-03-04 Thread Ian Lea
If the field you want to use for deduping is ISBN, create a DuplicateFilter using whatever your ISBN field name is as the field name and pass that to one of the search methods that takes a filter. If your index is large I'd be worried about performance and would look at deduping at indexing time
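Ian's first suggestion looks roughly like the sketch below on Lucene 3.0, where DuplicateFilter lives in the contrib-queries module (the searcher and query setup are assumed, and "isbn" stands in for the poster's field name; the field must be indexed, not merely stored, for the filter to see it):

```java
import org.apache.lucene.search.DuplicateFilter;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;

public class DedupedSearch {
    // Returns the top hits with at most one document per ISBN value.
    public static TopDocs search(IndexSearcher searcher, Query query)
            throws Exception {
        Filter dedupe = new DuplicateFilter("isbn");
        return searcher.search(query, dedupe, 10);
    }
}
```

Note Ian's caveat: the filter has to scan the field's terms to build its bitset, so on a large index this can be slow, which is why he suggests deduping at indexing time instead.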

Re: SpanQueries in Luke

2010-03-04 Thread Andrzej Bialecki
On 2010-03-04 14:13, Rene Hackl-Sommer wrote: Hi, I would like to submit SpanQueries in Luke. AFAIK this isn't doable out of the box. What would be the way to go? Replace the built-in QueryParser by e.g. the xml-query-parser from the contrib section? The upcoming Luke 1.0.1 will support this

Re: SpanQueries in Luke

2010-03-04 Thread Rene Hackl-Sommer
Hi Andrzej, Thanks! I'll keep my eyes open for that. FWIW, implementing this by replacing the QueryParser with the CoreParser worked fine. Thanks again, Rene On 04.03.2010 16:22, Andrzej Bialecki wrote: On 2010-03-04 14:13, Rene Hackl-Sommer wrote: Hi, I would like to submit

RE: FastVectorHighlighter truncated queries

2010-03-04 Thread halbtuerderschwarze
I tried MultiTermQuery in combination with setRewriteMethod: MultiTermQuery mtq = new WildcardQuery(new Term(FIELD, queryString)); mtq.setRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE); Did you also use Lucene 3.0.0?

RE: FastVectorHighlighter truncated queries

2010-03-04 Thread Digy
I used Lucene.Net 2.9.2. Didn't it work? DIGY -Original Message- From: halbtuerderschwarze [mailto:halbtuerderschwa...@web.de] Sent: Thursday, March 04, 2010 6:15 PM To: java-user@lucene.apache.org Subject: RE: FastVectorHighlighter truncated queries I tried MultiTermQuery in

Problem running demos - java classes not found

2010-03-04 Thread Paul Rogers
Dear All Hope someone can help. I'm trying to run the demos that came with Lucene (3.0.0). I extracted the tar.gz to a directory /home/paul/bin/lucene-3.0.0 and changed into the directory. The contents of the directory are as follows: total 2288 -rw-r--r-- 1 paul paul 3759 2009-11-16

Re: Why is frequency a float number

2010-03-04 Thread Chris Hostetter
: I was wondering why the tf method gets a float parameter. Isn't frequency : always considered to be an integer? : : public abstract float tf(float freq) Take a look at how PhraseQuery and SpanNearQuery use tf(float). For simple terms (and TermQuery) tf is always an integer, but when dealing
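To make the distinction concrete: for a sloppy phrase or span match, DefaultSimilarity's sloppyFreq() contributes 1/(distance+1) per match rather than 1, so the accumulated "frequency" handed to tf() can be fractional. A custom Similarity overriding tf(float) therefore has to accept a float; this sketch (the damping formula is purely illustrative) shows the shape:

```java
import org.apache.lucene.search.DefaultSimilarity;

public class DampenedSimilarity extends DefaultSimilarity {
    // 'freq' may be fractional: a sloppy PhraseQuery match at edit
    // distance d contributes 1/(d+1), not a whole count. log1p is an
    // arbitrary illustrative damping; DefaultSimilarity uses sqrt(freq).
    @Override
    public float tf(float freq) {
        return (float) Math.log1p(freq);
    }
}
```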

Fwd: Problem running demos - java classes not found

2010-03-04 Thread Paul Rogers
Dear All Further to my previous email, I notice I made a mistake with the second example. When I entered the second command it actually read: java -cp org.apache.lucene.demo.IndexFiles docs This is what gave the strange error about the docs class. If I issue the correct command: java

Re: SpanQueries in Luke

2010-03-04 Thread Andrzej Bialecki
On 2010-03-04 17:56, Otis Gospodnetic wrote: Andrzej, Does that mean the regular Lucene QP will get Span query syntax support (vs. having it in that separate Surround QP)? Or maybe that already happened and I missed it? :) I wish that were the case ;) No, this simply means that you will

Re: Why is frequency a float number

2010-03-04 Thread PlusPlus
Thanks for the reply. Actually what I'm looking for is to have a kind of fuzzy memberships for the terms of a document. That is, for each term of a document, I will have a membership value for that term and each term will be in each document, at most once. For that, I will need float TF and IDF

Re: Problem running demos - java classes not found

2010-03-04 Thread Erick Erickson
Doesn't your classpath need the full path to the jar, not just the containing directory? On Thu, Mar 4, 2010 at 1:22 PM, Paul Rogers paul.roge...@gmail.com wrote: Dear All Further to my previous email I notice I made a mistake with the second example. When I entered the second command it

Re: Problem running demos - java classes not found

2010-03-04 Thread Paul Rogers
Erick What a star!! Hadn't thought of that. Assumed (always a mistake) that the classpath only pointed to the directory. Using the following command: java -cp /home/paul/bin/lucene-3.0.0/lucene-core-3.0.0.jar:/home/paul/bin/lucene-3.0.0/lucene-demos-3.0.0.jar org.apache.lucene.demo.IndexFiles

RE: FastVectorHighlighter truncated queries

2010-03-04 Thread halbtuerderschwarze
Not with Lucene 3.0.1. Tomorrow I will try it with 2.9.2. Arne -- View this message in context: http://old.nabble.com/FastVectorHighlighter-truncated-queries-tp27709797p27786722.html Sent from the Lucene - Java Users mailing list archive at Nabble.com.

RE: FastVectorHighlighter truncated queries

2010-03-04 Thread Digy
I don't think that it is related to the Lucene version. Please inspect the C# code below. fragments1 has no highlight info; fragments2, on the other hand, has one. RAMDirectory dir = new RAMDirectory(); IndexWriter wr = new IndexWriter(dir, new

Lucene Web Demo

2010-03-04 Thread DasHeap
Another newcomer to Lucene here. I've got the Lucene web demo up and running on my test server. The indexing and search functions are working perfectly. The problem I'm running into regards the format of URLs to found objects. For instance, Lucene will return a hit like this:

File descriptor leak in ParallelReader.reopen()

2010-03-04 Thread Justin
Hi Mike and others, I have a test case for you (attached) that exhibits a file descriptor leak in ParallelReader.reopen(). I listed the OS, JDK, and snapshot of Lucene that I'm using in the source code. A loop adds just over 4000 documents to an index, reopening the index after each, before

Re: File descriptor leak in ParallelReader.reopen()

2010-03-04 Thread Mark Miller
On 03/04/2010 06:52 PM, Justin wrote: Hi Mike and others, I have a test case for you (attached) that exhibits a file descriptor leak in ParallelReader.reopen(). I listed the OS, JDK, and snapshot of Lucene that I'm using in the source code. A loop adds just over 4000 documents to an index,

Re: File descriptor leak in ParallelReader.reopen()

2010-03-04 Thread Justin
Has this changed since 2.4.1? Our application didn't explicitly close with 2.4.1 and that combination never had this problem. - Original Message From: Mark Miller markrmil...@gmail.com To: java-user@lucene.apache.org Sent: Thu, March 4, 2010 6:00:02 PM Subject: Re: File descriptor

RE: File descriptor leak in ParallelReader.reopen()

2010-03-04 Thread Uwe Schindler
That was always the same with reopen(). It's documented in the javadocs, with a short example: http://lucene.apache.org/java/3_0_1/api/all/org/apache/lucene/index/IndexReader.html#reopen() also in 2.4.1: http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/index/IndexReader.html#reopen()
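The idiom from those javadocs is short enough to restate as a sketch: reopen() may return a brand-new reader, and when it does the caller owns both and must close the old one, otherwise its file descriptors stay open.

```java
import java.io.IOException;
import org.apache.lucene.index.IndexReader;

public final class ReaderRefresh {
    // Returns a current reader, closing the old one if reopen()
    // produced a replacement. If nothing changed, reopen() returns
    // the same instance and there is nothing to close.
    public static IndexReader refresh(IndexReader reader) throws IOException {
        IndexReader newReader = reader.reopen();
        if (newReader != reader) {
            reader.close();
            reader = newReader;
        }
        return reader;
    }
}
```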

RE: File descriptor leak in ParallelReader.reopen()

2010-03-04 Thread Uwe Schindler
See my other mail for your file descriptor leak. A short note about your search code: You should not directly instantiate a TopScoreDocCollector but instead use the Searcher method that returns TopDocs. This has the benefit that the searcher automatically chooses the right parameter for

Re: File descriptor leak in ParallelReader.reopen()

2010-03-04 Thread Justin
We must have been getting lucky. Thanks Mark and Uwe! - Original Message From: Uwe Schindler u...@thetaphi.de To: java-user@lucene.apache.org Sent: Thu, March 4, 2010 6:20:56 PM Subject: RE: File descriptor leak in ParallelReader.reopen() That was always the same with reopen(). Its

RE: File descriptor leak in ParallelReader.reopen()

2010-03-04 Thread Uwe Schindler
Sorry, small change: You should not directly instantiate a TopScoreDocCollector but instead use the Searcher method that returns TopDocs. This has the benefit that the searcher automatically chooses the right parameter for scoring docs out/in order. In your example, search would be a little
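The change Uwe describes amounts to replacing the hand-built collector with the convenience overload; a sketch, assuming a searcher and query already exist:

```java
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

public class SimpleSearch {
    public static void printTopTen(IndexSearcher searcher, Query query)
            throws Exception {
        // Instead of creating a TopScoreDocCollector yourself (which
        // forces you to guess the docs-in-order flag), let the searcher
        // pick the right collector configuration internally.
        TopDocs top = searcher.search(query, 10);
        for (ScoreDoc sd : top.scoreDocs) {
            System.out.println(sd.doc + " score=" + sd.score);
        }
    }
}
```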

Re: File descriptor leak in ParallelReader.reopen()

2010-03-04 Thread Justin
Makes sense. Thanks for the tip! I haven't seen a response to my 2-pass scoring question, so maybe I've asked at least one difficult one. :-) - Original Message From: Uwe Schindler u...@thetaphi.de To: java-user@lucene.apache.org Sent: Thu, March 4, 2010 6:32:06 PM Subject: RE:

Re: how to use DuplicateFilter to get unique documents based on a fieldName

2010-03-04 Thread zhangchi
I think you should check the index first, using the lukeall jar to see if there are duplicate books. On Thu, 04 Mar 2010 20:43:26 +0800, ani...@ekkitab ani...@ekkitab.com wrote: Hi there, Could someone help me with the usage of DuplicateFilters. Here is my problem I have created a

Re: how to use DuplicateFilter to get unique documents based on a fieldName

2010-03-04 Thread ani...@ekkitab
Hi Ian, Thanks for your reply. We had actually done what you had suggested first, and it wasn't working, so I was hoping for some sample code. But then we found out that the field name on which we wanted the duplicate filter to be applied was not actually indexed while adding it into the

Re: how to use DuplicateFilter to get unique documents based on a fieldName

2010-03-04 Thread ani...@ekkitab
Hi Zhangchi, Thanks for your reply. We have about 3 million records (different ISBNs) in the database, and somewhat more documents than that, and we wouldn't want to do the deduping at indexing time, because one book (one ISBN) can be available under 2 or more categories (like fiction, comics