Re: product based term combination for BooleanQuery?

2007-07-09 Thread Paul Elschot
I don't know whether this was mentioned here before, but an easy way to get product based term combination is by using a logarithm based term score. The addition of the logarithms will result in the logarithm of the product, which is sorted in the same order as the product itself. Regards, Paul E
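As an illustration of the point above, here is a minimal plain-Java sketch (it does not use the Lucene API). The class name LogScoreDemo, its methods, and the sample scores are hypothetical, and the per-term scores are assumed to be strictly positive so that the logarithm is defined. It ranks three documents once by the product of their term scores and once by the sum of the logarithms, and both rankings come out the same.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of Lucene.
class LogScoreDemo {

    /** Product of the per-term scores of one document (scores assumed > 0). */
    static double product(double[] termScores) {
        double p = 1.0;
        for (double s : termScores) {
            p *= s;
        }
        return p;
    }

    /** Sum of the natural logarithms of the per-term scores of one document. */
    static double logSum(double[] termScores) {
        double sum = 0.0;
        for (double s : termScores) {
            sum += Math.log(s);
        }
        return sum;
    }

    /** Document indexes ordered by descending key. */
    static List<Integer> rank(double[] keys) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < keys.length; i++) {
            order.add(i);
        }
        order.sort((a, b) -> Double.compare(keys[b], keys[a]));
        return order;
    }

    static List<Integer> rankByProduct(double[][] docs) {
        double[] keys = new double[docs.length];
        for (int i = 0; i < docs.length; i++) {
            keys[i] = product(docs[i]);
        }
        return rank(keys);
    }

    static List<Integer> rankByLogSum(double[][] docs) {
        double[] keys = new double[docs.length];
        for (int i = 0; i < docs.length; i++) {
            keys[i] = logSum(docs[i]);
        }
        return rank(keys);
    }

    public static void main(String[] args) {
        // Made-up term scores for three documents, two query terms each.
        double[][] docs = { {0.5, 0.4}, {0.9, 0.1}, {0.6, 0.6} };
        System.out.println(rankByProduct(docs)); // prints [2, 0, 1]
        System.out.println(rankByLogSum(docs));  // prints [2, 0, 1]
    }
}
```

Because the logarithm is strictly increasing, any additive combination of log scores orders documents exactly as the product of the raw scores would. A plain sum of the raw scores would order the same sample as [2, 1, 0] instead.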

UnExpected result for: indexReader.termDocs()

2007-07-09 Thread Vikas
Hi Friends, Before indexing something I want to check whether the term already exists in the index or not [i.e. a primary-key kind of thing]. When I try to get docs with indexReader.termDocs(term); it does not return the expected results. I put the "TermDocs" object in a while loop to print document nu

Should the IndexSearcher be closed after very search completed

2007-07-09 Thread anson
Hi, Should the IndexSearcher instance be closed after every search is completed? I wrote a sample, but I do not close the singleton instance of IndexSearcher unless IndexReader#isCurrent() returns false. Now the sample runs well, but I saw that almost all other samples close the IndexSearcher ins

Re: Should the IndexSearcher be closed after very search completed

2007-07-09 Thread anson
Sorry, the subject should be "Should the IndexSearcher be closed after every search completed" ~ >Hi, > >Should the IndexSearcher instance be closed after very search completed. > >I wrote a sample, but I have not closed the singleton instance of >IndexSearcher unless IndexReader#isCurrent()

Calling indexWriter.close() in web app

2007-07-09 Thread vcampa
I'm developing a web app with Struts that needs to embed Lucene functionality. I need my app to add documents to the index whenever a document is added (there are very few documents, but they are large). I read that I have to use a single instance of IndexWriter to edit the index. Suppose I use a s

RE: Should the IndexSearcher be closed after very search completed

2007-07-09 Thread Ard Schrijvers
Closing the IndexSearcher is best done only after a deleteDocuments with a reader or changes with a writer. For performance reasons, it is better not to close the IndexSearcher unless needed. Regards Ard > > > sorry, the subject should be "Should the IndexSearcher be > closed after > every sear

RE: Calling indexWriter.close() in web app

2007-07-09 Thread Ard Schrijvers
Hello, > I'm developing a web app with struts that need to embed lucene > functionalities. I need that my app adds documents to the > index after that a > document is added (documents are very few, but of large > size). I read that i > have to use a single instance of indexwriter to edit the >

RE: Should the IndexSearcher be closed after very search completed

2007-07-09 Thread anson
Hi Ard, Thanks for your reply. Well, I think I have done it right. >Closing the IndexSearcher is best only after a deleteDocuments with a >reader or changes with a writer. > >For performance reasons, it is better to not close the IndexSearcher if not > needed > >Regards Ard > >> >> >> sorry, th

Re: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread mark harwood
>>I need this comparison to be case-insensitive The choice of case-sensitivity (and preservation of punctuation, numbers etc etc) is controlled by your choice of analyzer that you pass to MoreLikeThis. If you want to ensure your list of stop words adheres to the same logic - use the same analyz

Question regarding Index Update

2007-07-09 Thread Sonu SR
Hi, My application uses a Lucene index. The indexed documents have a number of fields. We have around 5 million such documents. I have a problem with regular updates of some of the fields in the documents: on every update I need to delete and re-index the updated documents. This is a huge task. I know from

RE: document delete after reader.close()

2007-07-09 Thread Boeckli, Dominique
Hi Erick, thanks for your help, this was the solution: Connectors are pooled resources and I always kept the same reader for all connections: bad idea. For that reason they never saw updates in the index. Now I close the readers and everything is ok. Best Dominique -Original Message-

Re: UnExpected result for: indexReader.termDocs()

2007-07-09 Thread Erick Erickson
Well, first, you never test anything that will break the loop. TermDocs.next() does not set termdocs to null; it returns false. I've always found it a bit confusing that calling next() doesn't skip the first term, but you want something like this: Term term = new Term(field, valu

Re: Question regarding Index Update

2007-07-09 Thread Erick Erickson
No, Lucene doesn't support just updating a field in a document. You must delete/re-add it as you suppose. And you are correct, the updateDocument feature just conceals the underlying delete/add functionality; it's not update-in-place. Best Erick On 7/9/07, Sonu SR <[EMAIL PROTECTED]> wrote: H

Re: document delete after reader.close()

2007-07-09 Thread Erick Erickson
Glad that worked for you. But one caution: don't be over-anxious to close readers, since opening a reader is an expensive operation and will affect performance. So you should close/re-open them only when you know you've modified your index. But if you're not experiencing performance issues, you

RE: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread Jong Kim
My application stores term vectors with the index, and uses that information to implement more-like-this rather than tokenizing the original text using an analyzer. Consequently the option of achieving the effect by specifying a different analyzer is no good for my case. /Jong -Original Message-

Re: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread mark harwood
>>My application stores term vectors with the index And those stored term vectors contain terms produced by your choice of analyzer, no? Or are you saying that you have deliberately chosen to index the content with a case-sensitive analyzer and that you want to supply stop words in a case-inse

RE: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread Jong Kim
>>Or are you saying that you have deliberately chosen to index the content with a case-sensitive analyzer and that you want to supply stop words in a case-insensitive fashion? Correct. To be precise, we index each token up to twice - original token and its all-lowercase equivalent. Due to a produ

Re: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread mark harwood
OK. I can see the logic that says it might be useful/convenient to filter case-sensitive search terms using a case-insensitive list of stop words. What seems slightly odd is that you want exactness in the choice of case yet are using an imprecise matching technique (MoreLikeThis) - effectively s
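The idea of filtering case-sensitive terms against a case-insensitive stop list can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the class CaseInsensitiveStopFilter and its methods are hypothetical, and it is neither part of Lucene nor the way MoreLikeThis handles stop words. The stop words are lower-cased once when the filter is built, and each candidate term is lower-cased only for the lookup, so the terms that survive keep their original case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper class, not part of Lucene.
class CaseInsensitiveStopFilter {

    // Stop words, normalized to lower case once at construction time.
    private final Set<String> lowerCasedStopWords = new HashSet<>();

    CaseInsensitiveStopFilter(Iterable<String> stopWords) {
        for (String word : stopWords) {
            lowerCasedStopWords.add(word.toLowerCase(Locale.ROOT));
        }
    }

    /** True if the term matches a stop word, ignoring case. */
    boolean isStopWord(String term) {
        return lowerCasedStopWords.contains(term.toLowerCase(Locale.ROOT));
    }

    /** Returns the terms that are not stop words, with their case untouched. */
    List<String> filter(List<String> terms) {
        List<String> kept = new ArrayList<>();
        for (String term : terms) {
            if (!isStopWord(term)) {
                kept.add(term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        CaseInsensitiveStopFilter stopFilter =
                new CaseInsensitiveStopFilter(Arrays.asList("the", "And"));
        List<String> terms = Arrays.asList("The", "Lucene", "AND", "index");
        System.out.println(stopFilter.filter(terms)); // prints [Lucene, index]
    }
}
```

Lower-casing with Locale.ROOT keeps the comparison independent of the JVM's default locale.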

Re: Question regarding Index Update

2007-07-09 Thread Sonu SR
Thanks Erick. Is field update too difficult a task in Lucene? I expect this feature in Lucene too, in the near future. On 7/9/07, Erick Erickson <[EMAIL PROTECTED]> wrote: No, Lucene doesn't support just updating a field in a document. You must delete/re-add it as you suppose. And you are cor

Re: Question regarding Index Update

2007-07-09 Thread Erick Erickson
Then I think you'll be disappointed. Search the archive for phrases like "update in place" and you'll see a discussion of why this isn't as straight-forward as you might think. Best Erick On 7/9/07, Sonu SR <[EMAIL PROTECTED]> wrote: Thanks Erik. Is the field update is too difficult task in lu

RE: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread Jong Kim
Our requirement is simply that -
1. Do not throw away any information at indexing time - so we preserve case information and keep all tokens.
2. Search functionality is provided at two levels -
2.1 End User search - stop word filtering is done on the search terms, the same stop word list is us

Re: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread Erick Erickson
I have to disagree here. What you get when you try to encompass more and more use cases is a bloated system that is not performant. This gets very close to "make the engine solve my problems for me out of the box". Which may be reasonable for an application that is designed as an end-to-end solu

index compatibility of lucene and clucene

2007-07-09 Thread Shaghayegh Sahebie
hi all, I wanted to use Lucene for indexing and CLucene for searching the same index. can I do that and which versions of them are compatible? thanks in advance Chagh

Re: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread markharw00d
>>the case matters only for those words that should be included. Jong, just want to check we're on the same page - you do know MoreLikeThis has a kind of automatic Stop-Wording built in, yes? MoreLikeThis looks at the document frequency of all terms in the "this" text you provide and only sele

RE: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread Jong Kim
Mark, I understand your point. However, we do not maintain a separate field for the lower-case version of the words. Instead we index them twice at the same position within the same field, which allows us to provide case-exact match for search queries containing upper case characters, but case-i

Re: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project

2007-07-09 Thread markharw00d
>>So I'm afraid I can't use the technique you recommend. ah right - so the TermVector you use from the index will return mixed and lower case versions of the same text. One point to note - this would mean that of the 25 or so top terms selected by MoreLikeThis for querying there is a reasonable

Cannot get Field.Text to work

2007-07-09 Thread Amit
Hi, I am new to Lucene and trying out the example code. But when I try to insert values using Field.Text, the compiler does not recognize Text as a method of Field. The code looks like this: contactDocument.add(Field.Text("name", contact.getName())); I wanted to know if the version 2.2 does no

Lucene RAM Directory doesn't work for Index Size > 8 GB

2007-07-09 Thread muraalee
Hi, We are facing a strange problem with RAMDirectory for indices greater than 8 GB. We have indexed around 6.5 million Lucene documents and the index size is around 8 GB. Below are the contents of the index directory.
2236964197 _1x.fdt
51811488 _1x.fdx
293 _1x.fnm
2234929832 _1x.f

Re: Search that supports all valid characters in a Unix filename

2007-07-09 Thread Steven Rowe
Hi Ed, Ed Murray wrote: > Could > someone let me know the best Analyzer to use to get an exact match on a Unix > filename when it is inserted into an untokened field. > > Filenames > obviously contain spaces and forward slashes along with other characters. I > am using > a WhitespaceAnalyzer bu

Re: Cannot get Field.Text to work

2007-07-09 Thread Otis Gospodnetic
Amit, Field.Text method is long gone from Lucene. I think that was in version 1.4.3 of Lucene, maybe 1.9*, but we are in 2.* now. The place to look at the new Field API is here: http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/document/Field.html Otis

Re: Search that supports all valid characters in a Unix filename

2007-07-09 Thread Otis Gospodnetic
I am not sure if I understood you 100%, but it sounds like you might be looking for KeywordAnalyzer: http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/analysis/KeywordAnalyzer.html Otis

who has used mutual information on millions of doc?

2007-07-09 Thread qi wu
Hi, I am trying to use mutual information to find the correlation between different terms in documents. But for millions of documents, calculating the mutual information is too slow. Has anybody built a high-performance solution for this? I found the article below in the former ma

unused tmp fdt files in index

2007-07-09 Thread Harini Raghavan
Hi All, I have a large Lucene index of size 60G. We have had Out Of Memory issues a few times in the past, due to which indexing was interrupted. This has resulted in a lot of .fnm, .fdt, .tmp files which don't get removed even by optimizing the index. We have data for last 90 days in

RE: unused tmp fdt files in index

2007-07-09 Thread Liu_Andy2
You can use Luke to open your index. In the Files tab, if some files are shown as deletable, it should be safe to delete these files. Please back up your data before testing. Andy -Original Message- From: Harini Raghavan [mailto:[EMAIL PROTECTED] Sent: Tuesday, July 10, 2007 2:29 PM