Re: Highlighter

2006-08-11 Thread Ronnie Kolehmainen
There is an issue in JIRA, see http://issues.apache.org/jira/browse/LUCENE-645 So I guess you're not the only one. /Ronnie Quoting Mark Miller <[EMAIL PROTECTED]>: > Am I the only one that gets back a string missing the final character > when using the highlighter and the null fragmenter? I al

Re: 30 milllion+ docs on a single server

2006-08-11 Thread Mark Miller
Tomi NA wrote: On 8/12/06, Mark Miller <[EMAIL PROTECTED]> wrote: I've made a nice little archive application with Lucene. I made it to handle our largest need: 2.5 million docs or so on a single server. Now the powers that be say: let's use it for a 30+ million document archive on a single serve

Re: 30 milllion+ docs on a single server

2006-08-11 Thread Tomi NA
On 8/12/06, Mark Miller <[EMAIL PROTECTED]> wrote: I've made a nice little archive application with Lucene. I made it to handle our largest need: 2.5 million docs or so on a single server. Now the powers that be say: let's use it for a 30+ million document archive on a single server! (each doc siz

Highlighter

2006-08-11 Thread Mark Miller
Am I the only one that gets back a string missing the final character when using the highlighter and the null fragmenter? I always have to add the last character of what I have asked to be highlighted to what the highlighter returns when trying to highlight an entire document...anyone else

30 milllion+ docs on a single server

2006-08-11 Thread Mark Miller
I've made a nice little archive application with Lucene. I made it to handle our largest need: 2.5 million docs or so on a single server. Now the powers that be say: let's use it for a 30+ million document archive on a single server! (each doc is maybe 10k max, some as small as 1 or 2k) Please t

Re: How can i Tokenize money values?

2006-08-11 Thread Erick Erickson
I'd do neither. You can look at other analyzers; WhitespaceAnalyzer comes to mind, which breaks on whitespace and leaves all special characters in. There are several to choose from. And, if you are indexing other fields and want them handled differently, use a PerFieldAnalyzerWrapper. Finally, you migh
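
To illustrate the whitespace-splitting idea above for values such as "$25000" and "u$s45000", here is a minimal sketch in plain Java. It does not call Lucene at all: it only imitates the behaviour described for WhitespaceAnalyzer (split on whitespace, keep every other character) and a "$250*"-style prefix match. The class and method names (MoneyTokenSketch, tokenize, prefixMatches) are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for whitespace tokenization; NOT the Lucene API.
final class MoneyTokenSketch {

    /** Splits on runs of whitespace and keeps all other characters, so "$25000" stays one token. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    /** Returns the tokens that start with the given prefix, imitating a query like "$250*". */
    static List<String> prefixMatches(List<String> tokens, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String token : tokens) {
            if (token.startsWith(prefix)) {
                matches.add(token);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("Price $25000 or u$s45000 cash");
        System.out.println(tokens);
        System.out.println(prefixMatches(tokens, "$250"));
    }
}
```

Because the currency symbols survive tokenization, both an exact lookup of "$25000" and a prefix lookup of "$250" can succeed in this sketch. Whether a real query parser needs the "$" or "*" escaped is a separate question that this sketch does not address.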

Re: Poor performance "race condition" in FieldSortedHitQueue

2006-08-11 Thread Chris Hostetter
: ... right, thanks, now I see what you mean. In other words, IndexReader : provides the ability to read/iterate terms and docs, but caching the term : values per doc is for a higher layer - this way keeping IndexReader simpler : and maintainable. So I guess Oliver can continue with the change as h

Re: NPE when sorting on a field that is missing from a doc

2006-08-11 Thread Chris Hostetter
: we have recently noticed that doing a locale sensitive sort on a field that : is missing from some docs causes an NPE inside the call to Collator#compare : at FieldSortedHitQueue line 320 (Lucene 2.0 src): : From looking at the standard String, float and int sorting and reading : LUCENE-406 I

Re: HELP: how to highlight the search key word in lucene's search results?

2006-08-11 Thread Ronnie Kolehmainen
This is in the FAQ: http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-75566820ee94a425c7e2950ac61d24e405fbd914 Quoting kevin <[EMAIL PROTECTED]>: > Hi, > how to highlight the search key word in lucene's search results? pls > give advise,thanks!

HELP: how to highlight the search key word in lucene's search results?

2006-08-11 Thread kevin
Hi, how do I highlight the search keyword in Lucene's search results? Please advise, thanks!

Re: Lucene/Tomcat Memory Leak Issue

2006-08-11 Thread Ronnie Kolehmainen
How do you index your documents? Are you releasing old resources? Can you use a profiler to see referenced objects? I've experienced the same problem when indexing xml files which were parsed with xalan, and the memory leak in that case was in xalan. Switching to saxon solved the problems for us.

Re: WIll storing docs affect lucene's search performance ?

2006-08-11 Thread Grant Ingersoll
Large stored fields can affect performance when you are iterating over your hits (assuming you are not interested in the value of the stored field at that point in time) for a results display since all Fields are loaded when getting the Document. The SVN trunk has a version of lazy loadi

Lucene/Tomcat Memory Leak Issue

2006-08-11 Thread adrena . keating
Hello, can anyone help? We're experiencing the following issue on a Windows intranet website: Following a Tomcat restart, our application has Lucene creating a single new index in a RAMDirectory, followed by continuous creation of additional index entries as new content is published. During the

How can i Tokenize money values?

2006-08-11 Thread Gustavo Scrigna
Hello all! How can I tokenize money values? Example: $25000, u$s45000, etc., so that I can search for "$25000" or "$250*". I think the "StandardTokenizer" class is responsible for tokenizing the content of the field based on the grammar generated by JavaCC. The question is: I hav

Re: Special characters

2006-08-11 Thread Erik Hatcher
On Aug 11, 2006, at 1:23 AM, Martin Braun wrote: Hello Adrian, I am indexing some text in a java object that is "%772B" with the standard analyser and Lucene 2. Should I be able to search for this with the same text as the query, or do I need to do any escaping of characters? Besides Luk

Re: WIll storing docs affect lucene's search performance ?

2006-08-11 Thread Øyvind Stegard
On Friday 11 August 2006 15:07, Prasenjit Mukherjee wrote: > I have a requirement (to use the highlighter) to store the doc content > somewhere, and I am not allowed to use an RDBMS. I am thinking of using > Lucene's Field with (Field.Store.YES and Field.Index.NO) to store the > doc content. Will it hav

WIll storing docs affect lucene's search performance ?

2006-08-11 Thread Prasenjit Mukherjee
I have a requirement (to use the highlighter) to store the doc content somewhere, and I am not allowed to use an RDBMS. I am thinking of using Lucene's Field with (Field.Store.YES and Field.Index.NO) to store the doc content. Will it have any negative effect on my search performance? I think I hav

Re: Field compression too slow

2006-08-11 Thread Grant Ingersoll
SVN Head does. Has not been released yet. See http://issues.apache.org/jira/browse/LUCENE-545 and http://issues.apache.org/jira/browse/LUCENE-609 for some of the issues with it. On Aug 11, 2006, at 8:19 AM, Dragon Fly wrote: Mike, which version of Lucene supports lazy loading? Thanks.

Re: updating document

2006-08-11 Thread Karel Tejnora
Jason is right. I think, though I'm no Lucene expert either, that your newly added document can't recreate the terms for the field with the analyzer, because the field text is empty. There is a very hairy solution: hack an IndexReader and FieldInfosWriter, and use addIndexes. Lucene is "only" a fulltext search library, n

Re: Field compression too slow

2006-08-11 Thread Dragon Fly
Mike, which version of Lucene supports lazy loading? Thanks. From: Michael McCandless <[EMAIL PROTECTED]> Reply-To: java-user@lucene.apache.org To: java-user@lucene.apache.org Subject: Re: Field compression too slow Date: Fri, 11 Aug 2006 06:59:58 -0400 I can share the data.. but it would be q

Hits DASL

2006-08-11 Thread aslam bari
Dear All, How can I get the number of hits in a document from a DASL query result? I am using the following syntax. \n" + "http://jakarta.apache.org/slide/\";>" + "" + "" + "" + "" + "" + "" + "" + "" + "" + ""+scope+""+ "infinity" + "" + "" + "" + //Content Bas

Re: search document for keywords and keyphrases

2006-08-11 Thread Eugeny N Dzhurinsky
On Fri, Aug 11, 2006 at 01:22:26PM +0200, Simon Willnauer wrote: > Sure you can do this. > You index your document with the keywords assigned to the document and > search with a Boolean query to get all documents having the keywords > 1,2,...n-1,n. Just be aware that there are limitations to boolea

Re: search document for keywords and keyphrases

2006-08-11 Thread Simon Willnauer
Sure you can do this. You index your document with the keywords assigned to the document and search with a Boolean query to get all documents having the keywords 1,2,...n-1,n. Just be aware that there are limitations to boolean queries in Lucene (see setMaxClauseCount()), which can be very memory c

Re: search document for keywords and keyphrases

2006-08-11 Thread Eugeny N Dzhurinsky
On Fri, Aug 11, 2006 at 08:11:31PM +1000, Jason Polites wrote: > Yes you could use Lucene for this, but it may be overkill for your > requirement. If I understand you correctly, all you need to do is find > documents which match "any" of the words in your list? Do you need to rank > the results? I

Re: Field compression too slow

2006-08-11 Thread Michael McCandless
I can share the data.. but it would be quicker for you to just pull out some random text from anywhere you like. OK, I hear you. I'll pull together some test data ... thanks. Also.. upon reflection I'm not certain using compression inside the index is really a valuable process without laz

Implementation of BM25 in Lucene

2006-08-11 Thread J.Zhu
Hi, I have seen previous discussions on the implementation of BM25 in Lucene, but I still do not know the current progress on this. Could anybody give me some guidance? For example, has some work already been done, or where should I start working on this? Thanks! Jianhan

Re: search document for keywords and keyphrases

2006-08-11 Thread Jason Polites
Yes you could use Lucene for this, but it may be overkill for your requirement. If I understand you correctly, all you need to do is find documents which match "any" of the words in your list? Do you need to rank the results? If not, it's probably easier just to create your own inverted index of
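
As one possible reading of the "create your own index" suggestion above, here is a minimal plain-Java sketch. The message is cut off before it says what the index should cover, so the design here is an assumption: keywords and keyphrases are stored in a hash map keyed by their first word, and a document is scanned once, word by word, to see which entries occur in it. This is a simple lookup table rather than a full inverted index of documents, and all names (KeyphraseMatcherSketch, addPhrase, findIn) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: keyphrases indexed by their first word, matched without ranking.
final class KeyphraseMatcherSketch {
    private final Map<String, List<String[]>> phrasesByFirstWord = new HashMap<>();

    /** Registers a keyword or keyphrase; matching is case-insensitive and ignores punctuation. */
    void addPhrase(String phrase) {
        String[] words = normalize(phrase);
        if (words.length == 0) {
            return;
        }
        phrasesByFirstWord.computeIfAbsent(words[0], k -> new ArrayList<>()).add(words);
    }

    /** Returns every registered phrase that occurs in the document as a run of consecutive words. */
    Set<String> findIn(String document) {
        String[] doc = normalize(document);
        Set<String> found = new TreeSet<>();
        for (int i = 0; i < doc.length; i++) {
            List<String[]> candidates = phrasesByFirstWord.get(doc[i]);
            if (candidates == null) {
                continue;
            }
            for (String[] phrase : candidates) {
                if (matchesAt(doc, i, phrase)) {
                    found.add(String.join(" ", phrase));
                }
            }
        }
        return found;
    }

    private static boolean matchesAt(String[] doc, int start, String[] phrase) {
        if (start + phrase.length > doc.length) {
            return false;
        }
        for (int j = 0; j < phrase.length; j++) {
            if (!doc[start + j].equals(phrase[j])) {
                return false;
            }
        }
        return true;
    }

    /** Lower-cases the text and splits it on anything that is not a letter or digit. */
    private static String[] normalize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(word -> !word.isEmpty())
                .toArray(String[]::new);
    }
}
```

A caller would register each database entry once with addPhrase and then call findIn per document. With several million phrases the map has to fit in memory, which is a limitation of this sketch and not something the thread confirms either way.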

Re: custom sort

2006-08-11 Thread Chris Hostetter
: What I don't know is how I can make fieldNorm return the same value : for both documents, and at the same time make this value bigger than if the : query only found one of the words, smaller than finding three of three... ... : I subclass DefaultSimilarity and set it to IndexSearche

Re: SV: Lucene hits.length()

2006-08-11 Thread Chris Hostetter
I think we've moved well beyond the point where anyone can offer you suggestions based purely on a description of the problem. As I mentioned in my last post, can you post some code that demonstrates the problem (ie: writes some arbitrary docs, opens a searcher, does a query that returns N resul

search document for keywords and keyphrases

2006-08-11 Thread Eugeny N Dzhurinsky
Hello! I have an assignment which requires searching documents for keywords or keyphrases. For instance, I have a database of keywords/keyphrases, which might contain several million items. Now I need to find whether a document contains any of the keywords/phrases listed in that database. I was th