Search Query AND/OR for Title and Description Fields

2007-05-25 Thread Ram Peters
I have indexed a title field and a description field. Now I want to search for "object oriented programming" in the title and/or the description using a Lucene query. How do I do this?
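The usual answer is a boolean OR across the two fields, each holding the phrase. Below is a minimal sketch that only builds the query string in Lucene QueryParser syntax. It assumes both fields were indexed as tokenized text and are named exactly `title` and `description`; the helper `anyField` is hypothetical. The resulting string would then be handed to a QueryParser. Lucene also ships a `MultiFieldQueryParser` class for expanding one query over several fields.

```java
import java.util.StringJoiner;

class MultiFieldPhraseQuery {

    /**
     * Builds a QueryParser-syntax string that matches the phrase in any of the
     * given fields. OR gives "and/or" semantics: a document matches if either
     * field contains the phrase, and one matching in both usually scores higher.
     */
    static String anyField(String phrase, String... fields) {
        // QueryParser uses backslash as its escape character, so escape
        // backslashes and double quotes before wrapping the phrase in quotes.
        String escaped = phrase.replace("\\", "\\\\").replace("\"", "\\\"");
        String quoted = "\"" + escaped + "\"";
        StringJoiner joiner = new StringJoiner(" OR ");
        for (String field : fields) {
            joiner.add(field + ":" + quoted);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // title:"object oriented programming" OR description:"object oriented programming"
        System.out.println(anyField("object oriented programming", "title", "description"));
    }
}
```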

Re: Indexing help needed

2007-05-25 Thread Andrzej Bialecki
jim shirreffs wrote: Thanks for the advice, I just don't see where in the Lucene code I should plug OOParcer into Lucene. I've walked the code in LIUS and Nutch (moving on to Solr) trying to find common objects. If I can find common objects in Lucene and Nutch I'll know where to plug in. Yo

Re: multiple tokens at the same position

2007-05-25 Thread Mark Miller
Another (obvious) option is to use two indexes and direct the query to the appropriate index depending on the search specification. Of course you double your space requirements, but you're basically going to do that anyway if you use two fields. I chose this for the slight benefit of fewer fields on

Re: Scoring on Number of Unique Terms Hit, Not Term Frequency Counts

2007-05-25 Thread Grant Ingersoll
I know you have a solution already that I agree with, but I do think the DisjunctionMaxQuery could serve as the start for writing your own Query that did what you want. Why would you want to? Well, maybe you have other ways you want to search as well and don't want to mess with custom Sim

Re: Indexing help needed

2007-05-25 Thread jim shirreffs
Thanks for the advice, I just don't see where in the Lucene code I should plug OOParcer into Lucene. I've walked the code in LIUS and Nutch (moving on to Solr) trying to find common objects. If I can find common objects in Lucene and Nutch I'll know where to plug in. Lucene Objects looks li

Re: multiple tokens at the same position

2007-05-25 Thread Enis Soztutar
On 5/25/07, Chris Hostetter <[EMAIL PROTECTED]> wrote: : Yes, indeed we could but it brings other problems, for example increasing : the index size, and extending the query to search for multiple fields, etc. 1) if you index both the raw and stemmed forms your index is going to grow to roughly

Re: multiple tokens at the same position

2007-05-25 Thread Chris Hostetter
: Yes, indeed we could but it brings other problems, for example increasing : the index size, and extending the query to search for multiple fields, etc. 1) if you index both the raw and stemmed forms your index is going to grow to roughly the same size regardless of whether the stem and the raw a

Re: Indexing help needed

2007-05-25 Thread Andrzej Bialecki
jim shirreffs wrote: Thanks to all who try to help me out. Jim S P.S. If I get it working I will be happy to email or post the code. If you looked at the code in Nutch, you can take most of the parse-oo plugin verbatim, because all this plugin does is extract the text content and metadata
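Andrzej's point is that parse-oo only pulls out text content and metadata. An OpenDocument file (the OpenOffice 2.x format) is a zip archive whose body text lives in `content.xml`. Below is a stdlib-only sketch of that text-extraction step. The class and method names are hypothetical and are not the Nutch or Lucene API.

The sketch has several limitations:
- It assumes ODF; older OpenOffice 1.x `.sxw` files use a different namespace.
- It ignores the metadata in `meta.xml`.
- It ignores elements such as `text:s` (extra spaces) and `text:tab`, so words separated only by those may run together.

The returned string is what you would put into a tokenized field of a Lucene `Document`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

class OpenDocumentText {

    // Namespace of the text:p and text:h elements in ODF (OpenOffice 2.x) files.
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    /** Returns the text of content.xml inside an ODF zip, one line per paragraph or heading. */
    static String extractText(InputStream document) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(document)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("content.xml".equals(entry.getName())) {
                    // readAllBytes stops at the end of the current zip entry.
                    return parseText(zip.readAllBytes());
                }
            }
        }
        return "";
    }

    private static String parseText(byte[] contentXml) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                // End each paragraph or heading with a newline.
                if (TEXT_NS.equals(uri) && ("p".equals(localName) || "h".equals(localName))) {
                    text.append('\n');
                }
            }
        };
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.newSAXParser().parse(new ByteArrayInputStream(contentXml), handler);
        return text.toString();
    }

    /** Builds a tiny zip holding only content.xml, for demonstration. */
    static byte[] zipWithContent(String contentXml) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("content.xml"));
            zip.write(contentXml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<office:document-content"
                + " xmlns:office=\"urn:oasis:names:tc:opendocument:xmlns:office:1.0\""
                + " xmlns:text=\"" + TEXT_NS + "\">"
                + "<office:body><office:text>"
                + "<text:h>Title</text:h><text:p>Hello world</text:p>"
                + "</office:text></office:body></office:document-content>";
        System.out.print(extractText(new ByteArrayInputStream(zipWithContent(xml))));
    }
}
```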

Indexing help needed

2007-05-25 Thread jim shirreffs
I've been working on this for a while: I am trying to get the demo code that comes with Lucene to index OpenOffice documents. I've looked at the LIUS code and at the Nutch code, but I can't find an easy way, so I am digging into the code. I wrote a KcmiDocument class that returns a Document. In it I

Re: multiple tokens at the same position

2007-05-25 Thread Erick Erickson
I can only speak to the "avoid matching stemmed or canonical forms" part... Yes, but you've got to do some fancy dancing when you index, something like adding a special signifier to, say, the original token. I'll ignore the canonical part of your question for the sake of brevity. Consider inde
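Erick's suggestion can be sketched with the standard library only. Index the stemmed form as usual, and index the original form with a marker prepended, both at the same position. Here the marker is `$`, a hypothetical choice; it should be a character your analyzer never produces by itself. A search that must not match stemmed forms is rewritten to look up the marked term. In Lucene that term would be put into a `TermQuery` directly rather than passed through an analyzer that might strip the marker. The stemmer is a toy, and the method names are illustrative, not Lucene API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class MarkedOriginals {

    // Hypothetical marker; pick something the analyzer never emits by itself.
    static final String MARK = "$";

    /** Toy stemmer (assumption): strip a trailing "s" from words longer than three letters. */
    static String stem(String word) {
        return word.length() > 3 && word.endsWith("s") ? word.substring(0, word.length() - 1) : word;
    }

    /**
     * Terms indexed for one word: the stem, then the marked original.
     * In a real TokenStream the second term would get position increment 0.
     */
    static List<String> indexTerms(String word) {
        String lower = word.toLowerCase(Locale.ROOT);
        return Arrays.asList(stem(lower), MARK + lower);
    }

    /** Exact searches look up the marked original; normal searches look up the stem. */
    static String queryTerm(String word, boolean exact) {
        String lower = word.toLowerCase(Locale.ROOT);
        return exact ? MARK + lower : stem(lower);
    }

    /** True if a document containing docWord would match queryWord. */
    static boolean matches(String docWord, String queryWord, boolean exact) {
        return indexTerms(docWord).contains(queryTerm(queryWord, exact));
    }

    public static void main(String[] args) {
        System.out.println(indexTerms("Cats"));            // [cat, $cats]
        System.out.println(matches("cats", "cat", false)); // true: stems agree
        System.out.println(matches("cats", "cat", true));  // false: "$cat" was never indexed
    }
}
```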

Re: Setting the maximum number of documents in a lucene segment

2007-05-25 Thread Otis Gospodnetic
Hello Ard, What you are after is a higher mergeFactor and probably also a higher maxBufferedDocs. Is indexing performance the concern? Don't go crazy with setting a super high (e.g. 100+) mergeFactor, unless you really have the number of open files on your server(s) set to a solid/high number.

Re: Scoring on Number of Unique Terms Hit, Not Term Frequency Counts

2007-05-25 Thread Yonik Seeley
On 5/25/07, Walt Stoneburner <[EMAIL PROTECTED]> wrote: In reading the math for scoring at the bottom of: http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/Similarity.html It appears that if I can make tf() and idf(), term frequency and inverse docume

Re: multiple tokens at the same position

2007-05-25 Thread Enis Soztutar
Yes, indeed we could but it brings other problems, for example increasing the index size, and extending the query to search for multiple fields, etc. On 5/25/07, Steven Rowe <[EMAIL PROTECTED]> wrote: Hi Enis, Enis Soztutar wrote: > In nutch we have a use case in which we need to store tokens

Re: multiple tokens at the same position

2007-05-25 Thread Steven Rowe
Hi Enis, Enis Soztutar wrote: > In nutch we have a use case in which we need to store tokens with their > original text plus their stemmed form plus their canonical form (through > some asciifization). From my understanding of lucene, it makes sense to > write a tokenstream which generates several

multiple tokens at the same position

2007-05-25 Thread Enis Soztutar
Hi, In nutch we have a use case in which we need to store tokens with their original text plus their stemmed form plus their canonical form (through some asciifization). From my understanding of lucene, it makes sense to write a tokenstream which generates several tokens for each "word", but p
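In Lucene, the usual way to put several forms at one position is a TokenFilter that emits each extra form with a position increment of 0 (`Token.setPositionIncrement(0)`). Phrase and span queries then treat those forms as occupying the same position as the original word.

Below is a stdlib-only model of such a token stream. Its `Token` class is a hypothetical stand-in, not Lucene's, and the stemmer is a toy. The canonical form is approximated by accent stripping with `java.text.Normalizer`.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class SamePositionExpansion {

    /** A term plus its position increment relative to the previous token. */
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return term + "/" + positionIncrement;
        }
    }

    /** Toy stemmer (assumption): strip a trailing "s" from words longer than three letters. */
    static String stem(String word) {
        return word.length() > 3 && word.endsWith("s") ? word.substring(0, word.length() - 1) : word;
    }

    /** Canonical form: decompose accented letters and drop the combining marks. */
    static String fold(String word) {
        return Normalizer.normalize(word, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    /** Emits original, stemmed and folded forms; only the first variant of a word advances the position. */
    static List<Token> expand(String text) {
        List<Token> tokens = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            // LinkedHashSet drops duplicate forms but keeps original-first order.
            Set<String> variants = new LinkedHashSet<>();
            variants.add(word);
            variants.add(stem(word));
            variants.add(fold(word));
            int increment = 1;
            for (String variant : variants) {
                tokens.add(new Token(variant, increment));
                increment = 0; // later variants share the first variant's position
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(expand("Caf\u00e9s open"));
    }
}
```

For "Cafés open" this yields `cafés/1, café/0, cafes/0, open/1`. "open" contributes a single token because all three of its forms are identical.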

Re: Scoring on Number of Unique Terms Hit, Not Term Frequency Counts

2007-05-25 Thread Walt Stoneburner
In reading the math for scoring at the bottom of: http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/Similarity.html It appears that if I can make tf() and idf(), term frequency and inverse document frequency respectively, both return 1, then coord(), w
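Walt's reasoning can be checked against the scoring formula from the Similarity javadoc he links to. The worked reduction below rests on these assumptions (mine, for illustration):
- `tf()` and `idf()` both return 1.
- The query and every query term have boost 1.
- `norm(t,d)` is a constant $c$, for example because norms are omitted.
- Only the $m$ of the $n$ query terms that actually occur in $d$ contribute to the sum, as in Lucene's boolean scoring.

```latex
\begin{aligned}
\mathrm{score}(q,d) &= \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot\sum_{t\in q}\mathrm{tf}(t\in d)\cdot\mathrm{idf}(t)^{2}\cdot t.\mathrm{getBoost}()\cdot\mathrm{norm}(t,d)\\
\mathrm{coord}(q,d) &= \frac{m}{n},\qquad
\mathrm{queryNorm}(q) = \frac{1}{\sqrt{\sum_{t\in q}\bigl(\mathrm{idf}(t)\cdot t.\mathrm{getBoost}()\bigr)^{2}}} = \frac{1}{\sqrt{n}}\\
\mathrm{score}(q,d) &= \frac{m}{n}\cdot\frac{1}{\sqrt{n}}\cdot\sum_{m\ \text{matched terms}} 1\cdot 1\cdot 1\cdot c = \frac{c\,m^{2}}{n^{3/2}}
\end{aligned}
```

For a fixed query, $n$ is constant. Under these assumptions documents therefore rank purely by $m$, the number of distinct query terms they contain. Whether norms and boosts can really be neutralized this way is worth checking against the javadoc for your Lucene version.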

Re: Scoring on Number of Unique Terms Hit, Not Term Frequency Counts

2007-05-25 Thread Walt Stoneburner
Grant writes: Have a look at the DisjunctionMaxQuery, I think it might help, although I am not sure it will fully cover your case. The definition for DisjunctionMaxQuery is provided at this URL: http://incubator.apache.org/lucene.net/docs/2.1/Lucene.Net.Search.DisjunctionMaxQuery.html, Grossly
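DisjunctionMaxQuery combines its clauses in a specific way: a document's score is the best clause score plus `tieBreakerMultiplier` times the scores of the other matching clauses. With a tie-breaker of 0 only the best clause counts; at 1.0 the result is a plain sum. The stdlib-only sketch below models just that combination step; `DisMaxScore` and `combine` are hypothetical names.

```java
class DisMaxScore {

    /**
     * Best clause score plus tieBreakerMultiplier times the remaining clause
     * scores (the combination rule documented for DisjunctionMaxQuery).
     */
    static float combine(float tieBreakerMultiplier, float... clauseScores) {
        if (clauseScores.length == 0) {
            return 0f; // no matching clause, no score
        }
        float max = Float.NEGATIVE_INFINITY;
        float sum = 0f;
        for (float s : clauseScores) {
            max = Math.max(max, s);
            sum += s;
        }
        return max + tieBreakerMultiplier * (sum - max);
    }

    public static void main(String[] args) {
        System.out.println(combine(0.0f, 1.0f, 3.0f, 2.0f)); // best clause only
        System.out.println(combine(0.1f, 1.0f, 3.0f, 2.0f)); // best clause plus 10% of the others
    }
}
```

As Grant notes, this rewards the strongest single match rather than counting distinct terms, so it is at best a starting point for Walt's case.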

RE: Setting the maximum number of documents in a lucene segment

2007-05-25 Thread Ard Schrijvers
> > Hello, > > I am trying to change the maximum number of documents in a > lucene segment. By default it seems to be 10. Correction: 10 for the smallest (just created) segments of course, because obviously merged segments are likely to contain many more documents > When I have a > mergeFac

Setting the maximum number of documents in a lucene segment

2007-05-25 Thread Ard Schrijvers
Hello, I am trying to change the maximum number of documents in a lucene segment. By default it seems to be 10. When I have a mergeFactor of say 10, then on average, after every 100 added documents lucene is merging segments. I want each segment to contain more than the default 10 documents, be
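Ard's numbers follow from how segments cascade. Take a simplified model of the merge behaviour (an assumption; Lucene's actual merge logic has more corner cases):
- A new segment is flushed every maxBufferedDocs documents.
- Whenever mergeFactor segments of the same size pile up, they are merged into one.

A level-k segment then holds maxBufferedDocs × mergeFactor^k documents. With both settings at 10, the first merge therefore happens after 100 documents. Raising maxBufferedDocs enlarges every freshly flushed segment, which matches Otis' advice. `MergeSimulation` and `simulate` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class MergeSimulation {

    /** Segment sizes after adding totalDocs documents under the simplified model. */
    static List<Integer> simulate(int totalDocs, int maxBufferedDocs, int mergeFactor) {
        if (maxBufferedDocs < 1 || mergeFactor < 2) {
            throw new IllegalArgumentException("need maxBufferedDocs >= 1 and mergeFactor >= 2");
        }
        List<Integer> segments = new ArrayList<>();
        // Leftover documents (fewer than maxBufferedDocs) stay buffered and are not flushed.
        for (int flushed = maxBufferedDocs; flushed <= totalDocs; flushed += maxBufferedDocs) {
            segments.add(maxBufferedDocs);
            mergeWhileFull(segments, mergeFactor);
        }
        return segments;
    }

    /** Repeatedly merges the trailing mergeFactor segments while they all have the same size. */
    private static void mergeWhileFull(List<Integer> segments, int mergeFactor) {
        while (segments.size() >= mergeFactor) {
            int n = segments.size();
            int size = segments.get(n - 1);
            for (int i = n - mergeFactor; i < n; i++) {
                if (segments.get(i) != size) {
                    return; // the smallest level is not full yet
                }
            }
            segments.subList(n - mergeFactor, n).clear();
            segments.add(size * mergeFactor);
        }
    }

    public static void main(String[] args) {
        System.out.println(simulate(250, 10, 10));  // [100, 100, 10, 10, 10, 10, 10]
        System.out.println(simulate(250, 100, 10)); // [100, 100]
    }
}
```

With maxBufferedDocs raised to 100, the same 250 documents leave two 100-document segments instead of seven segments.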

Re: Writing a document using two different Analyzers

2007-05-25 Thread Paulo Silveira
On 5/25/07, karl wettin <[EMAIL PROTECTED]> wrote: PerFieldAnalyzerWrapper that was fast! thanks! http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/analysis/PerFieldAnalyzerWrapper.html -- karl

Re: number of times the keyword match

2007-05-25 Thread Anny Bridge
Hi Grant, Is there any code example for this case? Thanks, Anny On 5/15/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote: Yes, have a look at the SpanQuery functionality. -Grant On May 15, 2007, at 3:05 AM, Anny Bridge wrote: > Hi all, > > When doing a search with Lucene, can I get the number of ti
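Grant's pointer is to SpanQuery. Iterating a SpanQuery's spans visits each match in turn, so counting the spans that fall in one document gives roughly the number of times the term or phrase occurs there. Below is a stdlib-only model of what an in-order, zero-slop span match would enumerate over one tokenized field. It uses hypothetical names and is not the Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PhraseOccurrences {

    /** Start positions at which phrase occurs in tokens, in order and with no gaps. */
    static List<Integer> find(List<String> tokens, List<String> phrase) {
        List<Integer> starts = new ArrayList<>();
        if (phrase.isEmpty()) {
            return starts;
        }
        for (int i = 0; i + phrase.size() <= tokens.size(); i++) {
            if (tokens.subList(i, i + phrase.size()).equals(phrase)) {
                starts.add(i);
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("to", "be", "or", "not", "to", "be");
        List<Integer> starts = find(tokens, Arrays.asList("to", "be"));
        // The number of matches is simply the number of start positions.
        System.out.println(starts.size() + " matches at positions " + starts);
    }
}
```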

Re: Writing a document using two different Analyzers

2007-05-25 Thread karl wettin
On 25 May 2007, at 09:32, Paulo Silveira wrote: I have a Document with two fields: I would like to write one with SimpleAnalyzer and the other with StandardAnalyzer. Is there a simple way to do it? PerFieldAnalyzerWrapper http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javad
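In Lucene, PerFieldAnalyzerWrapper is constructed with a default analyzer, and per-field exceptions are registered with `addAnalyzer(fieldName, analyzer)`. The wrapper is then passed to the IndexWriter, and ideally to the QueryParser as well so that queries are analyzed the same way.

The sketch below models that dispatch with the standard library only. The class is hypothetical. `letters` roughly approximates SimpleAnalyzer (split at non-letters, lowercase); `whitespace` is just a second toy analyzer, not a model of StandardAnalyzer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

class PerFieldDispatch {

    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> fieldAnalyzers = new HashMap<>();

    PerFieldDispatch(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    /** Registers a field-specific analyzer; all other fields fall back to the default. */
    void addAnalyzer(String field, Function<String, List<String>> analyzer) {
        fieldAnalyzers.put(field, analyzer);
    }

    List<String> tokenize(String field, String text) {
        return fieldAnalyzers.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    /** Toy analyzer modeled on SimpleAnalyzer: split at non-letters, lowercase. */
    static List<String> letters(String text) {
        return lowercaseAll(text.split("[^\\p{L}]+"));
    }

    /** Toy analyzer (hypothetical): split on whitespace, lowercase. */
    static List<String> whitespace(String text) {
        return lowercaseAll(text.split("\\s+"));
    }

    private static List<String> lowercaseAll(String[] parts) {
        List<String> tokens = new ArrayList<>();
        for (String part : parts) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        PerFieldDispatch analyzer = new PerFieldDispatch(PerFieldDispatch::whitespace);
        analyzer.addAnalyzer("title", PerFieldDispatch::letters);
        System.out.println(analyzer.tokenize("title", "Foo-Bar 42")); // [foo, bar]
        System.out.println(analyzer.tokenize("body", "Foo-Bar 42"));  // [foo-bar, 42]
    }
}
```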

Writing a document using two different Analyzers

2007-05-25 Thread Paulo Silveira
Hello! I have a Document with two fields: I would like to write one with SimpleAnalyzer and the other with StandardAnalyzer. Is there a simple way to do it? Thanks -- Paulo E. A. Silveira Caelum Ensino e Soluções em Java http://www.caelum.com.br/