Re: search result problem

2007-05-22 Thread Stefan Colella
Hello, I used setMaxFieldLength() and it works now. Thanks, all. Doron Cohen wrote: Stefan Colella wrote: I tried to only add the content of the page where that expression can be found (instead of the whole document) and then the search works. Do I have to split my pdf text into more fiel
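For readers wondering why raising the limit helped: Lucene's IndexWriter indexes only the first maxFieldLength terms of a field, so an expression that occurs late in a long PDF text is never indexed and cannot be found. The plain-Java sketch below only simulates that truncation. The class and method names are hypothetical, tokenization is simplified to whitespace splitting, and no Lucene call is made.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class MaxFieldLengthSketch {
    // Hypothetical helper: returns the terms that would be indexed if only
    // the first maxFieldLength whitespace-separated tokens are kept.
    static Set<String> indexedTerms(String text, int maxFieldLength) {
        String[] tokens = text.trim().split("\\s+");
        int kept = Math.min(tokens.length, maxFieldLength);
        return new HashSet<>(Arrays.asList(tokens).subList(0, kept));
    }

    public static void main(String[] args) {
        String text = "alpha beta gamma delta";
        // With a limit of 2, "delta" lies past the cut-off and is not indexed.
        System.out.println(indexedTerms(text, 2).contains("delta"));
        // With a limit larger than the text, every term is indexed.
        System.out.println(indexedTerms(text, 10).contains("delta"));
    }
}
```

In Lucene itself the limit is raised with the setMaxFieldLength() call mentioned in the message above.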

Re: regaridng Reader.terms()

2007-05-22 Thread Mohammad Norouzi
Hi Walter, let me explain my problem in detail. I have a web page that lets a user create his own simple query. For example, a user wants to locate a service with a specific value, but he/she doesn't know the exact name of the service, so I have to provide a list of available services (say in a combo box) and

Re: regaridng Reader.terms()

2007-05-22 Thread Mohammad Norouzi
Hi Steve, No, I didn't make any changes to WhitespaceAnalyzer itself. I just extend the original classes and override what I need to change, so I don't think I should contribute my classes. My language is Persian, and the only change I've made is not to ignore Unicode characters in Persi

Re: How to Update the Index once it is created

2007-05-22 Thread Emmanuel Bernard
The indexing part of Hibernate Search relies on the Java Persistence API to trigger the index update transparently. Otherwise you can trigger it manually to follow the crawling approach (not transparent). Event-driven and crawling-driven index updates both have use cases; I would not say that on

Re: In memory MultiSearcher

2007-05-22 Thread Erick Erickson
You're right, I am suggesting that you use the Lucene caching and see if it is adequate. Mind you, I have no clue whether your application will be well served by this or not, I've just seen too many examples of folks (including me) jumping into a solution to a problem that doesn't exist to be ab

Re: In memory MultiSearcher

2007-05-22 Thread Peter W.
Hoss, My Lucene scaling strategy involves creating numerous indexes, so I was looking for a way to read them in together for quickness. For those interested, your suggestion of using a single IndexSearcher on a MultiReader works well by itself. Or, you can still place in memory like this: Inde

Re: In memory MultiSearcher

2007-05-22 Thread Peter W.
Erick, Thanks for the reply, this is a web application. If you want to serve image files in a scalable fashion on the Internet you make Apache serve them from memory, not the filesystem. For databases, some sites use a distributed object memory caching system such as memcached. I was hoping th

Re: MoreLikeThis?

2007-05-22 Thread Otis Gospodnetic
Donna, this is what you need to do to get the jar, and after that you just use MLT according to its API. $ cd lucene-trunk otis:~/dev/workspace/lucene-trunk otis$ cd contrib/queries/ otis:~/dev/workspace/lucene-trunk/contrib/queries otis$ ff MoreLikeThis.java ./src/java/org/apache/lucene/search/s

MoreLikeThis?

2007-05-22 Thread Donna L Gresh
Hello, I'm sorry if this is a naive question, but I have implemented my own "MoreLikeThis" functionality, and in re-reading the FAQ saw that it looks like something like this is already built, so I wanted to try it out and see if it would simplify my code: How do I find similar documents? See

Re: Optional terms in BooleanQuery

2007-05-22 Thread Paul Elschot
This is actually more for java-dev, but anyway. On Tuesday 22 May 2007 11:04, Mark Miller wrote: > Sorry, didn't mean to imply that that whole spiel was a technical > explanation...just a "how I like to think of it" to get my head around > the BooleanQuery system. If you're reading that, think hig

Re: regaridng Reader.terms()

2007-05-22 Thread Steven Rowe
Hi Mohammad, May I ask what your language is? And what kind of changes to WhitespaceAnalyzer were required to make it work with your language? If you have made modifications to WhitespaceAnalyzer that are generally useful, please consider contributing your changes back to the Lucene project. Th

Re: regaridng Reader.terms()

2007-05-22 Thread Grant Ingersoll
You have to turn on term vectors when indexing. Take a look at the Field constructor that passes in TermVector. -Grant On May 22, 2007, at 8:09 AM, Mohammad Norouzi wrote: I would use a term vector to get this. See IndexReader.getTermFreqVector. You can get the term vector for just field

Re: regaridng Reader.terms()

2007-05-22 Thread Mohammad Norouzi
I would use a term vector to get this. See IndexReader.getTermFreqVector. You can get the term vector for just field 3. Grant, thanks. In my case, getTermFreqVector returns null. I don't know why it accepts a docnumber as a parameter; what is it? Is that the same as the doc id? If yes, it restricts the r

Re: regaridng Reader.terms()

2007-05-22 Thread Grant Ingersoll
I would use a term vector to get this. See IndexReader.getTermFreqVector. You can get the term vector for just field 3. -Grant On May 22, 2007, at 5:29 AM, Mohammad Norouzi wrote: Hi all consider following index field1 field2 field3 text1

Re: regaridng Reader.terms()

2007-05-22 Thread Walter Ferrara
Let's suppose you modify your WhitespaceAnalyzer not to use a WhitespaceTokenizer, but a modified version of the Tokenizer which tokenizes not by space but by something else, like '/'. (This is just an example, of course.) So suppose your real text document contains: /text2 text3/text4 text5/text6 Wh
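The effect described in the message above can be tried outside Lucene. The plain-Java sketch below splits the example document once on whitespace and once on '/', to show how the choice of separator changes the resulting terms. The class and method names are hypothetical; this is not a Lucene Tokenizer, only an illustration of the splitting rule.

```java
import java.util.ArrayList;
import java.util.List;

public class SeparatorTokenizeSketch {
    // Splits text on the given separator regex, trims each piece,
    // and drops empty tokens.
    static List<String> tokenize(String text, String separatorRegex) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split(separatorRegex)) {
            String token = piece.trim();
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String doc = "/text2 text3/text4 text5/text6";
        // Whitespace splitting keeps the '/' inside the terms.
        System.out.println(tokenize(doc, "\\s+"));
        // Splitting on '/' keeps the spaces inside the terms.
        System.out.println(tokenize(doc, "/"));
    }
}
```

With whitespace splitting the terms are "/text2", "text3/text4" and "text5/text6"; with '/' splitting they are "text2 text3", "text4 text5" and "text6". The terms listed for a field depend on which rule was used at indexing time.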

Re: regaridng Reader.terms()

2007-05-22 Thread Mohammad Norouzi
Walter, Yes, I am using a customized WhitespaceAnalyzer while indexing. I say customized because I realized that the standard WhitespaceAnalyzer doesn't accept Unicode terms in my language, so I made some changes to support that. But for reading, no Analyzer is used. If I want to get that result, which ana

Re: regaridng Reader.terms()

2007-05-22 Thread Walter Ferrara
If Reader.terms() gives you: text3 text4 while you expect text3 text4 you should change, I presume, the Analyzer, maybe writing your own one. Mohammad Norouzi wrote: > Hi all > > consider following index > > field1 field2 field3 > text1 text1 text2

Queries on small subset in a very large index

2007-05-22 Thread Walter Ferrara
Hi, I need to execute a query on a subset of documents (I know their ids) and it has to be very fast. I've made a Filter that sets bits in the bitset only for the needed docids. The point is, the subset is very small compared with the index, which is very big (subset size is always below 0.05% of the total numbers of
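The bit-set step described in the message above might look roughly like the plain-Java sketch below. It only shows how a java.util.BitSet sized to the whole index is populated for a handful of known document ids; the class name, method name, and sample ids are hypothetical, and wiring the result into a Lucene Filter is left out.

```java
import java.util.BitSet;

public class SubsetBitsSketch {
    // Builds a bit set with one bit per document in the index (maxDoc bits)
    // and sets only the bits of the wanted document ids.
    // Ids outside [0, maxDoc) are ignored.
    static BitSet bitsFor(int[] docIds, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int id : docIds) {
            if (id >= 0 && id < maxDoc) {
                bits.set(id);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        // Three wanted documents in an index of one million documents.
        BitSet bits = bitsFor(new int[] {3, 17, 42}, 1_000_000);
        System.out.println(bits.cardinality()); // number of bits set
        System.out.println(bits.get(17));       // is document 17 in the subset?
    }
}
```

The sketch makes the imbalance visible: the bit set spans every document in the index even though only a few bits are ever set.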

regaridng Reader.terms()

2007-05-22 Thread Mohammad Norouzi
Hi all, consider the following index: field1 field2 field3 text1 text1 text2 text3 text4 text4 text2 text2 text3 text5 I want to get all terms in field3. If I use Reader.terms() it will return

Re: Optional terms in BooleanQuery

2007-05-22 Thread Mark Miller
Sorry, didn't mean to imply that that whole spiel was a technical explanation...just a "how I like to think of it" to get my head around the BooleanQuery system. If you're reading that, think high level overview more than technically accurate. I'll be more specific in the future -- as always, the

Re: In memory MultiSearcher

2007-05-22 Thread Chris Hostetter
: I'd *strongly* recommend, if you haven't, just using the regular : FSDirectories rather than RAMDirectories and only getting : complex if that's too slow... ...and if you are "Multi Searching" over a bunch of local directories anyway, then use a single IndexSearcher on a MultiReader instead ...

Re: Optional terms in BooleanQuery

2007-05-22 Thread Chris Hostetter
: BooleanQuery.Occur.SHOULD for C, D and E. However the javadocs for : BooleanClause.Occur.SHOULD states: : : "Use this operator for clauses that /should/ appear in the matching : documents. For a BooleanQuery with two |SHOULD| subqueries, at least one : of the clauses must appear in the matching

Re: Optional terms in BooleanQuery

2007-05-22 Thread Chris Hostetter
: Each doc is going to get a score -- if the score is positive the doc : will be a hit, if the score is 0 the doc will not be a hit. that's actually a fairly misleading statement ... the guts of Lucene don't prevent documents from "matching" with a negative score (specifically: a HitCollector ca