Re: Searching for instances within a document

2008-07-10 Thread Ajay Lakhani
Hi James, Try this: Searcher searcher = new IndexSearcher(dir); QueryParser parser = new QueryParser("content", new StandardAnalyzer()); Query query = parser.parse(queryString); HashSet queryTerms = new HashSet(); query.extractTerms(queryTerms); Hits hits = searcher.sear

Highlighting terms with different style

2008-07-10 Thread jim
Hi, is it possible to highlight more than one term with the highlighter, but with a different style for each term? 1st term with SimpleHTMLFormatter("", ""); 2nd term with SimpleHTMLFormatter("", ""); ... n-th term with SimpleHTMLFormatter("", ""); or for the following code SimpleHTMLFormatter

Re: newbie question

2008-07-10 Thread Chris Bamford
Hi John, Just continuing from an earlier question where I asked you how to handle strings like "from:fred flintston*" (sorry I have lost the original email). You advised me to write my own BooleanQuery and add to it Prefix- / Term- / Phrase- Querys as appropriate. I have done so, but am having

Can we update a field on the current index

2008-07-10 Thread Aditi Goyal
Hi, I want to modify a field on the current index. Can it be done? From what I have heard, we cannot update the index; it has to be reindexed by deleting and then indexing again. Thanks, Aditi

Re: newbie question (for John Griffin)

2008-07-10 Thread Chris Bamford
Hi John, Further to my question below, I did some back-to-basics investigation of PhraseQueries and found that even basic ones fail for me... I found the attached code on the Internet (see http://affy.blogspot.com/2003/04/codebit-examples-for-all-of-lucenes.html) and this fails too... Can you

Re: Can we update a field on the current index

2008-07-10 Thread Michael McCandless
Yes you must delete the entire document and then re-index a new one, to update a single Field. There is some work underway, or at least a Jira issue opened, towards improving this situation, here: https://issues.apache.org/jira/browse/LUCENE-1231 But it will be some time before that'

Re: Searching for instances within a document

2008-07-10 Thread jnance
Yes, the term frequency vector is exactly what I needed. Thanks! -James Ajay Lakhani wrote: > > Hi James, > > Try this: > > Searcher searcher = new IndexSearcher(dir); > QueryParser parser = new QueryParser("content", new > StandardAnalyzer()); > Query query = parser.parse(queryS
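For readers skimming this thread: a term frequency vector records how many times each term occurs within a single document. The following is a minimal stdlib-only sketch of that idea, not Lucene's API. The class name TermFrequencySketch and the simple lowercase/split tokenization are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: counts term occurrences in one document's text.
// It mimics what a term frequency vector records; it does not use Lucene.
class TermFrequencySketch {
    // Lowercase the text and split on runs of characters that are not letters or digits.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> freqs = new HashMap<>();
        for (String token : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                freqs.merge(token, 1, Integer::sum); // increment the count for this term
            }
        }
        return freqs;
    }

    public static void main(String[] args) {
        Map<String, Integer> tf = termFrequencies("Lucene indexes text; Lucene searches text.");
        System.out.println("lucene occurs " + tf.getOrDefault("lucene", 0) + " times");
    }
}
```

In a real index the analyzer decides what counts as a term, so counts from this sketch will only line up with Lucene's when the tokenization happens to agree.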

Best practice for updating an index when reindexing is not an option

2008-07-10 Thread Christopher Kolstad
Hi. Currently using Lucene 2.3.2 in a Tomcat webapp. We have an action configured that performs reindexing on our staging server. However, our live server cannot reindex since it does not have the necessary DTD files to process the XML. To update the index on the live server we perform a subvers

Re: Payloads and SpanScorer

2008-07-10 Thread Grant Ingersoll
I'm not fully following what you want. Can you explain a bit more? Thanks, Grant On Jul 9, 2008, at 2:55 PM, Peter Keegan wrote: If a SpanQuery is constructed from one or more BoostingTermQuery(s), the payloads on the terms are never processed by the SpanScorer. It seems to me that you wou

Re: Payloads and SpanScorer

2008-07-10 Thread Peter Keegan
Suppose I create a SpanNearQuery phrase with the terms "long range missiles" and some slop factor. Each term is actually a BoostingTermQuery. Currently, the score computed by SpanNearQuery.SpanScorer is based on the sloppy frequency of the terms and their weights (this is fine). But even though eac

Re: Best practice for updating an index when reindexing is not an option

2008-07-10 Thread Michael McCandless
Why does SubversionUpdate require shutting down the IndexSearcher? What goes wrong? You might want to switch instead to rsync. A Lucene index is fundamentally write once, so, syncing changes over should simply be copying over new files and removing now-deleted files. You won't be able
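The "copy new files, remove now-deleted files" idea can be sketched as a pair of set differences. This is a stdlib-only illustration under the assumption stated in the reply, namely that index files are write-once, so a file with the same name on both sides never needs re-copying. The class name IndexSyncPlan and the file names are hypothetical.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: plans a one-way sync of a write-once index directory.
// Real code would list the two directories instead of using hard-coded names.
class IndexSyncPlan {
    // Files present in the source but missing from the destination must be copied.
    static Set<String> toCopy(Set<String> source, Set<String> dest) {
        Set<String> result = new TreeSet<>(source);
        result.removeAll(dest);
        return result;
    }

    // Files that exist only in the destination are stale and can be removed.
    static Set<String> toDelete(Set<String> source, Set<String> dest) {
        Set<String> result = new TreeSet<>(dest);
        result.removeAll(source);
        return result;
    }

    public static void main(String[] args) {
        // Made-up file names, for illustration only.
        Set<String> staging = new TreeSet<>(Arrays.asList("_2.cfs", "_3.cfs", "segments_4"));
        Set<String> live = new TreeSet<>(Arrays.asList("_1.cfs", "_2.cfs", "segments_3"));
        System.out.println("copy:   " + toCopy(staging, live));
        System.out.println("delete: " + toDelete(staging, live));
    }
}
```

A tool such as rsync performs both steps itself; the sketch only shows why the transfer stays small when existing files are never rewritten.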

Re: .fdt file

2008-07-10 Thread Yonik Seeley
On Thu, Jul 10, 2008 at 1:42 AM, blazingwolf7 <[EMAIL PROTECTED]> wrote: > Well, I am trying to extract the URL and contentLength from the ".fdt" file. > I am planning to use both of these values in a filter to remove certain > links to be displayed in the search result. The problem is, I am told not

RE: performance feedback

2008-07-10 Thread Beard, Brian
Currently the default setting is being used with our setup, so autoCommit is true. I'll set this to false to see if it improves. Question: If autoCommit is false, does this apply to optimization also, so that during an hour long optimization that gets killed in the middle, will the index be in the

Re: newbie question (for John Griffin) - fixed

2008-07-10 Thread Chris Bamford
Hi John, Please ignore my earlier questions on this subject, as I have got to the bottom of it. I was not passing each word in the phrase as a separate Term to the query; instead I was passing the whole string (doh!). Thanks. - Chris Chris Bamford wrote: Hi John, Further to my question be
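The mistake described here (passing the whole string as one term instead of one term per word) can be illustrated without Lucene. The sketch below is a stdlib-only simulation that assumes the indexed text was tokenized on whitespace and lowercased. PhraseTermsSketch and its methods are hypothetical names, not part of any library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical simulation of phrase matching over a tokenized document.
class PhraseTermsSketch {
    // Tokenize the way a very simple analyzer might: lowercase, split on whitespace.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // An exact phrase matches when its terms occur at consecutive positions.
    static boolean matchesPhrase(List<String> docTokens, List<String> phraseTerms) {
        if (phraseTerms.isEmpty()) {
            return false;
        }
        for (int start = 0; start + phraseTerms.size() <= docTokens.size(); start++) {
            if (docTokens.subList(start, start + phraseTerms.size()).equals(phraseTerms)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> doc = tokenize("message from Fred Flintstone today");
        // Whole string as a single term: no indexed token equals "fred flintstone".
        System.out.println(matchesPhrase(doc, Arrays.asList("fred flintstone")));
        // One term per word: the two tokens are found at adjacent positions.
        System.out.println(matchesPhrase(doc, tokenize("fred flintstone")));
    }
}
```

The same reasoning applies to a real phrase query: each word has to be added as its own term, because the index holds individual tokens rather than the original string.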

Re: performance feedback

2008-07-10 Thread Yonik Seeley
On Thu, Jul 10, 2008 at 11:13 AM, Beard, Brian <[EMAIL PROTECTED]> wrote: > Question: If autoCommit is false, does this apply to optimization also, > so that during an hour long optimization that gets killed in the middle, > will the index be left in the initial state before optimization > s

Re: Sorting case-insensitively

2008-07-10 Thread Paul J. Lucas
On Jul 9, 2008, at 10:14 PM, Chris Hostetter wrote: I'm going to guess you have a doc where that field doesn't have a value. ordinarily that's fine, but maybe SortComparator doesn't handle that case very well. But how does the built-in STRING sort work with null values then? And how do I

Re: Payloads and SpanScorer

2008-07-10 Thread Grant Ingersoll
Makes sense. It was always my intent to implement things like PayloadNearQuery, see http://wiki.apache.org/lucene-java/Payload_Planning I think it would make sense to develop these and I would be happy to help shepherd a patch through, but am not in a position to generate said patch at thi

Re: Payloads and SpanScorer

2008-07-10 Thread Peter Keegan
I may take a crack at this. Any more thoughts you may have on the implementation are welcome, but I don't want to distract you too much. Thanks, Peter On Thu, Jul 10, 2008 at 1:30 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > Makes sense. It was always my intent to implement things like > P

Re: Sorting case-insensitively

2008-07-10 Thread Chris Hostetter
: But how does the built-in STRING sort work with null values then? And how do : I make a SortComparator that works? Built in string sorting uses FieldCache.DEFAULT.getStringIndex() ... any doc without a value ends up without an assignment in StringIndex.order[], so it gets the default value o
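As a rough illustration of a comparator that tolerates missing values, here is a stdlib-only Java sketch. It is not Lucene's SortComparator. It assumes that a document without a value is represented as null and should sort before everything else, and the class name NullSafeCaseInsensitiveSort is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: case-insensitive ordering that does not fail on missing values.
class NullSafeCaseInsensitiveSort {
    // Nulls (documents with no value) sort first; all other values ignore case.
    static final Comparator<String> ORDER =
            Comparator.nullsFirst(String.CASE_INSENSITIVE_ORDER);

    // Returns a sorted copy so the caller's list is left untouched.
    static List<String> sorted(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(ORDER);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sorted(Arrays.asList("banana", null, "Apple", "cherry")));
    }
}
```

Whatever form the custom comparator takes inside Lucene, the point is the same: the comparison has to decide explicitly where a missing value goes instead of dereferencing it.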

Re: how to get IndexReader Remote?

2008-07-10 Thread Chris Hostetter
: I have a remote MultiSearcher, bound using : Naming.bind("rmi://"+IP+":"+PORT+"/"+NAME, RemoteSearchable), : but MultiSearcher doesn't have getIndexReader(). : How do I get an IndexReader? It's not possible to get a remote IndexReader ... that's the main distinction between the Searchable interf

Boolean expression for no terms OR matching a wildcard

2008-07-10 Thread Ronald Rudy
I need to perform a query for a term that may or may not have values, and I need to check for the conditions where either no terms are indexed OR any and ALL indexed terms match a wildcard. For example, say the following values were indexed as terms in the field "myfield" in the three docum

how to get total hit count for each Searchable?

2008-07-10 Thread xin liu
Hi, I have individual index files for Audio, Image and PDF files. We build common meta fields for them. When I search for a string, I want the search by default to return mixed results from these 3 different indexes based on relevancy. But I also want to know the hit count for each individual in

Re: .fdt file

2008-07-10 Thread blazingwolf7
Thanks. I think I will follow the advice. But just for the sake of curiosity, can what I suggested be done? Yonik Seeley wrote: > > On Thu, Jul 10, 2008 at 1:42 AM, blazingwolf7 <[EMAIL PROTECTED]> > wrote: >> Well, I am trying to extract the URL and contentLength from the ".fdt" >> file. >> I a

Re: .fdt file

2008-07-10 Thread Grant Ingersoll
On Jul 10, 2008, at 1:42 AM, blazingwolf7 wrote: Well, I am trying to extract the URL and contentLength from the ".fdt" file. I am planning to use both of these values in a filter to remove certain links to be displayed in the search result. The problem is, I am told not to use the IndexR

Re: .fdt file

2008-07-10 Thread blazingwolf7
Well, according to him, using the reader to access the index every time a document is found to retrieve certain values is inefficient. Meaning if there are 500k documents, the index will be accessed 500k times. It might affect the performance of the search. So I am instructed to retrieve all the neces

RE: newbie question (for John Griffin)

2008-07-10 Thread John Griffin
Chris, The code you refer to in the blog is 5 years old! Some of the code is no longer valid with the newer Lucene jars. I wouldn't use it to test anything. My suspicion is that your index itself is suspect. Let's see the code you use to build the index with a small data set that will show what

RE: newbie question (for John Griffin) - fixed

2008-07-10 Thread John Griffin
Chris, -Original Message- From: Chris Bamford [mailto:[EMAIL PROTECTED] Sent: Thursday, July 10, 2008 9:15 AM To: java-user@lucene.apache.org Subject: Re: newbie question (for John Griffin) - fixed Hi John, Please ignore my earlier questions on this subject, as I have got to the bottom

Deletions

2008-07-10 Thread John Griffin
Guys (and Gals), A question on index deletions, what exactly happens to the Lucene document numbers in an index when a document is deleted? Let's say I have a 5 doc index. Document # Doc 0 doc1 1

Re: Can we update a field on the current index

2008-07-10 Thread Aditi Goyal
Thanks Mike for your valuable time. Regards, Aditi On Thu, Jul 10, 2008 at 5:36 PM, Michael McCandless < [EMAIL PROTECTED]> wrote: > > Yes you must delete the entire document and then re-index a new one, to > update a single Field. > > There is some work underway, or at least a Jira issue opened

Re: Deletions

2008-07-10 Thread Anshum
Hi John, In case of deletions, it is just a delayed delete. In other words, the doc is just marked as deleted in the deletable file, leaving a void in the numbering of docs. The actual shifting of document ids happens only when you optimize the index. In that case the deletables file is used to ph
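The behaviour described in this reply (a document is only marked as deleted, and numbers shift when the index is optimized) can be modelled with a small stdlib-only simulation. DelayedDeleteSketch and its method names are hypothetical and only loosely echo Lucene terminology. The sketch follows the description in the reply rather than Lucene's actual on-disk format.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical model of delayed deletes: deleting only sets a mark, and
// document numbers are reassigned when optimize() compacts the list.
class DelayedDeleteSketch {
    private final List<String> docs = new ArrayList<>();
    private final BitSet deleted = new BitSet();

    // Adds a document and returns the number assigned to it.
    int add(String doc) {
        docs.add(doc);
        return docs.size() - 1;
    }

    // Marks the document as deleted; its slot and number stay in place.
    void delete(int docNum) {
        deleted.set(docNum);
    }

    boolean isDeleted(int docNum) {
        return deleted.get(docNum);
    }

    // Counts every slot, including those marked as deleted.
    int maxDoc() {
        return docs.size();
    }

    // Counts only live documents.
    int numDocs() {
        return docs.size() - deleted.cardinality();
    }

    String document(int docNum) {
        return docs.get(docNum);
    }

    // Drops the marked documents, so later documents shift down to fill the gaps.
    void optimize() {
        List<String> live = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i)) {
                live.add(docs.get(i));
            }
        }
        docs.clear();
        docs.addAll(live);
        deleted.clear();
    }

    public static void main(String[] args) {
        DelayedDeleteSketch index = new DelayedDeleteSketch();
        for (int i = 0; i < 5; i++) {
            index.add("doc" + i);
        }
        index.delete(2);
        System.out.println(index.maxDoc() + " slots, " + index.numDocs() + " live");
        index.optimize();
        System.out.println("number 2 now holds " + index.document(2));
    }
}
```

With the five-document example from the question, deleting number 2 leaves a gap until the compaction step runs, after which the document that used to be number 3 becomes number 2. This is why document numbers should not be stored as stable identifiers.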