Re: Location of code which determines a Hit for PhraseQuery

2005-09-08 Thread Paul Elschot
Hi Sean, On Thursday 08 September 2005 05:16, Sean O'Connor wrote: > Hi, > I am trying to work through the Hit collection process for a > PhraseQuery (using an exact phrase). For an example search, say I'm > looking for: > "lucene action" (quotes indicating exact phrase) > > in a one doc,

Re: Updating the index and searching

2005-09-08 Thread Paul . Illingworth
Hello Brian, Updating an index is very straightforward. Simply open the index writer for your existing index and add the new documents. The issue is that if you need to search on the updated index you need to open a new index reader in order to see the new documents. This is the time-consuming

Updating a Document without re-analyzing

2005-09-08 Thread Paul Libbrecht
Hi, some time ago I posted a comment asking this question (which is by no means new) about updating a Lucene document without re-analyzing, that is, where we expect the token-streams to be copied into the new document and where I intend to change only a few keyword values. I cannot f

Re: Updating a Document without re-analyzing

2005-09-08 Thread Paul . Illingworth
Hello Paul, I came across this yesterday. http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200504.mbox/[EMAIL PROTECTED] My understanding is that by splitting your fields into two indexes and putting your keyword fields into one and your complicated stuff into the other then you ca

cancel search

2005-09-08 Thread Kunemann Frank
Is there a good way to cancel a search? I mean e.g. after 10 seconds or if the user changed his mind and wants to start another query. Till now I didn't have a query that took longer than 10 secs, but this can happen easily when the network connection is very slow or something like that. I thou

Re: cancel search

2005-09-08 Thread Gusenbauer Stefan
Kunemann Frank wrote: >Is there a good way to cancel a search? I mean e.g. after 10 seconds or if the >user changed his mind and wants to start another query. >Till now I didn't have a query that took longer than 10 secs, but this can >happen easily when the network connection is very slow or so

Re: Updating a Document without re-analyzing

2005-09-08 Thread Paul Libbrecht
That could be, indeed, a good way for today. I'm still dreaming of finding a ((DocumentOfSomeSort) document).getTokenStream(fieldName) for stored and non-stored fields! paul On 8 Sept. 05, at 11:56, [EMAIL PROTECTED] wrote: My understanding is that by splitting your fields into two indexes

Re: cancel search

2005-09-08 Thread Kunemann Frank
The problem is that when searching there is no real safe point to stop the thread. The only line that takes time is this one: Hits hits = searcher.search(query); Frank >I've had such a long lasting search too. It sounds good to start the >search in another thread. I've done this for the indexin

Re: cancel search

2005-09-08 Thread Yonik Seeley
You could create your own HitCollector that checked a flag on each hit, and throw an exception if it was set. In a separate thread, you could set the flag to cancel the search. -Yonik Now hiring -- http://tinyurl.com/7m67g On 9/8/05, Kunemann Frank <[EMAIL PROTECTED]> wrote: > > > The problem
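
The flag-checking collector suggested above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not code from the thread: HitCallback is a hypothetical stand-in for Lucene's HitCollector callback, SearchCancelledException and CancelSearchDemo are made-up names, and the loop in run() only imitates a search feeding hits to the collector.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a per-hit callback; NOT the real Lucene class.
interface HitCallback {
    void collect(int doc, float score);
}

// Unchecked exception used to unwind out of the search loop.
class SearchCancelledException extends RuntimeException {
    SearchCancelledException(String message) {
        super(message);
    }
}

class CancellableCollector implements HitCallback {
    // Written by the cancelling thread, read by the searching thread.
    private final AtomicBoolean cancelled = new AtomicBoolean(false);
    private int collected = 0;

    // Intended to be called from another thread (UI, timer) to stop the search.
    void cancel() {
        cancelled.set(true);
    }

    int collected() {
        return collected;
    }

    @Override
    public void collect(int doc, float score) {
        // Check the flag on every hit and abort by throwing if it is set.
        if (cancelled.get()) {
            throw new SearchCancelledException("cancelled after " + collected + " hits");
        }
        collected++;
    }
}

class CancelSearchDemo {
    // Simulated search over totalHits documents. Setting the flag at
    // cancelAt imitates a second thread cancelling mid-search.
    static int run(int totalHits, int cancelAt) {
        CancellableCollector collector = new CancellableCollector();
        try {
            for (int doc = 0; doc < totalHits; doc++) {
                if (doc == cancelAt) {
                    collector.cancel();
                }
                collector.collect(doc, 1.0f);
            }
        } catch (SearchCancelledException e) {
            // Search aborted; the hits collected so far are still counted.
        }
        return collector.collected();
    }
}
```

With a real search, the try/catch would presumably wrap the call that drives the collector, and cancel() would be called from the UI or a timer thread. Note that the flag is only checked when a hit arrives, so a query that spends a long time before producing its first hit would not be interrupted by this approach.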

Re: cancel search

2005-09-08 Thread Kunemann Frank
That's smart, I really like this idea. :) Thank you! Frank -Original Message- From: Yonik Seeley [mailto:[EMAIL PROTECTED] Sent: Thursday, 8 September 2005 15:09 To: java-user@lucene.apache.org Subject: Re: cancel search You could create your own HitCollector that checked a

Re: Excel Spreadsheet

2005-09-08 Thread christopher may
Would you be able to point me in the direction of how I can load the Lucene code into a JDE environment? I'm working with the BlackBerry JDE. I am new to this so any help would be appreciated. Thanks From: Erik Hatcher <[EMAIL PROTECTED]> Reply-To: java-user@lucene.apache.org To: java-user@lucene.

Matching Search Terms Per Document

2005-09-08 Thread Mark Horan
Hi, Can anybody tell me how to match up query terms with appropriate documents returned from a search? For example, in the code below I can extract document objects from the Hits. With the Document object I can find out, via the path field, which document filenames are implicated in the search res

Weird time results doing wildcard queries

2005-09-08 Thread Richard Krenek
Hello All, I am getting some weird time results when retrieving documents back from a hits object. I am just timing this bit of code: Hits hits = searcher.search(query); long startTime = System.currentTimeMillis(); for (int i = 0; i < hits.length(); i++) { Document doc = hits.doc(i); String field

Re: Weird time results doing wildcard queries

2005-09-08 Thread Jeremy Meyer
The issue isn't with multiple wildcards exactly. Specifically, the problem is if the query starts with a wildcard. In the case where it starts with a wildcard, Lucene has no option but to linearly go over every term in the index to see if it matches your pattern. It must visit every single term i
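
The difference between a trailing and a leading wildcard can be illustrated with a sorted set of terms. This is only a sketch of the idea, assuming the term dictionary behaves like a sorted set. It uses java.util.TreeSet rather than any Lucene class, and WildcardScanSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class WildcardScanSketch {
    // Pattern like "2444*": seek straight to the prefix in the sorted terms
    // and stop at the first term that no longer starts with it.
    // visited[0] counts how many terms were examined.
    static List<String> prefixMatches(TreeSet<String> terms, String prefix, int[] visited) {
        List<String> matches = new ArrayList<>();
        for (String term : terms.tailSet(prefix, true)) {
            visited[0]++;
            if (!term.startsWith(prefix)) {
                break;
            }
            matches.add(term);
        }
        return matches;
    }

    // Pattern like "*91822*": there is no prefix to seek to, so every term
    // in the set has to be examined.
    static List<String> containsMatches(TreeSet<String> terms, String infix, int[] visited) {
        List<String> matches = new ArrayList<>();
        for (String term : terms) {
            visited[0]++;
            if (term.contains(infix)) {
                matches.add(term);
            }
        }
        return matches;
    }
}
```

The prefix lookup touches only the matching terms plus one, while the leading-wildcard lookup touches all of them, so its cost grows with the number of distinct terms in the field rather than with the number of matches.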

Re: Weird time results doing wildcard queries

2005-09-08 Thread Richard Krenek
I understand that for the query, but why does it matter once you have the Hits object? That is the part I'm baffled on. The query with the wildcard in the front takes a lot longer, but we expected that. On 9/8/05, Jeremy Meyer <[EMAIL PROTECTED]> wrote: > > The issue isn't with multiple wildcar

Re: Weird time results doing wildcard queries

2005-09-08 Thread Chris Hostetter
: is if the query starts with a wildcard. In the case where it starts with a : wildcard, Lucene has no option but to linearly go over every term in the : index to see if it matches your pattern. It must visit every single term in That would explain why the search itself takes a while, but not why a

Re: Weird time results doing wildcard queries

2005-09-08 Thread Yonik Seeley
The Hits class collects the document ids from the query in batches. If you iterate beyond what was collected, the query is re-executed to collect more ids. You can use the expert level search methods on IndexSearcher if this isn't what you want. -Yonik On 9/8/05, Richard Krenek <[EMAIL PROTEC

Re: Weird time results doing wildcard queries

2005-09-08 Thread Richard Krenek
I did the change and here are the results: Query (default field is COMP_PART_NUMBER): 2444* Query: COMP_PART_NUMBER:2444* Query Time: 328 ms - time for query to run. 383 total matching documents. Cycle Time: 141 ms - time to run through hits. Query (default field is COMP_PART_NUMBER): *91822* Qu

Re: Weird time results doing wildcard queries

2005-09-08 Thread Daniel Naber
On Friday 09 September 2005 00:40, Chris Hostetter wrote: > 1) How similar, and how many? ... If i remember correctly, the Hits > constructor does some work to pre-fetch the first 100 results. What's really expensive in fetching documents is the disk access (often one disk seek per matching docu

Re: Weird time results doing wildcard queries

2005-09-08 Thread Chris Hostetter
As Yonik pointed out in his reply, the batching/caching done by Hits is worse than I remembered. It's not just batching up the retrieval of stored fields -- it's re-executing the underlying search to pull back the id,score pairs for docs 0->N*2 anytime you ask for any information about result N i
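
The re-execution behaviour described here can be counted with a small simulation. This is a sketch under assumptions, not the Hits source: BatchedHitsSketch is a hypothetical class, the initial batch size is a parameter (a first batch of 100 is mentioned in this thread only as a recollection), and a "search" is reduced to incrementing a counter.

```java
class BatchedHitsSketch {
    private final int totalHits;
    private int cached = 0;      // number of results currently held
    private int executions = 0;  // how many times the query was (re-)run

    BatchedHitsSketch(int totalHits, int initialBatch) {
        this.totalHits = totalHits;
        fetch(initialBatch);
    }

    // Simulates running the query again and keeping results 0..upTo.
    private void fetch(int upTo) {
        executions++;
        cached = Math.min(upTo, totalHits);
    }

    // Access result n; on a cache miss, re-run the search for results 0..2n.
    void touch(int n) {
        if (n < 0 || n >= totalHits) {
            throw new IndexOutOfBoundsException("no hit " + n);
        }
        if (n >= cached) {
            fetch(n * 2);
        }
    }

    int executions() {
        return executions;
    }

    // Iterate 0..totalHits-1 in order, as a typical loop over hits would.
    static int linearScan(int totalHits, int initialBatch) {
        BatchedHitsSketch hits = new BatchedHitsSketch(totalHits, initialBatch);
        for (int i = 0; i < totalHits; i++) {
            hits.touch(i);
        }
        return hits.executions();
    }

    // Ask for the last hit first, then iterate in order.
    static int lastFirst(int totalHits, int initialBatch) {
        BatchedHitsSketch hits = new BatchedHitsSketch(totalHits, initialBatch);
        if (totalHits > 0) {
            hits.touch(totalHits - 1);
        }
        for (int i = 0; i < totalHits; i++) {
            hits.touch(i);
        }
        return hits.executions();
    }
}
```

Under these assumptions, a linear scan over 1000 hits with an initial batch of 100 runs the query five times (for 100, 200, 400, 800 and 1000 results), whereas touching the last hit first runs it twice. Each re-run also repeats all the work of the earlier ones, so the wasted effort is larger than the raw count suggests.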

Re: lia demos without ant

2005-09-08 Thread Gasi
Hi Otis, you were right. I solved the problem. Now I am able to try the examples. Greetings Gaston - Original Message - From: "Otis Gospodnetic" <[EMAIL PROTECTED]> To: Sent: Thursday, September 08, 2005 1:57 AM Subject: Re: lia demos without ant > Hi Gasi, > > Please see page xxxii

Re: Weird time results doing wildcard queries

2005-09-08 Thread Richard Krenek
This answers a lot of questions and observations. We looked in the source code of the Hits object and found the getMoreDocs(int min) method which does what you stated below. We are assuming you meant for us to use a HitCollector instead. This brings up a new question: does the Searcher call the

Reducing number of poor results from large BooleanQueries

2005-09-08 Thread Chris Hostetter
One of the things I'm currently looking into is different ways to approach the more general problem of "filtering by score" in the specific case of BooleanQueries that have a large number of optional terms. Below is a description of the problem I'm considering (with two examples of how it can arri

Re: Weird time results doing wildcard queries

2005-09-08 Thread Yonik Seeley
A HitCollector returns docs by the order they are found (in the index, not by relevance). Use a search method that returns TopDocs if you want the first n documents without executing the query more than once (Hits uses this internally). -Yonik Now hiring -- http://tinyurl.com/7m67g On 9/8/05,
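
The contrast between index-order collection and a "first n by relevance" result can be sketched with a bounded heap. This is an illustration only, assuming scores arrive one document at a time in index order. TopNSketch and ScoredDoc are hypothetical names, and nothing here is taken from the TopDocs implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class TopNSketch {
    static final class ScoredDoc {
        final int doc;
        final float score;

        ScoredDoc(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Returns the ids of the n highest-scoring documents, best first.
    // Ties are broken in favour of the lower document id.
    static List<Integer> topN(float[] scoresByDoc, int n) {
        // Min-heap: the weakest retained hit sits at the head so it can be
        // evicted cheaply. Among equal scores, the higher doc id is weaker.
        Comparator<ScoredDoc> weakestFirst = (a, b) -> {
            int byScore = Float.compare(a.score, b.score);
            return byScore != 0 ? byScore : Integer.compare(b.doc, a.doc);
        };
        PriorityQueue<ScoredDoc> heap = new PriorityQueue<>(Math.max(1, n), weakestFirst);

        // Hits arrive in index order, not by relevance, in a single pass.
        for (int doc = 0; doc < scoresByDoc.length; doc++) {
            heap.add(new ScoredDoc(doc, scoresByDoc[doc]));
            if (heap.size() > n) {
                heap.poll(); // drop the current weakest
            }
        }

        // Drain weakest-first, prepending so the best ends up at index 0.
        List<Integer> best = new ArrayList<>();
        while (!heap.isEmpty()) {
            best.add(0, heap.poll().doc);
        }
        return best;
    }
}
```

Because only n entries are kept at any time, the query is evaluated once and memory stays bounded by n, which is the property that makes a top-n method preferable to growing a cache by re-running the search.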

Re: custom sort

2005-09-08 Thread raymondcreel (sent by Nabble.com)
Hi thanks for the reply. Yes that sounds like it would work with the two searches. Perhaps a custom sort might be less overhead since it would just be one search, but I think your solution will work for my purposes. Thanks much. raymond -- Sent from the Lucene - Java Users forum at Nabble.com

Re: Weird time results doing wildcard queries

2005-09-08 Thread J.J. Larrea
I've verified that for a large pull from Hits, the logic as described makes it *significantly* faster to request the last desired hit [which could still be far fewer than hits.length()] before iterating through the hits, e.g. the hits.id line in the quoted snippet below. Here are relative timin

Re: Weird time results doing wildcard queries

2005-09-08 Thread Chris Hostetter
: Which makes me wonder whether the caching logic of Hits, optimized for : random- rather than linear-access, and not tuneable or controllable in : 1.4.3, should be reviewed for a subsequent release, at least the : API-breaking 2.0. I'll wager that a majority of applications do nothing : other th

Re: Weird time results doing wildcard queries

2005-09-08 Thread J.J. Larrea
At 8:01 PM -0700 9/8/05, Chris Hostetter wrote: >: Which makes me wonder whether the caching logic of Hits, optimized for >: random- rather than linear-access, and not tuneable or controllable in >: 1.4.3, should be reviewed for a subsequent release, at least the >: API-breaking 2.0. I'll wager th

Re: Weird time results doing wildcard queries

2005-09-08 Thread Chris Hostetter
: > * move the call to getMoreDocs(int) from Hits to Searcher.search : : Hmm... Hits is passed to the caller and works as a standalone cache. : While it maintains a reference to the Searcher, it only uses that to : resolve Documents upon misses. Perhaps the current separation of : concerns is ac