Re: Sorting by Score

2007-02-28 Thread Peter Keegan
> Can't you pick any arbitrary "marker" field name (that's not a real field name) and use that? Yes, I could. I guess you're saying that the field name doesn't matter, except that it's used for caching the comparator, right? ... he wants the "bucketing" to happen as part of the scoring so that t

Re: Sorting by Score

2007-02-28 Thread Erick Erickson
Empirically, when I insert the elements into the FieldSortedHitQueue, they get sorted according to the Sort object. The original query that gives me a TopDocs applied no secondary sorting, only relevancy. Since I normalized all the scores into one of only 5 discrete values, and secondary sorting was
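
A minimal plain-Java sketch of that two-step approach (first normalize, then sort), under stated assumptions. Erick doesn't say how scores were mapped to the five values, so this sketch buckets each score as a fraction of the best score. The names ScoreBuckets, Hit, normalize() and sort() are hypothetical, and a List.sort with a composite Comparator stands in for FieldSortedHitQueue and the Sort object.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a scored hit; "title" plays the role of the secondary sort field.
class ScoreBuckets {
    static final class Hit {
        final String id;
        final float score;
        final String title;
        int bucket; // assigned by normalize()

        Hit(String id, float score, String title) {
            this.id = id;
            this.score = score;
            this.title = title;
        }
    }

    /** Maps each raw score to one of `levels` discrete values, relative to the best score. */
    static void normalize(List<Hit> hits, int levels) {
        float max = 0f;
        for (Hit h : hits) {
            max = Math.max(max, h.score);
        }
        for (Hit h : hits) {
            int b = (max == 0f) ? 0 : (int) (h.score / max * levels);
            h.bucket = Math.min(b, levels - 1); // the top score would otherwise map to `levels`
        }
    }

    /** Sorts by bucket (best first), then by the secondary key; returns the ids in that order. */
    static List<String> sort(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingInt((Hit h) -> h.bucket).reversed()
                .thenComparing((Hit h) -> h.title));
        List<String> ids = new ArrayList<>();
        for (Hit h : sorted) {
            ids.add(h.id);
        }
        return ids;
    }
}
```

With five levels, scores of 1.0 and 0.95 land in the same top bucket. Their order is then decided by title alone, which is the effect described above.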

RE: document field updates

2007-02-28 Thread Steven Parkes
> Are unindexed fields stored separately from the main inverted index? If so, then one could implement the field value change as a delete and re-add of just that value? The short answer is that won't work. Field values are stored in a different data structure than the posting

RamDirectory vs IndexWriter

2007-02-28 Thread WATHELET Thomas
I don't really understand the difference between using a RAMDirectory and using an IndexWriter. What's the difference between using a RAMDirectory and using an IndexWriter with these properties set: setMergeFactor(1000); setMaxMergeDocs(1); setMaxBufferedDocs(1);

Re: RamDirectory vs IndexWriter

2007-02-28 Thread Nicolas Lalevée
On Wednesday 28 February 2007 16:19, WATHELET Thomas wrote: > I don't really understand the difference between using the ramDirectory and using IndexWriter. > What's the difference between using ramDirectory instead of using IndexWriter with those properties set to: setMergeFactor(1000);se

Merge Indexes - addIndexes

2007-02-28 Thread DECAFFMEYER MATHIEU
Hi, I store the Lucene index of my web applications in a file system. Often, I need to add another index, also stored in a file system, to this index. I have three questions: * What is the best way to do this? Open an IndexReader on this incoming file-system index and use addIndexes(IndexR

RE: RamDirectory vs IndexWriter

2007-02-28 Thread WATHELET Thomas
I think I expressed myself badly. In both cases I use the IndexWriter class, but in one case I use it with a RAMDirectory and in the other with an FSDirectory (index=new IndexWriter(ram OR fsdir,analyser,true)). If I use the RAMDirectory class, it's to avoid frequent disk access. But I

Re: RamDirectory vs IndexWriter

2007-02-28 Thread Erick Erickson
I guess it depends upon your goal. If you're asking what the difference is between writing to a RAMDirectory *then* flushing to an FSDirectory, I don't believe there's much, if any. As I remember (and my memory isn't always...er...accurate), there's been discussion on this thread by those who know t

Filtering results on a Field

2007-02-28 Thread Ismail Siddiqui
Hey guys, I want to filter a result set on a particular field. I have code like this: try { PhraseQuery textQuery = new PhraseQuery(); PhraseQuery titleQuery = new PhraseQuery(); PhraseQuery catQuery = new PhraseQuery(); textQuery.setSlop( 20 );

Re: Filtering results on a Field

2007-02-28 Thread Erick Erickson
When you have a category, add the pair of clauses as a sub-Boolean query. Something like... try { PhraseQuery textQuery = new PhraseQuery(); PhraseQuery titleQuery = new PhraseQuery(); PhraseQuery catQuery = new PhraseQuery(); textQuery.setSlop( 20 )

Re: Filtering results on a Field

2007-02-28 Thread Ismail Siddiqui
Thanks a lot.

Re: indexing and searching the document title question

2007-02-28 Thread Phillip Rhodes
I found the problem! I did not realize that using a HitCollector would return hits in unsorted order. I was using the HitCollector to try to maximize performance by only returning the documents that I needed (which page of the results, and how many per page). -Phillip
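
The usual remedy is to feed each collect(int doc, float score) call into a bounded priority queue. Only the hits up to the end of the requested page are kept, and only those are sorted. This is a plain-Java sketch of that idea, not Lucene code; PagingCollector, Hit and page() are hypothetical names, and ties are broken by document number as an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical page-oriented collector: keeps only the hits needed for one page of results. */
class PagingCollector {
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Best first: higher score wins; equal scores fall back to the lower document number.
    private static final Comparator<Hit> BEST_FIRST =
            Comparator.comparingDouble((Hit h) -> h.score).reversed()
                    .thenComparingInt((Hit h) -> h.doc);

    private final int start;                // index of the first hit on the requested page
    private final int needed;               // hits that must be kept to fill that page
    private final PriorityQueue<Hit> queue; // head is the worst hit currently kept

    PagingCollector(int page, int pageSize) {
        this.start = page * pageSize;
        this.needed = start + pageSize;
        this.queue = new PriorityQueue<>(BEST_FIRST.reversed());
    }

    /** Called once per matching document; arrival order says nothing about score. */
    void collect(int doc, float score) {
        queue.offer(new Hit(doc, score));
        if (queue.size() > needed) {
            queue.poll(); // evict the current worst hit
        }
    }

    /** Document numbers on the requested page, best first. */
    List<Integer> page() {
        List<Hit> kept = new ArrayList<>(queue);
        kept.sort(BEST_FIRST);
        List<Integer> docs = new ArrayList<>();
        for (int i = start; i < kept.size(); i++) {
            docs.add(kept.get(i).doc);
        }
        return docs;
    }
}
```

Memory stays proportional to (page + 1) * pageSize rather than to the number of matches. Deep pages therefore still cost more than the first one.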

Re: Soliciting Design Thoughts on Date Searching

2007-02-28 Thread Walt Stoneburner
Been searching http://www.gossamer-threads.com/lists/lucene/java-user/ as Erick suggested; man, is there a wealth of information in the Lucene archives. I have found many examples of how to convert text to dates and back, how to search Date fields for various ranges, and so forth -- but I don't t

RE: Soliciting Design Thoughts on Date Searching

2007-02-28 Thread Aigner, Thomas
Walt, I am no expert, but it sounds like you need to associate many dates with a single record. Can this be handled as you would a synonym? Basically, add a token at the same offset as the row itself? That is, you would have a record that would also have a date field that has 3 offsets that woul
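
The synonym idea amounts to a multi-valued field. One record carries several date terms, and a range search matches the record if any of them falls in the range. A toy inverted index in plain Java shows the mechanics under these assumptions:

- DateTermIndex is hypothetical, not a Lucene class.
- Dates are already normalized to yyyyMMdd strings, so string order equals chronological order.
- Token positions and offsets are not modeled.

```java
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical toy inverted index for one multi-valued "date" field. */
class DateTermIndex {
    // term (yyyyMMdd) -> numbers of the records containing it; TreeMap keeps terms in string order
    private final TreeMap<String, Set<Integer>> postings = new TreeMap<>();

    /** Indexes every date of a record as a separate term of the same field. */
    void add(int record, String... dates) {
        for (String date : dates) {
            postings.computeIfAbsent(date, k -> new TreeSet<>()).add(record);
        }
    }

    /** Records with at least one date in [from, to], both ends inclusive. */
    Set<Integer> range(String from, String to) {
        Set<Integer> hits = new TreeSet<>();
        for (Set<Integer> records : postings.subMap(from, true, to, true).values()) {
            hits.addAll(records);
        }
        return hits;
    }
}
```

A record with dates from 1865 and 1901 matches both a 20th-century range and an all-years range. It does not need one field per date.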

Re: all records within distance -- small index

2007-02-28 Thread Phillip Rhodes
I just add 1000 to it, but in my rounding, I always make sure that I have 4 decimal places. Here are some code snippets: //indexing the lat double lat = physicalAddress.getLatitude() + 1000.0; Double latitude = new Double(lat); document.add(new Field(Indexer.LATITUDE, latitude.toString()
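
The reason for the shift and the fixed decimals is that Lucene range queries compare terms as strings, so string order has to match numeric order. Adding 1000 removes the minus sign. String order also requires every value to have the same number of integer digits, though. For example, -1 becomes 999.0 and +1 becomes 1001.0, and "1001.0000" sorts before "999.0000".

The sketch below therefore also zero-pads to a fixed width. That padding is an addition to the snippet in the post, and CoordinateCodec is a hypothetical name.

```java
import java.util.Locale;

/** Hypothetical helper: encodes a signed coordinate as a term whose string order is numeric order. */
class CoordinateCodec {
    private static final double OFFSET = 1000.0; // the shift used in the post; makes all values positive

    static String encode(double degrees) {
        // Locale.ROOT forces '.' as decimal separator. "%09.4f" = 4 decimals, zero-padded to 9 chars
        // (4 integer digits + '.' + 4 decimals), so 999.0 -> "0999.0000", which sorts before "1001.0000".
        return String.format(Locale.ROOT, "%09.4f", degrees + OFFSET);
    }

    static double decode(String term) {
        return Double.parseDouble(term) - OFFSET;
    }
}
```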

Re: optimizing single document searches

2007-02-28 Thread Paul Elschot
On Wednesday 28 February 2007 01:01, Russ wrote: > I will definitely check it out tomorrow. > I also forgot to mention that I am not interested in the hits themselves, only whether or not there was a hit. Is there something I can use that's optimized for this scenario, or should I look into

Re: Sorting by Score

2007-02-28 Thread Peter Keegan
Erick, yes, this seems to be the simplest way to implement score 'bucketization', but wouldn't it be more efficient to do this with a custom ScoreComparator? That way, you'd do the bucketizing and sorting in one 'step' (compare()). Maybe the savings isn't measurable, though. A comparator might al
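
A one-step comparator in the spirit of that suggestion might look like the sketch below. It uses a plain java.util.Comparator, not Lucene's comparator API, and BucketingComparator and Hit are hypothetical.

One design consequence is that compare() only ever sees two hits, so it cannot normalize against the best score. This sketch assumes fixed-width score buckets instead, and it recomputes the bucket on every comparison rather than once per hit.

```java
import java.util.Comparator;

/** Hypothetical comparator that buckets raw scores inside compare(), then breaks ties on a key. */
class BucketingComparator implements Comparator<BucketingComparator.Hit> {
    static final class Hit {
        final float score;
        final String key; // secondary sort value, e.g. a title

        Hit(float score, String key) {
            this.score = score;
            this.key = key;
        }
    }

    private final float bucketWidth; // e.g. 0.2f groups scores into bands of width 0.2

    BucketingComparator(float bucketWidth) {
        this.bucketWidth = bucketWidth;
    }

    private int bucket(float score) {
        return (int) (score / bucketWidth);
    }

    @Override
    public int compare(Hit a, Hit b) {
        int byBucket = Integer.compare(bucket(b.score), bucket(a.score)); // higher bucket first
        return byBucket != 0 ? byBucket : a.key.compareTo(b.key);
    }
}
```

Usage would be list.sort(new BucketingComparator(0.2f)). Whether the per-comparison division is measurable next to the sort itself is exactly the open question in the thread.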

Re: optimizing single document searches

2007-02-28 Thread Ruslan Sivak
karl wettin wrote: > 28 Feb 2007 at 00:49, Russ wrote: >> Thanks, I will try it tomorrow... Is it significantly different from using a standard index on a ramdir? > A bit different. You can also try LUCENE-550. It has about the same speed as contrib/memory but can handle multiple documents and

Re: Sorting by Score

2007-02-28 Thread Erick Erickson
It may well be, but as I said, this is efficient enough for my needs, so I didn't pursue it. One of my pet peeves is spending time making things "more efficient" when there's no need, and my index isn't going to grow enough to make that worth worrying about now... Erick

Re: Soliciting Design Thoughts on Date Searching

2007-02-28 Thread Peter W.
Hello, There are a few ways to solve this, but there's no date extraction filter that I know of. Adding a hundred fields for each Lucene doc seems bloated. First, get your text out of the various source documents (.doc, .pdf, .html) using the tools described in the Lucene in Action book. It sou

Re: Best way to returning hits after search?

2007-02-28 Thread Doron Cohen
Antony Bowesman <[EMAIL PROTECTED]> wrote on 27/02/2007 17:37:41: > Doron Cohen wrote: >> The collect() method is going to be invoked once for each document that matches the query (having nonzero score). If the index is very large, that may turn out to be a very large number of calls. Often,

ranking/scoring algorithm in details

2007-02-28 Thread Jong Kim
Hi, Does anyone know of a written document that describes in some detail how Lucene's ranking/scoring algorithm works? I'm safely assuming that a single consistent algorithm is being used to compute the score of each matching document (with or without explicit boost factors in the query) and r

Re: Soliciting Design Thoughts on Date Searching

2007-02-28 Thread Chris Hostetter
: I have generic material that _contain_ dates: historic time lines, certificates, news articles, forms, deeds, testimonies, and wildly free-form genealogical information. The dates have no specific structure, obvious context, nor consistency. Identifying and extracting dates from bulk text

Re: ParallelSearcher in multi-node environment

2007-02-28 Thread Chris Hostetter
: I want to execute parallel search over several machines. But ParallelSearcher doesn't look perfect. It creates threads and spawns many requests to the underlying Searchables (over a network) for a single search. Is there a decent implementation of the parallel search over remote indexes
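
The thread-creation part of that complaint can be avoided by submitting one task per index to a long-lived pool and merging the partial results. This plain-Java sketch of the fan-out/merge pattern rests on loud assumptions:

- Shard is a hypothetical stand-in for a remote Searchable that returns only scores.
- Network transport and document ids are left out.
- Failure handling does nothing beyond rethrowing.

It still sends one request per index per search; it only stops creating new threads each time.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class FanOutSearcher implements AutoCloseable {
    /** Hypothetical stand-in for one remote index: returns the scores of its best n hits. */
    interface Shard {
        List<Float> search(String query, int n);
    }

    private final List<Shard> shards;
    private final ExecutorService pool;

    FanOutSearcher(List<Shard> shards) {
        this.shards = shards;
        // One long-lived pool, sized to the number of shards, instead of new threads per search.
        this.pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
    }

    /** Sends the query to every shard concurrently and merges the results into one top-n list. */
    List<Float> search(String query, int n) {
        List<Future<List<Float>>> pending = new ArrayList<>();
        for (Shard shard : shards) {
            pending.add(pool.submit(() -> shard.search(query, n)));
        }
        List<Float> merged = new ArrayList<>();
        try {
            for (Future<List<Float>> f : pending) {
                merged.addAll(f.get()); // blocks until this shard answers
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("search interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a shard failed", e.getCause());
        }
        merged.sort(Comparator.reverseOrder()); // highest score first
        return new ArrayList<>(merged.subList(0, Math.min(n, merged.size())));
    }

    @Override
    public void close() {
        pool.shutdown();
    }
}
```

Latency is still bounded by the slowest shard, since the merge waits on every Future.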

RE: ranking/scoring algorithm in details

2007-02-28 Thread Steven Parkes
http://lucene.apache.org/java/docs/scoring.html (which you can also find by googling "lucene scoring")
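
For orientation, here is the formula as it is documented in the javadoc of Lucene's Similarity and DefaultSimilarity classes for releases of this period. This is a summary only; check the javadoc of your version, since the details have changed over time. Here f is the field containing t, and overlap is the number of query terms found in d.

```latex
\[
\mathrm{score}(q,d) = \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot
  \sum_{t \in q} \Big( \mathrm{tf}(t \in d)\cdot\mathrm{idf}(t)^{2}\cdot\mathrm{boost}(t)\cdot\mathrm{norm}(t,d) \Big)
\]
\[
\mathrm{tf}(t \in d) = \sqrt{\mathrm{freq}(t,d)}, \qquad
\mathrm{idf}(t) = 1 + \ln\frac{\mathrm{numDocs}}{\mathrm{docFreq}(t) + 1}, \qquad
\mathrm{coord}(q,d) = \frac{\mathrm{overlap}(q,d)}{\mathrm{maxOverlap}(q)}
\]
\[
\mathrm{queryNorm}(q) = \frac{1}{\sqrt{\mathrm{sumOfSquaredWeights}(q)}}, \qquad
\mathrm{norm}(t,d) = \mathrm{boost}(d)\cdot\mathrm{boost}(f)\cdot\frac{1}{\sqrt{\mathrm{numTerms}(f)}}
\]
```

The norm factor is computed at index time and stored in a single byte, so it is a lossy approximation of the value above.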

RE: Soliciting Design Thoughts on Date Searching

2007-02-28 Thread Steven Parkes
Yeah, date finding is a little like entity extraction, since dates can have many formats, depending on how crazy you want to get ("a week from tomorrow" is 3/8/2007 if you know that this e-mail was written today). So much so that I went and looked up lingpipe, but they seem to not be concerned with
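
As a starting point at the pattern-based end of that spectrum, here is a plain-Java sketch. It recognizes two assumed formats, US-style M/d/yyyy and "Month d, yyyy", and normalizes each match to a yyyyMMdd term, which sorts chronologically as a string.

DateExtractor is hypothetical. Relative phrases such as "a week from tomorrow" need a reference date and are not handled. Impossible dates such as 2/30/2007 are dropped.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.time.Month;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateExtractor {
    // US-style numeric dates such as 3/8/2007 (month first is an assumption)
    private static final Pattern NUMERIC =
            Pattern.compile("\\b(\\d{1,2})/(\\d{1,2})/(\\d{4})\\b");
    // Spelled-out dates such as "February 28, 2007"
    private static final Pattern SPELLED = Pattern.compile(
            "\\b(January|February|March|April|May|June|July|August|September|October|November|December)"
            + "\\s+(\\d{1,2}),\\s*(\\d{4})\\b");

    /** Returns recognized dates as yyyyMMdd terms: numeric matches first, then spelled-out ones. */
    static List<String> extract(String text) {
        List<String> terms = new ArrayList<>();
        Matcher m = NUMERIC.matcher(text);
        while (m.find()) {
            addIfValid(terms, Integer.parseInt(m.group(3)),
                    Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)));
        }
        m = SPELLED.matcher(text);
        while (m.find()) {
            int month = Month.valueOf(m.group(1).toUpperCase(Locale.ROOT)).getValue();
            addIfValid(terms, Integer.parseInt(m.group(3)), month, Integer.parseInt(m.group(2)));
        }
        return terms;
    }

    private static void addIfValid(List<String> terms, int year, int month, int day) {
        try {
            // BASIC_ISO_DATE prints a LocalDate as yyyyMMdd, e.g. 20070308
            terms.add(LocalDate.of(year, month, day).format(DateTimeFormatter.BASIC_ISO_DATE));
        } catch (DateTimeException e) {
            // looks like a date but isn't one (e.g. 2/30/2007 or 13/1/2007); skip it
        }
    }
}
```

Every additional format, or any context-based disambiguation such as day-first versus month-first, is where this turns into the entity-extraction problem described above.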

Re: Best way to returning hits after search?

2007-02-28 Thread Mohammad Norouzi
Hello, I have implemented an IndexResultSet just like java.sql.ResultSet, with all its methods. When I call searcher.search(...), I pass the returned Hits to my IndexResultSet. In the IndexResultSet I have getString(String), getString(int), getInt(), next(), previous(), absolute() and all the methods of the j
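
A cursor of that kind is mostly index bookkeeping. The sketch below is hypothetical: ListResultSet is not the poster's class, and it reads from an in-memory list of field maps instead of fetching each Document from Hits on demand. It shows JDBC-style next(), previous() and 1-based absolute() positioning.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical JDBC-style cursor over search results held as field-name -> value maps. */
class ListResultSet {
    private final List<Map<String, String>> rows;
    private int cursor = -1; // -1 = before the first row, rows.size() = after the last row

    ListResultSet(List<Map<String, String>> rows) {
        this.rows = rows;
    }

    boolean next() {
        if (cursor < rows.size()) {
            cursor++;
        }
        return cursor < rows.size();
    }

    boolean previous() {
        if (cursor >= 0) {
            cursor--;
        }
        return cursor >= 0;
    }

    /** 1-based, as in java.sql.ResultSet; this sketch treats row <= 0 as "before first". */
    boolean absolute(int row) {
        cursor = (row <= 0) ? -1 : Math.min(row - 1, rows.size());
        return cursor >= 0 && cursor < rows.size();
    }

    String getString(String field) {
        if (cursor < 0 || cursor >= rows.size()) {
            throw new IllegalStateException("cursor is not on a row");
        }
        return rows.get(cursor).get(field);
    }

    int getInt(String field) {
        return Integer.parseInt(getString(field));
    }
}
```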

Performance in having Multiple Index files

2007-02-28 Thread Raaj
Hi all, I have a requirement wherein I create an index file for each XML file. I have over 100 to 150 XML files, which are all related. If I create 100 to 150 index files and query using these indices, will this affect the performance of the search operation? Bye, Raaj