SV: Find "latest" document (before a certain date)

2007-08-29 Thread Per Lindberg
> From: Karl Wettin [mailto:[EMAIL PROTECTED] > On 28 Aug 2007, at 17:48, Per Lindberg wrote: > > > Now, I want to search the content, and return only the > > LATEST found document with each id. To complicate > > things a bit, I want the latest before a given date. In other > > words, for each id p

Re: SV: Find "latest" document (before a certain date)

2007-08-29 Thread tom
Tom Roberts is out of the office until 3rd September 2007 and will get back to you on his return. http://www.luxonline.org.uk http://www.lux.org.uk - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail:

Indexer / Searcher holding deleted files

2007-08-29 Thread Aleksander M. Stensby
Hello everyone. I have a system where an indexing-process is running several times a day, adding documents, and performing an optimize() at the end of every run. In addition, we have a web-application (running in tomcat) that is used to perform searches on the index(es). The problem (probab

Re: Indexer / Searcher holding deleted files

2007-08-29 Thread Mark Miller
Reopen the Searchers/Readers that are holding the files open. Aleksander M. Stensby wrote: Hello everyone. I have a system where an indexing-process is running several times a day, adding documents, and performing an optimize() at the end of every run. In addition, we have a web-application (ru

Re: Find "latest" document (before a certain date)

2007-08-29 Thread Karl Wettin
On 29 Aug 2007, at 12:29, Per Lindberg wrote: how about using a RangeQuery and picking the hit with the greatest document number? Yep, that did the trick! There seems to be no Filter that can do the final picking of the highest date, so I had to do that after the search. I use IndexSearcher.search
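
Picking the latest version per id after the search, as described above, comes down to taking a per-id maximum over the hits. Below is a minimal sketch in plain Java that does not use any Lucene classes. It assumes each hit has already been read into a hypothetical Hit holder containing an id and a date stored as a zero-padded yyyyMMdd string. The cutoff check stands in for the RangeQuery.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LatestBeforeDate {

    /** Hypothetical holder for one hit: the document id and its date as yyyyMMdd. */
    static final class Hit {
        final String id;
        final String date;

        Hit(String id, String date) {
            this.id = id;
            this.date = date;
        }
    }

    /**
     * Returns, for each id, the hit with the greatest date on or before the cutoff.
     * Zero-padded yyyyMMdd strings sort lexicographically in date order.
     */
    static Map<String, Hit> latestPerId(List<Hit> hits, String cutoff) {
        Map<String, Hit> latest = new HashMap<>();
        for (Hit h : hits) {
            if (h.date.compareTo(cutoff) > 0) {
                continue; // after the cutoff; the RangeQuery would normally exclude it
            }
            Hit best = latest.get(h.id);
            if (best == null || h.date.compareTo(best.date) > 0) {
                latest.put(h.id, h);
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("p1", "20070101"));
        hits.add(new Hit("p1", "20070615"));
        hits.add(new Hit("p1", "20070901")); // after the cutoff below
        hits.add(new Hit("p2", "20070301"));
        Map<String, Hit> latest = latestPerId(hits, "20070829");
        System.out.println(latest.get("p1").date); // prints 20070615
        System.out.println(latest.get("p2").date); // prints 20070301
    }
}
```

Comparing the date field directly, instead of taking the greatest document number, avoids depending on newer versions having been indexed after older ones.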

Re: Highlighter that works with phrase and span queries

2007-08-29 Thread Mark Miller
It kind of is a contrib -- it's really just a new Scorer class (with some auxiliary helper classes) for the old contrib Highlighter. Since the contrib Highlighter is pretty hardened at this point, I figured that was the best way to go. Or do you mean something different? - Mark Mike Klaas wrote

Re: Highlighter that works with phrase and span queries

2007-08-29 Thread Mark Miller
The patch you refer to should include the javadoc/source code. If that is not sufficient, drop me a line privately and I will email you all of the source code / javadoc. - Mark Michael Stoppelman wrote: Ah, much clearer now. It seems that the jar file is just the class files. Is the source/ja

Caching IndexSearcher in a webapp [was: Find "latest" document (before a certain date)]

2007-08-29 Thread Per Lindberg
> From: Karl Wettin [mailto:[EMAIL PROTECTED] > On 29 Aug 2007, at 12:29, Per Lindberg wrote: > > >> how about using a RangeQuery and picking the hit with the > >> greatest document number? > > > > Yep, that did the trick! There seems to be no Filter that can do > > the final picking of the highest da

Re: Caching IndexSearcher in a webapp [was: Find "latest" document (before a certain date)]

2007-08-29 Thread Karl Wettin
On 29 Aug 2007, at 14:32, Per Lindberg wrote: For each search request (it's a webapp) I currently create a new IndexSearcher, new Filter and new Sort, call searcher.search(query, filter, sorter) and later searcher.close(). You really want to reuse the IndexSearcher until new data has been added t

Re: Caching IndexSearcher in a webapp [was: Find "latest" document (before a certain date)]

2007-08-29 Thread Patrick Turcotte
Hi, Answers in the text. > For each search request (it's a webapp) I currently create > a new IndexSearcher, new Filter and new Sort, call > searcher.search(query, filter, sorter) and later searcher.close(). > > The literature says that it is desirable to cache the IndexSearcher, > but there's no
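
Karl's advice to reuse one IndexSearcher until new data has been added can be sketched as a small reference-counted cache. This is a hypothetical helper and not a Lucene class. The opener and versionSource arguments are assumptions: versionSource could wrap an index-version check such as IndexReader.isCurrent() or the index version number, depending on which Lucene version you use. The helper works with any java.io.Closeable, so an IndexSearcher would need a small adapter that calls its close(). A stale instance is closed only after the last request still using it has finished.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical helper: reuses one expensive Closeable (e.g. an adapter around an
 * IndexSearcher) until versionSource reports a new index version, then opens a
 * replacement. A stale instance is closed only after its last lease is released.
 */
public final class SearcherCache<T extends Closeable> {

    /** One counted use of the cached instance; close it when the request is done. */
    public final class Lease implements AutoCloseable {
        private final Entry entry;
        private boolean released; // a lease is meant to be used by one request thread

        private Lease(Entry entry) {
            this.entry = entry;
        }

        public T get() {
            return entry.resource;
        }

        @Override
        public void close() {
            if (!released) {
                released = true;
                release(entry);
            }
        }
    }

    private final class Entry {
        final T resource;
        final long version;
        int refs = 1; // the cache's own reference

        Entry(T resource, long version) {
            this.resource = resource;
            this.version = version;
        }
    }

    private final Supplier<T> opener;         // assumption: opens a fresh instance
    private final LongSupplier versionSource; // assumption: current index version
    private Entry current;

    public SearcherCache(Supplier<T> opener, LongSupplier versionSource) {
        this.opener = opener;
        this.versionSource = versionSource;
        long version = versionSource.getAsLong();
        this.current = new Entry(opener.get(), version);
    }

    /** Returns a lease on an up-to-date instance, reopening only if the version changed. */
    public synchronized Lease acquire() {
        long version = versionSource.getAsLong();
        if (version != current.version) {
            Entry stale = current;
            current = new Entry(opener.get(), version);
            release(stale); // drop the cache's reference; closes now if no request holds it
        }
        current.refs++;
        return new Lease(current);
    }

    private synchronized void release(Entry entry) {
        if (--entry.refs == 0) {
            try {
                entry.resource.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}
```

A request would call try (SearcherCache<...>.Lease lease = cache.acquire()) { ... lease.get() ... }. Reopening happens inside the synchronized acquire(), so requests briefly wait while a new instance is opened. This keeps the sketch simple, but it adds latency whenever the index is reopened.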

Re: Indexer / Searcher holding deleted files

2007-08-29 Thread Aleksander M. Stensby
Hmm, yeah, well that's what I do now... Shouldn't it be sufficient to do: searcher.close(); (...) searcher = new IndexSearcher(indexPath); Or? And maybe wrap that in if(searcher.getIndexReader().hasDeletions()) and possibly (!searcher.getIndexReader().isCurrent()) thanks, Aleksander On Wed, 29 A

Re: Indexer / Searcher holding deleted files

2007-08-29 Thread Erick Erickson
I'd guess that you're not closing *all* of your searchers. Which is reinforced somewhat by the fact that bouncing your Tomcat instance cleans things up. Do you perhaps open a reader in the initialization code that never gets closed? Erick On 8/29/07, Aleksander M. Stensby <[EMAIL PROTECTED]> wrot

Custom normalization in Similarity

2007-08-29 Thread Emmanuel Franquemagne
Hello, I'd like to know if there is a way to perform a custom correction to the similarity norm before it is written. At best, we wished we could do this by extending the Similarity class, but encodeNorm, which would be the best place to do it, is a static method and thus it's no use to override i

Re: Custom normalization in Similarity

2007-08-29 Thread Mark Miller
I think that encodeNorm and decodeNorm on Similarity are really just utility methods for norm encode/decode. It would be nice to be able to override those if you wanted to change the encode/decode method, but you should be able to modify the norm elsewhere. Actual access to the norm information

Postal Code Radius Search

2007-08-29 Thread Mike
I've searched the mailing list archives, the web, read the FAQ, etc and I don't see anything relevant so here it goes… I'm trying to implement a radius based searching based on zip/postal codes. (The user enters their zip code and I show nearby matches under x miles away sorted by linear distance

Re: Postal Code Radius Search

2007-08-29 Thread Will Johnson
A CustomScoreQuery combined with a FieldCacheSource that holds the lat/lon might work. - will On Aug 29, 2007, at 11:15 AM, Mike wrote: I've searched the mailing list archives, the web, read the FAQ, etc and I don't see anything relevant so here it goes… I'm trying to implement a rad

SV: Caching IndexSearcher in a webapp [was: Find "latest" document (before a certain date)]

2007-08-29 Thread Per Lindberg
Kalle and Patrick: many thanks for the suggestions! Caching the IndexSearcher in the ServletContext sounds like a very good idea. However, I have to index a number of databases, each with a different Lucene index. So keeping an IndexSearcher for each may come with a prohibitive memory cost. But as
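
When there are several databases, one way to limit memory use is to keep only the most recently used searchers open. Below is a hypothetical sketch that uses a LinkedHashMap in access order. The opener function is an assumption, for example one that opens an IndexSearcher for a given index path. The sketch closes an evicted instance immediately. A real webapp would also need reference counting so that a searcher is not closed while a request is still using it.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical bounded cache: keeps at most maxOpen instances (e.g. one searcher per
 * database index) and closes the least recently used one when the limit is exceeded.
 */
public final class BoundedSearcherCache<K, V extends Closeable> {

    private final Function<K, V> opener; // assumption: opens the instance for a key
    private final LinkedHashMap<K, V> open;

    public BoundedSearcherCache(int maxOpen, Function<K, V> opener) {
        this.opener = opener;
        // accessOrder = true: iteration runs from least to most recently used
        this.open = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() <= maxOpen) {
                    return false;
                }
                try {
                    eldest.getValue().close(); // frees the evicted index's resources
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                return true;
            }
        };
    }

    /** Returns the instance for key, opening it (and possibly evicting another) if needed. */
    public synchronized V get(K key) {
        V value = open.get(key); // also marks key as most recently used
        if (value == null) {
            value = opener.apply(key);
            open.put(key, value); // may trigger removeEldestEntry
        }
        return value;
    }

    public synchronized int openCount() {
        return open.size();
    }
}
```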

Re: indexing fields with multiplicity

2007-08-29 Thread Karl Wettin
On 28 Aug 2007, at 21:41, Tim Sturge wrote: Hi, I have fields which have high multiplicity; for example I have a topic with 1000 names, 500 of which are "USA" and 200 are "United States of America". Previously I was indexing "USA USA .(500x).. USA United States of America .(200x).. United

Re: Postal Code Radius Search

2007-08-29 Thread Steven Rowe
Mike wrote: > I've searched the mailing list archives, the web, read the FAQ, etc and I > don't see anything relevant so here it goes… > > I'm trying to implement a radius based searching based on zip/postal codes. Here is a selection of interesting threads from the Lucene ML with relevant info:

Re: indexing fields with multiplicity

2007-08-29 Thread Tim Sturge
I'm looking for a boost when the anchor text is more commonly associated with one topic than another. For example the United States of America is called "USA" by a lot of people. The United Space Alliance is also called "USA" but by far fewer people. If I just index them both with "USA" once, t

RE: Postal Code Radius Search

2007-08-29 Thread Charles Patridge
Here is an example of getting all the zipcodes within a certain radius - Something I did in SAS but I am sure you can convert the formula into another language. http://www.sconsig.com/sastips/tip00156.htm Chuck Patridge Charles Patridge Full Capture Solutions, Inc. 333 Roberts Street, Suite 40

RE: Postal Code Radius Search

2007-08-29 Thread Charles Patridge
Will, http://www.sconsig.com/sastips/tip00156.htm This is an example I wrote in SAS code, which should be convertible to another language, to find all zip codes within a certain radius. HTH, Chuck P.
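
The radius computation itself is straightforward to port. This sketch uses the haversine great-circle formula, which may not be the same formula as the one in the SAS tip. It assumes a hypothetical table of zip-code centroids (latitude and longitude in degrees) and a mean Earth radius of 3958.8 miles. The sketch scans all candidates linearly, keeps those within the radius, and sorts them by distance, which matches the "nearby matches under x miles, sorted by distance" requirement. Within Lucene, the distance could instead feed a custom score or sort, as Will suggested above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ZipRadius {

    /** Mean Earth radius in miles (assumption; use 6371.0 for kilometres). */
    static final double EARTH_RADIUS_MILES = 3958.8;

    /** Hypothetical zip-code centroid, latitude/longitude in degrees. */
    static final class Zip {
        final String code;
        final double lat;
        final double lon;

        Zip(String code, double lat, double lon) {
            this.code = code;
            this.lat = lat;
            this.lon = lon;
        }
    }

    /** Great-circle distance in miles between two points, via the haversine formula. */
    static double distanceMiles(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // min() guards against rounding pushing a slightly above 1
        return 2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt(Math.min(1.0, a)));
    }

    static double distanceMiles(Zip a, Zip b) {
        return distanceMiles(a.lat, a.lon, b.lat, b.lon);
    }

    /** All candidates within radiusMiles of origin, nearest first. */
    static List<Zip> within(Zip origin, List<Zip> candidates, double radiusMiles) {
        List<Zip> result = new ArrayList<>();
        for (Zip z : candidates) {
            if (distanceMiles(origin, z) <= radiusMiles) {
                result.add(z);
            }
        }
        result.sort(Comparator.comparingDouble(z -> distanceMiles(origin, z)));
        return result;
    }
}
```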

Re: indexing fields with multiplicity

2007-08-29 Thread Karl Wettin
On 29 Aug 2007, at 19:13, Tim Sturge wrote: I'm looking for a boost when the anchor text is more commonly associated with one topic than another. For example the United States of America is called "USA" by a lot of people. The United Space Alliance is also called "USA" but by far fewer people.

Re: Highlighter that works with phrase and span queries

2007-08-29 Thread Mike Klaas
I just meant whether it would live in a Lucene release (somewhere under contrib/) or just in JIRA. Would including the functionality in Solr help get it into Lucene? -Mike On 29-Aug-07, at 4:58 AM, Mark Miller wrote: It kind of is a contrib -- it's really just a new Scorer class (with som

Large Index Architecture

2007-08-29 Thread Michael J. Prichard
Hello All, I want to hear from those out there that have large (i.e. 50 GB+) indexes on how they have designed their architecture. I currently have an index for email that is 10 GB and growing. Right now there are no issues with it but I am about to get into an even bigger use for the softw

Re: indexing fields with multiplicity

2007-08-29 Thread Tim Sturge
That's exactly my question. I feel like for (i = 0 ; i < ; i++) { document.add(new Field("anchor","USA")); } is exactly equivalent to field = new Field("anchor","USA")); field.setBoost(); document.add(field); but I don't know the function that relates and . I feel like there

Re: indexing fields with multiplicity

2007-08-29 Thread Karl Wettin
On 29 Aug 2007, at 21:37, Tim Sturge wrote: That's exactly my question. I feel like for (i = 0 ; i < ; i++) { document.add(new Field("anchor","USA")); } is exactly equivalent to field = new Field("anchor","USA")); field.setBoost(); document.add(field); but I don't know the function tha
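
The relationship Tim asks about can be estimated from the formulas in Lucene's DefaultSimilarity. There, the term-frequency factor is sqrt(freq), the length norm is 1/sqrt(number of tokens in the field), and the field boost multiplies into the norm. The sketch below reproduces only those factors in plain Java. It ignores idf, queryNorm, coord and the lossy one-byte encoding of norms, so treat it as an approximation, not as Lucene's actual scoring code.

```java
public class TfVersusBoost {

    /** Term-frequency factor as in Lucene's DefaultSimilarity: sqrt(freq). */
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    /** Length norm as in DefaultSimilarity: 1 / sqrt(number of tokens in the field). */
    static double lengthNorm(int numTokens) {
        return 1.0 / Math.sqrt(numTokens);
    }

    /**
     * Per-document factors of a single-term match: tf * fieldBoost * lengthNorm.
     * Ignores idf, queryNorm, coord and the one-byte norm encoding.
     */
    static double fieldWeight(int freq, int numTokens, double fieldBoost) {
        return tf(freq) * fieldBoost * lengthNorm(numTokens);
    }

    public static void main(String[] args) {
        // "USA" added 500 times and nothing else in the field
        System.out.println(fieldWeight(500, 500, 1.0));  // approximately 1.0
        // "USA" added once with a field boost of 2
        System.out.println(fieldWeight(1, 1, 2.0));      // 2.0
        // 500 x "USA" in a field with 1300 tokens in total
        System.out.println(fieldWeight(500, 1300, 1.0)); // approximately 0.62
    }
}
```

Under these assumptions, when the repeated term is the only content of the field, the 1/sqrt(n) length norm cancels the sqrt(n) gain from repetition. Repetition and field boost are therefore not interchangeable in general. How much repetition helps depends on what else the field contains.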

How to speed-up index opening

2007-08-29 Thread Antoine Baudoux
Hello, I have an application with a 2GB index. A lot of documents (up to 10,000 per day) are added to or deleted from this index. My customer would like a maximum delay of 7 minutes between a media item being added to the system and its becoming searchable through the index. So every 7 minutes or

Re: How to speed-up index opening

2007-08-29 Thread Chris Lu
Hi, Antoine, It does take a long time to open the index reader. One thing you could do is to put new documents into one smaller index and re-open it, it should be much faster. Also, you may need to have one index reader open, and open a new index reader, then close the previous index reader, to e

Re: Large Index Architecture

2007-08-29 Thread Chris Lu
Index partitioning should be a good idea. It'll save a lot of time on index merging and incremental indexing. In my experience, the right partition size really depends on CPU, hard disk speed, and memory size. Nowadays, with a Core 2 Duo, 10 GB per chunk should be fine. -- Chris Lu ---

Can a Lucene field be renamed in a Lucene index?

2007-08-29 Thread George Aroush
Hi everyone, I have the following need and I wonder what my options are, or whether anyone has run into it and has a solution or suggestion. I'm indexing a SQL database. Each table is a Lucene index. Now, in table "A", I have a field called "Foo". When I index it into Lucene, I also end up with a field c

Re: Can a Lucene field be renamed in a Lucene index?

2007-08-29 Thread Erik Hatcher
There was just this thread here recently: hope that helps. Erik On Aug 29, 2007, at 10:03 PM, George Aroush wrote: Hi everyone, I have the following need and I wonder what my options are, or if

RE: Can a Lucene field be renamed in a Lucene index?

2007-08-29 Thread George Aroush
Just read the thread. Unfortunately, it doesn't offer a solution. Is it possible to write a tool that will read the source index, and write it to an output index with the field renamed? No, the raw-text is not stored in the Lucene index. Thanks. -- George > -Original Message- > From:

Re: Can a Lucene field be renamed in a Lucene index?

2007-08-29 Thread Chris Lu
The easiest solution would be to change the SQL to select Bar as Foo, ... from your_table. Use an alias and keep everything else as before. If that's not an option, you may need to re-index everything. -- Chris Lu - Instant Scalable Full-Text Search On Any Database/Applic

Re: How to speed-up index opening

2007-08-29 Thread Michael Busch
Chris Lu wrote: > Hi, Antoine, > > It does take a long time to open the index reader. > One thing you could do is to put new documents into one smaller index and > re-open it, it should be much faster. > We're planning to add a reopen() method to IndexReader that should significantly speed up re

Reduce copy error

2007-08-29 Thread Nguyen Manh Tien
When I run Nutch, I always hit this error in the reduce task, and it runs very slowly after this error. Does anyone know how to solve this problem? Here is the log: java.io.IOException: Insufficient space at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem$InMemoryOutputStream.write(In