Migrating from Hit/Hits to TopDocs/TopDocCollector

2009-06-09 Thread Paul J. Lucas
I have existing code that's like: final Term t = /* ... */; final Iterator i = searcher.search( new TermQuery( t ) ).iterator(); while ( i.hasNext() ) { final Hit hit = (Hit)i.next(); // "FILE" is the field that recorded the original file indexed

RE: Reloading RAM Directory from updated FS Directory

2009-06-09 Thread Uwe Schindler
Hi Greg & Kay, > Lucene (IndexSearchers / IndexReaders) has the notion of cache as well > so you need to check if you really want a 100% replication of > RAMDirectory / FSDirectory as well concurrently in memory. Have you > tested with the FieldCache policies before moving onto the RAMDirectory >

Re: Reloading RAM Directory from updated FS Directory

2009-06-09 Thread Kay Kay
Have you checked out the Solr project, which provides a service on top of Lucene plus caching / warming-up facilities similar to what you need? The IndexReaders are expensive (and are the underlying data source for a given IndexSearcher) in terms of time and resources when being opened / created an

Using lucene in a clustered app server

2009-06-09 Thread Newman, Billy
I am trying to figure out the best way to add to a Lucene index across a clustered app server. I cannot grab an IndexWriter for each node in the cluster, because I would run into lock file problems. I am not sure if I can share one IndexWriter across the cluster because what happens when two o

Reloading RAM Directory from updated FS Directory

2009-06-09 Thread Diamond, Greg
Hi All - What is the best way to load a RAM Directory from an FS Directory, and periodically reload the RAM Directory to pick up new documents? My scenario: I create several large directories, which I write to a file system and then load into RAM for faster searching. They take seve
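One common way to approach the periodic reload asked about here is to build the fresh in-memory copy off to the side and then swap a single shared reference, so that searches never see a half-loaded copy. The sketch below shows only that swap pattern in plain Java. `SnapshotHolder`, `Snapshot` and the loader are hypothetical stand-ins, not Lucene classes; in real code the snapshot would presumably wrap a RAMDirectory and the searcher opened on it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

public class SnapshotHolder {
    /** Hypothetical immutable in-memory copy of the on-disk data. */
    public static final class Snapshot {
        private final List<String> docs;

        public Snapshot(List<String> docs) {
            // Defensive copy: later changes "on disk" do not leak into this snapshot.
            this.docs = Collections.unmodifiableList(new ArrayList<>(docs));
        }

        public int size() {
            return docs.size();
        }
    }

    private final AtomicReference<Snapshot> current;
    private final Supplier<Snapshot> loader;

    public SnapshotHolder(Supplier<Snapshot> loader) {
        this.loader = loader;
        this.current = new AtomicReference<>(loader.get());
    }

    /** Readers call this on every search; it never blocks on a reload. */
    public Snapshot get() {
        return current.get();
    }

    /** Loads a fresh copy, publishes it atomically, and returns the old one. */
    public Snapshot reload() {
        Snapshot fresh = loader.get();   // slow part, done off to the side
        return current.getAndSet(fresh); // caller can release the old copy
    }
}
```

The caller of `reload()` gets the previous snapshot back so it can release it once in-flight searches have finished; how to detect that is left out of this sketch.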

Re: Using Lucene for Moderate Similarity Check..

2009-06-09 Thread Grant Ingersoll
Hi Ravi, Lucene can enable this, but you will have some work to do on top of it. If you search the archives for record linkage (http://www.lucidimagination.com/search/?q=record+linkage) you will find a fair amount of discussion on this. Also, in somewhat shameless marketing mode, my co-au

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-09 Thread Yonik Seeley
I just cut'n'pasted your word into Solr... it worked fine (it didn't split the word). Make sure you're using the latest from the trunk version of Solr... this was fixed since 1.3 http://localhost:8983/solr/select?q=साल&debugQuery=true [...] साल साल text:साल text:साल -Yonik On Tue, Jun

Re: Using Luke on a Lucene Index in a Database

2009-06-09 Thread ChristophD
Following a request about experiences with this issue, I am posting the most important functions of the program. Every DB record maps directly to one file. The function that I did not include is "getDataSource()", which acquires a JDBC datasource for your database. cheers, Christoph private void

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-09 Thread KK
Hi Robert, I tried some sample code to find the reason. The WordDelimiterFilter uses the isLetter() method to tokenize, and for Hindi words some parts of a word are not actually letters but just part of the word (though that does not mean they can be used as word delimiters); since they are not letters is
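The isLetter() behaviour described in this message can be reproduced with the JDK alone. The sketch below is not WordDelimiterFilter's code; `DevanagariTokens`, `isWordChar` and `countRuns` are names made up for illustration. It counts how many pieces the word साल falls into when only `Character.isLetter` decides what belongs to a word, and again when Unicode combining marks are also accepted.

```java
public class DevanagariTokens {
    /** True for letters and for combining marks that attach to a preceding letter. */
    public static boolean isWordChar(int codePoint) {
        if (Character.isLetter(codePoint)) {
            return true;
        }
        int type = Character.getType(codePoint);
        return type == Character.NON_SPACING_MARK
            || type == Character.COMBINING_SPACING_MARK
            || type == Character.ENCLOSING_MARK;
    }

    /** Counts maximal runs of code points accepted by the chosen test. */
    public static int countRuns(String s, boolean lettersOnly) {
        int runs = 0;
        boolean inRun = false;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            boolean accepted = lettersOnly ? Character.isLetter(cp) : isWordChar(cp);
            if (accepted && !inRun) {
                runs++;
            }
            inRun = accepted;
            i += Character.charCount(cp);
        }
        return runs;
    }

    public static void main(String[] args) {
        String word = "\u0938\u093E\u0932"; // साल: SA, VOWEL SIGN AA, LA
        System.out.println(Character.isLetter(word.charAt(1))); // false
        System.out.println(countRuns(word, true));              // 2
        System.out.println(countRuns(word, false));             // 1
    }
}
```

The vowel sign U+093E has the Unicode category "spacing combining mark", so a letters-only test breaks the word in two, while a test that also accepts combining marks keeps it whole.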

Re: HitCollectorWrapper

2009-06-09 Thread Michael McCandless
Woops, you're right. I just fixed that (made it public). Thanks for raising this! Mike 2009/6/8 Koji Sekiguchi : > CHANGES.txt said that we can use HitCollectorWrapper: > > 12. LUCENE-1575: HitCollector is now deprecated in favor of a new > Collector abstract class. For easy migration, people c

Re: Debugging file lock problem

2009-06-09 Thread Michael McCandless
This doesn't exist today, but it'd be straightforward to implement your own LockFactory that is verbose? Mike On Fri, Jun 5, 2009 at 1:15 PM, Newman, Billy wrote: > I am having a problem where I am getting lock timeouts when trying to write > to my index file.  It would be nice if I could turn o
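The "verbose" idea suggested here is essentially a logging decorator around whatever lock is actually in use. The sketch below shows that pattern in plain Java only. `SimpleLock`, `VerboseLock` and `FlagLock` are hypothetical stand-ins and not Lucene's Lock or LockFactory classes; a real implementation would wrap the locks handed out by the existing factory in the same way.

```java
import java.util.List;

public class VerboseLockDemo {
    /** Hypothetical minimal lock interface; a stand-in, not Lucene's Lock class. */
    public interface SimpleLock {
        boolean obtain();
        void release();
    }

    /** Decorator that records every obtain/release around the wrapped lock. */
    public static final class VerboseLock implements SimpleLock {
        private final String name;
        private final SimpleLock delegate;
        private final List<String> log;

        public VerboseLock(String name, SimpleLock delegate, List<String> log) {
            this.name = name;
            this.delegate = delegate;
            this.log = log;
        }

        @Override
        public boolean obtain() {
            boolean obtained = delegate.obtain();
            log.add("obtain " + name + " -> " + obtained);
            return obtained;
        }

        @Override
        public void release() {
            log.add("release " + name);
            delegate.release();
        }
    }

    /** Trivial in-memory lock, used only to exercise the wrapper. */
    public static final class FlagLock implements SimpleLock {
        private boolean held;

        @Override
        public synchronized boolean obtain() {
            if (held) {
                return false;
            }
            held = true;
            return true;
        }

        @Override
        public synchronized void release() {
            held = false;
        }
    }
}
```

Logging the calling thread or a stack trace on each failed `obtain()` would be a natural extension when hunting for whoever is holding the lock.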

Re: 2.9 javadoc

2009-06-09 Thread Michael McCandless
http://lucene.apache.org/java/docs/nightly/ Mike On Mon, Jun 8, 2009 at 11:52 PM, Artyom Sokolov wrote: > Good time of day. > > If I understand correctly next release will be 2.9. Where one could > find javadocs for it? I've searched in Hudson a bit but didn't find > anything. > > Thanks. > > ---

Re: How to distribute lucene using rsync

2009-06-09 Thread Michael McCandless
Note that when using rsync, you must first close the IndexWriter, else the copy can be corrupt. If having to close IndexWriter (and stop indexing) is a hassle, then you should use SnapshotDeletionPolicy; it was created exactly for this reason (to take a backup of the index even while further index

Re: How to distribute lucene using rsync

2009-06-09 Thread Ian Lea
Unless you optimize it or are doing weird things with merge factors you won't get completely new files every time you update an index. Some files will change, or be created, or deleted, and some won't. Then you can just copy them wherever you want using rsync or whatever you like. We use rsync, ma
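The point that only some files change between updates is what makes incremental copying cheap. As a rough illustration in plain Java, the sketch below copies only files that are missing at the destination or that differ in size or modification time (compared in whole seconds), similar in spirit to rsync's default quick check. `IncrementalCopy` and `sync` are names invented for this example, and the sketch assumes a flat directory of files; it is not a replacement for rsync.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

public class IncrementalCopy {
    /** Copies new or changed files from src to dst; returns the names copied. */
    public static List<String> sync(Path src, Path dst) throws IOException {
        Files.createDirectories(dst);
        List<String> copied = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(src)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // flat directory assumed; subdirectories are skipped
                }
                Path target = dst.resolve(file.getFileName());
                if (!looksUnchanged(file, target)) {
                    // COPY_ATTRIBUTES keeps the modification time for the next comparison.
                    Files.copy(file, target,
                        StandardCopyOption.REPLACE_EXISTING,
                        StandardCopyOption.COPY_ATTRIBUTES);
                    copied.add(file.getFileName().toString());
                }
            }
        }
        return copied;
    }

    /** Quick check: same size and same modification time, in whole seconds. */
    private static boolean looksUnchanged(Path source, Path target) throws IOException {
        if (!Files.exists(target)) {
            return false;
        }
        long sourceSeconds = Files.getLastModifiedTime(source).toMillis() / 1000;
        long targetSeconds = Files.getLastModifiedTime(target).toMillis() / 1000;
        return Files.size(source) == Files.size(target) && sourceSeconds == targetSeconds;
    }
}
```

Unlike `rsync --delete`, this sketch never removes files that have disappeared from the source, and the warning given elsewhere in this thread still applies: copying while an IndexWriter is open can produce a corrupt copy.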

Using Lucene for Moderate Similarity Check..

2009-06-09 Thread RaviK Thakur
Hello All, I want to check the feasibility of using Lucene for a similarity check between two flat CSV files. The actual requirement is this: we have two files, each containing customer information such as name, address, pin code, etc. Some customers may be in common in both

Lucene Query Field Info

2009-06-09 Thread Shayak Sen
I construct a boolean query to search for a term in each field of the index. Once I retrieve the hits, is it possible to find out which field matched the term? For example: I have fields A B C with data a b c. A B C a b a Then I search for A:a B:a C:a and get a hit. Can I tell wh
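One simple way to answer this after the fact, assuming the field values are stored and can be read back from each hit, is to re-check every field of the hit against the term. The sketch below does that in plain Java with whitespace tokenization; `MatchedFields` and `matchedFields` are hypothetical names, the map stands in for a retrieved document, and a real index would need the same analysis that was used at indexing time. Running one query per field is another option.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class MatchedFields {
    /** Returns the names of fields whose whitespace-separated tokens include the term. */
    public static List<String> matchedFields(Map<String, String> doc, String term) {
        List<String> matched = new ArrayList<>();
        for (Map.Entry<String, String> field : doc.entrySet()) {
            for (String token : field.getValue().trim().split("\\s+")) {
                if (token.equals(term)) {
                    matched.add(field.getKey());
                    break; // one match per field is enough
                }
            }
        }
        return matched;
    }
}
```

For a document with fields A, B, C holding a, b, a, a call with the term "a" returns A and C, in the iteration order of the map passed in.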