list moving to lucene.apache.org

2005-03-01 Thread Roy T. Fielding
This list is about to be moved to java-user at lucene.apache.org. Please excuse the temporary inconvenience. Cheers, Roy T. Fielding, co-founder, The Apache Software Foundation ([EMAIL PROTECTED])

Re: Multiple indexes

2005-03-01 Thread Otis Gospodnetic
Ben, You do need to use a separate instance of those 3 classes for each index, yes. But this is really something like: IndexWriter writer = new IndexWriter(); So it's the normal code-writing process; you don't really have to create anything new, just use the existing Lucene API. As for locking,

Re: Best Practices for Distributing Lucene Indexing and Searching

2005-03-01 Thread Doug Cutting
Yonik Seeley wrote: 6. Index locally and synchronize changes periodically. This is an interesting idea and bears looking into. Lucene can combine multiple indexes into a single one, which can be written out somewhere else, and then distributed back to the search nodes to replace their existing inde

Re: Multiple indexes

2005-03-01 Thread Ben
Is it true that for each index I have to create a separate instance for FSDirectory, IndexWriter and IndexReader? Do I need to create a separate locking mechanism as well? I have already implemented a program using just one index. Thanks, Ben On Tue, 1 Mar 2005 22:09:05 -0500, Erik Hatcher <[EMA

Re: Best Practices for Distributing Lucene Indexing and Searching

2005-03-01 Thread Yonik Seeley
> 6. Index locally and synchronize changes periodically. This is an > interesting idea and bears looking into. Lucene can combine multiple > indexes into a single one, which can be written out somewhere else, and > then distributed back to the search nodes to replace their existing > index. This i

Re: Multiple indexes

2005-03-01 Thread Erik Hatcher
It's hard to answer such a general question with anything very precise, so sorry if this doesn't hit the mark. Come back with more details and we'll gladly assist though. First, certainly do not copy/paste code. Use standard reuse practices, perhaps the same program can build the two differen

RE: How to manipulate the lucene index table

2005-03-01 Thread Kyong Kwak
You can try Luke http://www.getopt.org/luke/ -Original Message- From: Srimant Mishra [mailto:[EMAIL PROTECTED] Sent: Tuesday, March 01, 2005 4:39 PM To: lucene-user@jakarta.apache.org Subject: How to manipulate the lucene index table Hi all, I have a web-based application

How to manipulate the lucene index table

2005-03-01 Thread Srimant Mishra
Hi all, I have a web-based application that we use to index text documents as well as images; the index fields are either Field.UnStored or Field.Keyword. Currently, we plan to modify some of the index field names. For example, if the index field name was DOCLOCALE,

Re: Best Practices for Distributing Lucene Indexing and Searching

2005-03-01 Thread Chris Hostetter
: We have a requirement for a new version of our software that it run in a : clustered environment. Any node should be able to go down but the : application must keep functioning. My application is looking at similar problems. We aren't yet live, but the only practical solution we have implemente

Multiple indexes

2005-03-01 Thread Ben
Hi, My site has two types of documents with different structures. I would like to create an index for each type of document. What is the best way to implement this? I have been trying to implement this but found out that 90% of the code is the same. In the Lucene in Action book, there is a case study

Re: Fast access to a random page of the search results.

2005-03-01 Thread Doug Cutting
Daniel Naber wrote: After fixing this I can reproduce the problem with a local index that contains about 220.000 documents (700MB). Fetching the first document takes for example 30ms, fetching the last one takes >100ms. Of course I tested this with a query that returns many results (about 50.000

Best Practices for Distributing Lucene Indexing and Searching

2005-03-01 Thread Luke Francl
Lucene Users, We have a requirement for a new version of our software that it run in a clustered environment. Any node should be able to go down but the application must keep functioning. Currently, we use Lucene on a single node but this won't meet our fail over requirements. If we can't find a

RE: Investigating Lucene For Project

2005-03-01 Thread Runde, Kevin
Also there is a book called "Lucene in Action" that was released recently. It is a great introduction to Lucene and has sections dedicated to indexing different text document types (txt, html, pdf, doc, rtf). FYI I am in no way related to the book or the authors so this is a real recommendation. It

Re: Investigating Lucene For Project

2005-03-01 Thread Ben Litchfield
See inlined comments below. > We have had requests from some clients who would like the ability to > "index" PDF files, now and possibly other text files in the future. The > PDF files live on a server and are in a structured environment. I would > like to somehow index the content inside the PD

Investigating Lucene For Project

2005-03-01 Thread Scott Purcell
I am looking for a solution to a problem I am having. We have a web-based asset management solution where we manage customers' assets. We have had requests from some clients who would like the ability to "index" PDF files, now and possibly other text files in the future. The PDF files live on

Re: Zip Files

2005-03-01 Thread Chris Lamprecht
Luke, Look at the javadocs for java.io.ByteArrayInputStream - it wraps a byte array and makes it accessible as an InputStream. Also see java.util.zip.ZipFile. You should be able to read and parse all contents of the zip file in memory. http://java.sun.com/j2se/1.4.2/docs/api/java/io/ByteArrayIn
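
A minimal sketch of the in-memory approach suggested in this reply, assuming the indexing classes can be given an InputStream-based entry point. `StreamTextExtractor` and `extractText` are hypothetical names and are not part of Lucene or of the code discussed in the thread. The body simply decodes the bytes as UTF-8 text where a real PDF, Word or XML parser would go, and it uses newer Java syntax than the J2SE 1.4.2 API linked above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: the stream-based method does the work, and the
// File and byte[] variants only adapt their input to an InputStream.
public class StreamTextExtractor {

    /** Core method: works on any InputStream (file, zip entry, byte array). */
    public static String extractText(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        // Placeholder for real parsing: treat the content as UTF-8 text.
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Convenience overload so existing File-based callers keep working. */
    public static String extractText(File file) throws IOException {
        try (InputStream in = new FileInputStream(file)) {
            return extractText(in);
        }
    }

    /** Overload for content already held in memory, e.g. an unpacked zip entry. */
    public static String extractText(byte[] bytes) throws IOException {
        return extractText(new ByteArrayInputStream(bytes));
    }
}
```

With this shape, bytes read out of a zip entry can be passed straight to `extractText(byte[])` without writing a temporary file.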

Re: Fast access to a random page of the search results.

2005-03-01 Thread Daniel Naber
On Tuesday 01 March 2005 19:15, Doug Cutting wrote: > 'nHits - nHits' always equals zero. So you're actually printing the > first document, not the last. The last document would be accessed with > 'hits.doc(nHits)'. After fixing this I can reproduce the problem with a local index that contains

Re: Custom filters & document numbers

2005-03-01 Thread Doug Cutting
[EMAIL PROTECTED] wrote: Does this happen frequently? Like Stanislav has been asking... what sort of operations on the index cause the document number to change for any given document? Documents are only re-numbered after there have been deletions. Once there have been deletions, renumbering may

Re: Fast access to a random page of the search results.

2005-03-01 Thread Doug Cutting
Stanislav Jordanov wrote: startTs = System.currentTimeMillis(); dummyMethod(hits.doc(nHits - nHits)); stopTs = System.currentTimeMillis(); System.out.println("Last doc accessed in " + (stopTs - startTs)

RE: Zip Files

2005-03-01 Thread Crump, Michael
Not sure what you are using as your indexing classes but if you changed them to use InputStream I think it would go a long way towards making them more flexible and solving your problem. > -Original Message- > From: Luke Shannon [mailto:[EMAIL PROTECTED] > Sent: Tuesday, March 01, 2005 1

Re: 1.4.x TermInfosWriter.indexInterval not public static ?

2005-03-01 Thread Doug Cutting
Kevin A. Burton wrote: BTW.. can you define "a bit"... Merriam-Webster says: a bit : SOMEWHAT, RATHER Is "a bit" 5%? 10%? Benchmarks would be nice but I'm not that picky. If you want benchmarks, make benchmarks. I just want to see what performance hits/benefits I could see by tweaking the va

Re: Zip Files

2005-03-01 Thread Luke Shannon
Thanks Ernesto. The issue I'm working with now (this is more lack of experience than anything) is getting an input I can index. All my indexing classes (doc, pdf, xml, ppt) take a File object as a parameter and return a Lucene Document containing all the fields I need. I'm struggling with how I c

Large Index managing

2005-03-01 Thread Volodymyr Bychkoviak
Hi, just an idea on how to manage a large index that is updated very often. Very often there is a need to update a document in the index. To update a document in the index you should delete the old document from the index and then add the new one. In most cases it requires you to open IndexReader, delete document, close Ind

Re: Zip Files

2005-03-01 Thread Ernesto De Santis
Hello, first, you need a parser for each file type: pdf, txt, word, etc., and use a Java API to iterate over the zip content, see: http://java.sun.com/j2se/1.4.2/docs/api/java/util/zip/ZipInputStream.html Use the getNextEntry() method. A little example: ZipInputStream zis = new ZipInputStream(fileInputStream); ZipEn
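
The getNextEntry() loop mentioned in this reply could look roughly like the sketch below. It is not a completion of the truncated example above. `ZipEntryReader`, `readEntries` and `sampleZip` are hypothetical names introduced for illustration, and the code uses newer Java syntax than the 1.4.2 API linked in the message. It reads every non-directory entry into a byte array keyed by entry name, which a per-file-type parser could then consume.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for walking a zip archive entry by entry.
public class ZipEntryReader {

    /** Reads every non-directory entry of a zip stream into memory, keyed by entry name. */
    public static Map<String, byte[]> readEntries(InputStream in) throws IOException {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        try (ZipInputStream zis = new ZipInputStream(in)) {
            ZipEntry entry;
            // getNextEntry() returns null once the archive is exhausted.
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                // read() returns -1 at the end of the current entry, not of the whole archive.
                while ((n = zis.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                entries.put(entry.getName(), buf.toByteArray());
            }
        }
        return entries;
    }

    /** Builds a two-entry zip in memory so the sketch can be tried without touching disk. */
    public static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.putNextEntry(new ZipEntry("a.txt"));
            zos.write("hello".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
            zos.putNextEntry(new ZipEntry("docs/b.txt"));
            zos.write("world".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return bytes.toByteArray();
    }
}
```

Holding whole entries in memory is a simplification that suits small archives; for large entries it may be better to hand the ZipInputStream itself to the parser one entry at a time.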

Zip Files

2005-03-01 Thread Luke Shannon
Hello; Anyone have any ideas on how to index the contents within zip files? Thanks, Luke - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

Re: Remove document fails

2005-03-01 Thread Volodymyr Bychkoviak
Maybe you have an open IndexWriter at the same time you are trying to delete the document. Alex Kiselevski wrote: Hi, I have a problem doing IndexReader.delete(int doc) and it fails on lock error. Alex Kiselevski +9.729.776.4346 (desk) +9.729.776.1504 (fax) AMDOCS > INTEGRATED CUSTOMER MANAGEMENT T

RE: Re[2]: Is IndexSearcher thread safe?

2005-03-01 Thread Cocula Remi
I probably had the same trouble (but I'm not sure). I have run a test program that was creating a lot of IndexSearchers (but also closing and freeing them). It ran into an OutOfMemory exception. But I'm not finished with that problem (need to use a profiler). >But I have discovered one strange fact

RE: Is IndexSearcher thread safe?

2005-03-01 Thread Cocula Remi
>Additional question. >If I'm sharing one instance of IndexSearcher between different threads >Is it good to just to drop this instance to GC. >Because I don't know if some thread is still using this searcher or done >with it. Note that as long as one of the threads keeps a reference to the Inde
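
The point about references can be illustrated with a small sketch, under stated assumptions. `SearcherHolder` and its nested `Searcher` class are hypothetical stand-ins and are not Lucene's IndexSearcher. Threads fetch the current instance from a shared holder, and swapping in a new instance leaves the old one reachable for as long as some thread still holds its own reference to it.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical holder for one shared, replaceable searcher instance.
public class SearcherHolder {

    /** Stand-in for a real searcher; it only tags results with its version label. */
    public static final class Searcher {
        private final String version;

        public Searcher(String version) {
            this.version = version;
        }

        public String search(String query) {
            return version + ":" + query;
        }
    }

    private final AtomicReference<Searcher> current;

    public SearcherHolder(Searcher initial) {
        this.current = new AtomicReference<>(initial);
    }

    /** Each thread takes its own reference before running a query. */
    public Searcher get() {
        return current.get();
    }

    /** Publishes a new searcher and returns the previous one. */
    public Searcher swap(Searcher next) {
        return current.getAndSet(next);
    }
}
```

The sketch deliberately leaves out when to close the old instance. A real searcher holds open index files, so relying on garbage collection alone may not release them promptly, and some form of usage counting would be needed before calling close.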

Remove document fails

2005-03-01 Thread Alex Kiselevski
Hi, I have a problem doing IndexReader.delete(int doc) and it fails on lock error. Alex Kiselevski +9.729.776.4346 (desk) +9.729.776.1504 (fax) AMDOCS > INTEGRATED CUSTOMER MANAGEMENT The information contained in this message is proprietary of Amdocs, protected from disclosure, and may be

RE: help with boolean expression

2005-03-01 Thread Omar Didi
I found something kind of weird about the way Lucene interprets boolean expressions without parentheses. When I run the query A AND B OR C, it returns only the documents that have A (in other words, as if the query was just the term A). When I run the query A OR B AND C, it returns only the documen

Re: Questions about GermanAnalyzer/Stemmer [auf Viren geprueft]

2005-03-01 Thread Jonathan O'Connor
Apologies Erik, This must be one of those apostrophe in email address problems I always get. Recently I removed the apostrophe from the email address I give out. Our server recognizes both email addresses, but some of these mail lists don't like the O'Connor clann! Ciao, Jonathan O'Connor XCOM Dubl

Re[2]: Is IndexSearcher thread safe?

2005-03-01 Thread Yura Smolsky
Hello, Volodymyr. VB> Additional question. VB> If I'm sharing one instance of IndexSearcher between different threads VB> Is it good to just to drop this instance to GC. VB> Because I don't know if some thread is still using this searcher or done VB> with it. It is safe to share one instance betw

Re: Custom filters & document numbers

2005-03-01 Thread tomsdepot-lucene
I'm also interested in knowing what can change the doc numbers. Does this happen frequently? Like Stanislav has been asking... what sort of operations on the index cause the document number to change for any given document? If the document numbers change frequently, is there a straightforward wa

Re: Questions about GermanAnalyzer/Stemmer [auf Viren geprueft]

2005-03-01 Thread Erik Hatcher
I had to moderate both Jonathan and Jon's messages in to the list. Please subscribe to the list and post to it with the address you've subscribed. I cannot always guarantee I'll catch moderation messages and send them through in a timely fashion. Erik On Mar 1, 2005, at 6:18 AM, Jonat

Re: Questions about GermanAnalyzer/Stemmer [auf Viren geprueft]

2005-03-01 Thread Jonathan O'Connor
Jon, I too found some problems with the German analyser recently. Here's what may help: 1. You can try reading Joerg Caumanns' paper "A Fast and Simple Stemming Algorithm for German Words". This paper describes the algorithm implemented by GermanAnalyzer. 2. I guess German nouns are all capitalized, so

Re: Is IndexSearcher thread safe?

2005-03-01 Thread Volodymyr Bychkoviak
Additional question. If I'm sharing one instance of IndexSearcher between different threads Is it good to just to drop this instance to GC. Because I don't know if some thread is still using this searcher or done with it. Regards, Volodymyr Bychkoviak Volodymyr Bychkoviak wrote: Is it thread-saf

Is IndexSearcher thread safe?

2005-03-01 Thread Volodymyr Bychkoviak
Is it thread-safe to share one instance of IndexSearcher between multiple threads?

Questions about GermanAnalyzer/Stemmer

2005-03-01 Thread Jon Humble
Hello, We're using the GermanAnalyzer/Stemmer to index/search our (German) Website. I have a few questions: (1) Why is the GermanAnalyzer case-sensitive? None of the other language indexers seem to be. What does this feature add? (2) With the German Analyzer, wildcard searches containin

Re: Fast access to a random page of the search results.

2005-03-01 Thread Stanislav Jordanov
// The test source code (second attempt). // Just in case the .txt attachment does not pass through // I am pasting the code here: package index_test; import org.apache.lucene.search.*; import org.apache.lucene.store.FSDirectory; import org.apache.lucene.store.Directory; import org.apache.lucene.