Re: OutOfMemoryError with Lucene 1.4 final

2004-12-10 Thread Justin Swanhart
You probably need to increase the amount of RAM available to your JVM. See the parameters: -Xmx: maximum memory usable by the JVM; -Xms: initial memory allocated to the JVM. My params are: -Xmx2048m -Xms128m (2G max, 128M initial). On Fri, 10 Dec 2004 11:17:29 -0600, Sildy Augustine <[EMAIL PR
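A quick way to confirm that the -Xmx and -Xms settings actually reached the JVM is to print what the runtime reports. This is only a minimal sketch: the class name HeapSettings and the helper toMiB are made up for illustration, and the reported maximum can come out somewhat below the -Xmx value on some JVMs.

```java
public class HeapSettings {
    /** Converts a byte count to whole mebibytes (hypothetical helper). */
    public static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Upper bound the JVM will try to use for the heap (driven by -Xmx).
        System.out.println("max heap (MiB):     " + toMiB(rt.maxMemory()));
        // Heap currently reserved by the JVM (starts near -Xms and grows).
        System.out.println("current heap (MiB): " + toMiB(rt.totalMemory()));
        // Unused part of the currently reserved heap.
        System.out.println("free heap (MiB):    " + toMiB(rt.freeMemory()));
    }
}
```

Run it with the same flags as the indexing or search process, for example: java -Xmx2048m -Xms128m HeapSettings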

Re: partial updating of lucene

2004-12-08 Thread Justin Swanhart
Your unstored fields were not stored in the index; only their terms were stored. When you get the document from the index and modify it, those terms are lost when you add the document again. You can either simply create a new document and populate all the fields and add that document to the index,

corrupted index

2004-12-04 Thread Justin Swanhart
Somehow today one of my indexes became corrupted. I get the following IO exception when trying to open the index: Exception in thread "main" java.io.IOException: read past EOF at org.en.lucene.store.InputStream.refill(InputStream.java:154) at org.en.lucene.store.InputStream.readB

Re: Thread safety

2004-12-03 Thread Justin Swanhart
You can only have one open writer at a time. A writer is either an IndexWriter object, or an IndexReader object that has modified the index, by deleting documents for instance. You must close your existing writer before you open a new one. You should not get lock exceptions with IndexSearchers.

Re: What is the best file system for Lucene?

2004-11-30 Thread Justin Swanhart
As a generalisation, SuSE itself is not a lot slower than Windows XP. I also very much doubt that filesystem is a factor. If you want to test w/out filesystem involvement, simply load your index into a RAMDirectory instead of using FSDirectory. That precludes filesystem overhead in searches. Th

Re: What is the best file system for Lucene?

2004-11-30 Thread Justin Swanhart
On Tue, 30 Nov 2004 12:07:46 -, Pete Lewis <[EMAIL PROTECTED]> wrote: > Also, unless you take your hyperthreading off, with just one index you are > searching with just one half of the CPU - so your desktop is actually using > a 1.5GHz CPU for the search. So, taking account of this its not too

Re: Index in RAM - is it realy worthy?

2004-11-28 Thread Justin Swanhart
My indexes are stored on a NetApp filer via NFS. The indexer process updates the indexes over NFS. I have multiple indexes. My search process determines if the NFS indexes have been updated, and if they have, then loads the index into a RAMDirectory. RAMDirectory is of course much faster than

Re: java.io.FileNotFoundException: ... (No such file or directory)

2004-11-19 Thread Justin Swanhart
BoundsException ioobe) > { > logger.error("INDEX OUT OF BOUNDS!" + ioobe.getMessage()); > ioobe.printStackTrace(); > } > } > reader.close(); > //logger.debug("done, about the optimize"); >

java.io.FileNotFoundException: ... (No such file or directory)

2004-11-18 Thread Justin Swanhart
I have two index processes. One is an index server, the other is a search server. The processes run on different machines. The index server is a single threaded process that reads from the database and adds unindexed rows to the index as needed. It sleeps for a couple minutes between each batch

Re: version documents

2004-11-17 Thread Justin Swanhart
Split the filename into "basefilename" and "version" and make each a keyword. Sort your query by version descending, and only use the first "basefile" you encounter. On Wed, 17 Nov 2004 15:05:19 -0500, Luke Shannon <[EMAIL PROTECTED]> wrote: > Hey all; > > I have ran into an interesting case. >
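The "sort by version descending, keep the first basefile" step can be sketched outside of Lucene. Everything here is an assumption made for illustration: the Hit class, its field names, and integer version numbers are hypothetical, and in a real application the hits would come from the search results rather than a hand-built list.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LatestVersions {
    /** Hypothetical search hit carrying the two keyword fields. */
    public static final class Hit {
        public final String baseFilename;
        public final int version;

        public Hit(String baseFilename, int version) {
            this.baseFilename = baseFilename;
            this.version = version;
        }
    }

    /** Returns the highest version seen for each base filename. */
    public static Map<String, Integer> latest(List<Hit> hits) {
        // Sort a copy by version, highest first.
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingInt((Hit h) -> h.version).reversed());

        // Only the first hit encountered for each base filename is kept.
        Map<String, Integer> newest = new LinkedHashMap<>();
        for (Hit h : sorted) {
            newest.putIfAbsent(h.baseFilename, h.version);
        }
        return newest;
    }
}
```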

Re: Something missing !!???

2004-11-17 Thread Justin Swanhart
The HEAD version of CVS supports gz compression. You will need to check it out using cvs if you want to use it. On Wed, 17 Nov 2004 21:43:36 +0200, abdulrahman galal <[EMAIL PROTECTED]> wrote: > i noticed in the last period that alot of people disscus with each others > about the bugs of lucene

Re: Index copy

2004-11-17 Thread Justin Swanhart
You could lock your index for writes, then copy the file using operating system copy commands. Another way would be to lock your index, make a filesystem snapshot, then unlock your index. You can then safely copy the snapshot without interrupting further index operations. On Wed, 17 Nov 2004 11:2

Re: QueryParser: "[stopword] AND something" throws Exception

2004-11-12 Thread Justin Swanhart
Try using 1.4.2. The change file says that ArrayIndexOutOfBoundsExceptions have been fixed in the queryparser. On Fri, 12 Nov 2004 12:04:31 -0500, Will Allen <[EMAIL PROTECTED]> wrote: > Holy cow! This does happen! > > > > -Original Message- > From: Peter Pimley [mailto:[EMAIL PROTEC

Re: Searching in keyword field ?

2004-11-09 Thread Justin Swanhart
You can add the category keyword multiple times to a document. Instead of separating your categories with a delimiter, just add the keyword multiple times. doc.add(Field.Keyword("category", "ABC")); doc.add(Field.Keyword("category", "DEF GHI")); On Tue, 9 Nov 2004 17:18:19 +0100, Thierry Ferrero (

Re: IndexSearch

2004-11-08 Thread Justin Swanhart
You can write to the index and read from it at the same time. You can only have one IndexWriter open at any one time. IndexSearchers will only see documents that were created before they were instantiated, so you need to create new ones periodically to see new documents. On Mon, 8 Nov 2004 14:26

Re: Windows Bug?

2004-11-08 Thread Justin Swanhart
This is failing because you are trying to create a new index in the directory. It works on *nix file systems because you can delete an open file on those operating systems, something you can't do under Windows. If you change the create parameter to false on your second call everythi

Re: Is there an easy way to have indexing ignore a CVS subdirectory in the index directory?

2004-11-05 Thread Justin Swanhart
You may also want to investigate the CVSIGNORE environment variable. You can tell CVS to ignore any files and directories specified in this variable (it is space separated). So you could tell CVS to ignore all directories named lucene with: export CVSIGNORE=lucene On Fri, 5 Nov 2004 09:03:00 -0

Re: Is there an easy way to have indexing ignore a CVS subdirectory in the index directory?

2004-11-05 Thread Justin Swanhart
You should exclude your lucene index from the CVS repository. This is the same thing you would do if you had a process that generated files in your source tree from other files. The generated files wouldn't have any meaning in the repository, and can be regenerated at any time, so you would want

prefix wildcard matching options (*blah)

2004-11-04 Thread Justin Swanhart
I'm thinking about making a separate field in my index for prefix wildcard searches. I would chop off x characters from the front to create "subtokens" for the prefix matches. For the term "republican", the terms created would be: republican, epublican, publican, ublican, blican. My query parser would then intellige
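The subtoken generation described here can be sketched in a few lines. This is only an illustration of the chopping step, not of the query parser side; the class name SuffixTokens, the method name, and the maxChopped parameter (the "x" in the message) are made up.

```java
import java.util.ArrayList;
import java.util.List;

public class SuffixTokens {
    /**
     * Returns the term itself plus the tokens obtained by chopping
     * 1..maxChopped characters off the front, never producing an empty token.
     */
    public static List<String> suffixTokens(String term, int maxChopped) {
        List<String> tokens = new ArrayList<>();
        for (int chopped = 0; chopped <= maxChopped && chopped < term.length(); chopped++) {
            tokens.add(term.substring(chopped));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // With maxChopped = 4 this yields the five terms listed for "republican".
        System.out.println(suffixTokens("republican", 4));
    }
}
```

A leading-wildcard query such as *publican could then be run as an ordinary term or prefix match against this field, since "publican" is one of the stored tokens.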

Re: one huge index or many small ones?

2004-11-04 Thread Justin Swanhart
First off, I think you should make a decision about what you want to store in your index and how you go about searching it. The less information you store in your index, the better, for performance reasons. If you can store the messages in an external database you probably should. I would create

Re: Search speed

2004-11-02 Thread Justin Swanhart
If you know all the phrases you are going to search for, you could modify an analyzer to make those phrases into whole terms when you are analyzing. Other than that, you can test the speed of breaking the phrase query up into term queries. You would have to do an AND on all the words in the phra

When do document ids change

2004-10-29 Thread Justin Swanhart
Given an FSDirectory-based index A. Documents are added to A with an IndexWriter (minMergeDocs = 2, mergeFactor = 3). Documents are never deleted. Once the RAMDirectory merges documents to the index: a) will the documentID values for index A ever change? b) can a mapping between a term in th

Re: Searching for a phrase that contains quote character

2004-10-28 Thread Justin Swanhart
absolutely correct. sorry about that. shouldn't code before coffee :) On Thu, 28 Oct 2004 20:16:16 +0200, Daniel Naber <[EMAIL PROTECTED]> wrote: > On Thursday 28 October 2004 19:03, Justin Swanhart wrote: > > > Have you tried making a term query by hand and testing

Re: Searching for a phrase that contains quote character

2004-10-28 Thread Justin Swanhart
Have you tried making a term query by hand and testing to see if it works? Term t = new Term("field", "this is a \"test\""); PhraseQuery pq = new PhraseQuery(t); ... On Thu, 28 Oct 2004 12:02:48 -0400, Will Allen <[EMAIL PROTECTED]> wrote: > > I am having this same problem, but cannot find a

Re: Stopwords in Exact phrase

2004-10-27 Thread Justin Swanhart
Your analyzer will have removed the stopword when you indexed your documents, so Lucene won't be able to do this for you. You will need to implement a second pass over the results returned by Lucene and check to see if the stopword is included, perhaps with String.indexOf(). On Wed, 27 Oct 2004 1
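The second pass could look roughly like the sketch below. It assumes the original text of each hit is available as a String (for example from a stored field or an external store); the class and method names are made up, and the case-insensitive comparison is a choice made for this example, not something the message specifies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ExactPhraseFilter {
    /** Keeps only the texts that literally contain the phrase, stopwords included. */
    public static List<String> keepExact(List<String> hitTexts, String phrase) {
        String needle = phrase.toLowerCase(Locale.ROOT);
        List<String> kept = new ArrayList<>();
        for (String text : hitTexts) {
            // indexOf returns -1 when the phrase does not occur in the text.
            if (text.toLowerCase(Locale.ROOT).indexOf(needle) >= 0) {
                kept.add(text);
            }
        }
        return kept;
    }
}
```

Note that this is a plain substring check, so differences in whitespace or punctuation between the phrase and the text will cause a miss.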

Re: IndexWriter Constructor question

2004-10-27 Thread Justin Swanhart
You could always modify your own local copy if you want to change the behavior of the parameter. or just do: IndexWriter w = new IndexWriter(indexDirectory, new StandardAnalyzer(), !(IndexReader.indexEx

Re: Backup strategies

2004-10-27 Thread Justin Swanhart
I would suggest that you create a lock file for your index writing process; when the lock file is encountered, close the IndexWriter until the lock file is removed. After you create the lock file, wait a few seconds to make sure the writer process has quiesced, then create a snapshot of the filesystem
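The lock file handshake can be sketched with plain java.nio calls. This is an assumption-laden illustration: the class BackupLock and its methods are hypothetical, the lock file path is whatever both processes agree on, and the snapshot step itself is left to the filesystem tools.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class BackupLock {
    /** Backup side: create the lock file to ask the writer to pause. */
    public static void request(Path lockFile) {
        try {
            Files.createFile(lockFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Backup side: remove the lock file once the snapshot has been taken. */
    public static void release(Path lockFile) {
        try {
            Files.deleteIfExists(lockFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writer side: check before each batch; close the IndexWriter while this is false. */
    public static boolean mayWrite(Path lockFile) {
        return !Files.exists(lockFile);
    }
}
```

The backup process would call request, sleep for a few seconds, take the snapshot, and then call release; the writer polls mayWrite between batches.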

Re: Efficient search on lucene mailing archives

2004-10-14 Thread Justin Swanhart
you could just request all the messages from the list bot, then index them with lucene :) On Thu, 14 Oct 2004 16:50:19 +, sam s <[EMAIL PROTECTED]> wrote: > Hi Folks, > Is there any place where I can do a better search on lucene mailing > archives? > I tried JGuru and looks like their search

Re: Multi + Parallel

2004-10-14 Thread Justin Swanhart
The overhead of creating that many searcher objects is going to far outweigh any performance benefit you could possibly hope to gain by splitting your index up. On Thu, 14 Oct 2004 04:42:27 -0700 (PDT), Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > Search a single merged index. > > Otis > > >

Re: Indexing Strategy for 20 million documents

2004-10-08 Thread Justin Swanhart
It depends on a lot of factors. I myself use multiple indexes for about 10M documents. My documents are transient. Each day I get about 400K and I remove about 400K. I always remove an entire day's documents at one time. It is much faster/easier to delete the Lucene index for the day that I am r

Re: Analyzer reuse

2004-10-07 Thread Justin Swanhart
Yes you can reuse analyzers. The only performance gain will come from not having to create the objects and not having garbage collection overhead. I create one for each of my index reading threads. On Thu, 07 Oct 2004 16:59:38 +, sam s <[EMAIL PROTECTED]> wrote: > Hi, > Can instance of an an

multiple threads

2004-10-01 Thread Justin Swanhart
As I understand it, if two writers try to access the same index for writing, then one of the writers should block waiting for a lock until the lock timeout period expires, and then it will return a "Lock wait timeout" exception. I have a multithreaded indexing application that writes into one of