Re: Disk space used by optimize

2005-02-06 Thread Morus Walter
Bernhard Messer writes: However, three times the space sounds a bit too much, or I made a mistake in the book. :) There already was a discussion about disk usage during index optimize. Please have a look at the developers list at: http://mail-archives.apache.org/eyebrowse/[EMAIL

Re[2]: Disk space used by optimize

2005-02-04 Thread Yura Smolsky
Hello, Doug. There is a big difference when you use the compound index format versus multiple files. I have tested it on a big index (45 GB). When I used the compound format, optimize took 3 times more space, because the *.cfs needs to be unpacked. Now I use the non-compound file format. It needs like
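
For context, a minimal sketch of switching an index to the multifile (non-compound) format, assuming the Lucene 1.4-era API discussed in this thread (the index path is hypothetical):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    public class MultifileOptimize {
        public static void main(String[] args) throws Exception {
            // Open an existing index (create == false); the path is hypothetical.
            IndexWriter writer =
                new IndexWriter("/data/index", new StandardAnalyzer(), false);
            // Multifile format: optimize() needs roughly 2x the index size
            // on disk, instead of ~3x when *.cfs files must be unpacked first.
            writer.setUseCompoundFile(false);
            writer.optimize();
            writer.close();
        }
    }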

Re: Optimize not deleting all files

2005-02-04 Thread Ernesto De Santis
Hi all, We have the same problem. We guess that the problem is that Windows locks files. Our environment: Windows 2000, Tomcat 5.5.4. Ernesto. [EMAIL PROTECTED] escribió: Hi, When I run an optimize in our production environment, old index files are left in the directory and are not deleted. My

Re: Disk space used by optimize - no space on disk corrupts index.

2005-02-04 Thread Ernesto De Santis
Hi all, We have a big index and little space on disk. When we optimize and all the space is consumed, our index gets corrupted: the segments file points to nonexistent files. Environment: Java 1.4.2_04, W2000 SP4, Tomcat 5.5.4. Bye, Ernesto. Yura Smolsky escribió: Hello, Otis. There is a big difference when you use

Re: Disk space used by optimize

2005-02-04 Thread Bernhard Messer
However, three times the space sounds a bit too much, or I made a mistake in the book. :) There already was a discussion about disk usage during index optimize. Please have a look at the developers list at: http://mail-archives.apache.org/eyebrowse/[EMAIL PROTECTED]msgId=1797569 http

Re: Optimize not deleting all files

2005-02-04 Thread Otis Gospodnetic
Get and try Lucene 1.4.3. One of the older versions had a bug that caused old index files not to be deleted. Otis --- [EMAIL PROTECTED] wrote: Hi, When I run an optimize in our production environment, old index files are left in the directory and are not deleted. My understanding

Re: Optimize not deleting all files

2005-02-04 Thread yahootintin.1247688
[EMAIL PROTECTED] escribió: Hi, When I run an optimize in our production environment, old index files are left in the directory and are not deleted. My understanding is that an optimize will create new index files and all existing index files should be deleted. Is this correct? We are running Lucene

Re: Optimize not deleting all files

2005-02-04 Thread Patricio Keilty
guess that the problem is that Windows locks files. Our environment: Windows 2000, Tomcat 5.5.4. Ernesto. [EMAIL PROTECTED] escribió: Hi, When I run an optimize in our production environment, old index files are left in the directory and are not deleted. My understanding is that an optimize will create new

Re: Optimize not deleting all files

2005-02-04 Thread Patricio Keilty
Hi Otis, tried version 1.4.3 without success; old index files still remain in the directory. Also tried not calling optimize() and am still getting the same behaviour; maybe our problem is not related to the optimize() call at all. --p Otis Gospodnetic wrote: Get and try Lucene 1.4.3. One

Re: Optimize not deleting all files

2005-02-04 Thread Steven Rowe
of your IndexReaders. Hope it helps, Steve Patricio Keilty wrote: Hi Otis, tried version 1.4.3 without success; old index files still remain in the directory. Also tried not calling optimize() and am still getting the same behaviour; maybe our problem is not related to the optimize() call at all. --p

Re: Optimize not deleting all files

2005-02-04 Thread yahootintin.1247688
the optimize during the night). After the optimize is completed, I close and re-open the readers so they start reading from the new index files. I'm thinking of adding code to delete all the old files at that point. I presume they will no longer be locked. --- Lucene Users List lucene-user
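
A minimal sketch of the close-and-reopen pattern described above, assuming the Lucene 1.4-era API; on Windows, files held open by an IndexReader or IndexSearcher cannot be deleted, so both are closed before switching to the new files (the index path is hypothetical):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.search.IndexSearcher;

    public class ReopenAfterOptimize {
        public static void main(String[] args) throws Exception {
            String indexDir = "/data/index";   // hypothetical path
            IndexReader reader = IndexReader.open(indexDir);
            IndexSearcher searcher = new IndexSearcher(reader);

            // ... nightly maintenance ...
            IndexWriter writer =
                new IndexWriter(indexDir, new StandardAnalyzer(), false);
            writer.optimize();
            writer.close();

            // Release the old files so Windows can delete them, then
            // re-open against the newly written segments.
            searcher.close();
            reader.close();
            reader = IndexReader.open(indexDir);
            searcher = new IndexSearcher(reader);
        }
    }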

Optimize not deleting all files

2005-02-03 Thread yahootintin.1247688
Hi, When I run an optimize in our production environment, old index files are left in the directory and are not deleted. My understanding is that an optimize will create new index files and all existing index files should be deleted. Is this correct? We are running Lucene 1.4.2 on Windows

Re: Optimize not deleting all files

2005-02-03 Thread
Your understanding is right! The old existing files should be deleted, but it will build new files! On Thu, 03 Feb 2005 17:36:27 -0800 (PST), [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Hi, When I run an optimize in our production environment, old index files are left in the directory

Re: Disk space used by optimize

2005-01-31 Thread Doug Cutting
Yura Smolsky wrote: There is a big difference when you use the compound index format versus multiple files. I have tested it on a big index (45 GB). When I used the compound format, optimize took 3 times more space, because the *.cfs needs to be unpacked. Now I use the non-compound file format. It needs like

Re[2]: Disk space used by optimize

2005-01-30 Thread Yura Smolsky
Hello, Otis. There is a big difference when you use the compound index format versus multiple files. I have tested it on a big index (45 GB). When I used the compound format, optimize took 3 times more space, because the *.cfs needs to be unpacked. Now I use the non-compound file format. It needs like twice

Re: Disk space used by optimize

2005-01-28 Thread Otis Gospodnetic
Morus, that description of 3 sets of index files is what I was imagining, too. I'll have to test and add to the book errata, it seems. Thanks for the info, Otis --- Morus Walter [EMAIL PROTECTED] wrote: Otis Gospodnetic writes: Hello, Yes, that is how optimize works - copies all

Disk space used by optimize

2005-01-27 Thread Kauler, Leto S
Just a quick question: after writing an index and then calling optimize(), is it normal for the index to expand to about three times the size before finally compressing? In our case the optimise grinds the disk, expanding the index into many files of about 145MB total, before compressing down

Re: Disk space used by optimize

2005-01-27 Thread Otis Gospodnetic
Hello, Yes, that is how optimize works - it copies all existing index segments into one unified index segment, thus optimizing it. See hit #1: http://www.lucenebook.com/search?query=optimize+disk+space However, three times the space sounds a bit too much, or I made a mistake in the book. :) You

RE: Disk space used by optimize

2005-01-27 Thread Kauler, Leto S
Our copy of LIA is in the mail ;) Yes, the final three files are: the .cfs (46.8MB), deletable (4 bytes), and segments (29 bytes). --Leto -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Hello, Yes, that is how optimize works - copies all existing index

RE: Disk space used by optimize

2005-01-27 Thread Otis Gospodnetic
Have you tried using the multifile index format? Now I wonder if there is actually a difference in the disk space consumed by optimize() when you use the multifile and compound index formats... Otis --- Kauler, Leto S [EMAIL PROTECTED] wrote: Our copy of LIA is in the mail ;) Yes, the final three

Re: how often to optimize?

2004-12-28 Thread aurora
Are unoptimized indices causing you any problems (e.g. slow searches, high number of open file handles)? If not, then you don't even need to optimize until those issues become... issues. OK, I have changed the process to not do optimize() at all. So far so good. The number of files hover

Re: how often to optimize?

2004-12-28 Thread Otis Gospodnetic
of open file handles)? If not, then you don't even need to optimize until those issues become... issues. OK, I have changed the process to not do optimize() at all. So far so good. The number of files hover from 10 to 40 during the indexing of 10,000 files. Seems Lucene is doing

how often to optimize?

2004-12-21 Thread aurora
Right now I am incrementally adding about 100 documents to the index a day and then optimizing after that. I find that optimize essentially rebuilds the entire index into a single file, so the size of the disk write is proportional to the total index size, not to the size of the documents
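
A minimal sketch of that incremental setup, assuming the Lucene 1.4-era API (the document source is hypothetical); skipping the routine optimize() lets normal segment merging bound the file count:

    import java.util.Iterator;
    import java.util.List;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;

    public class DailyIncrementalAdd {
        // Adds the day's documents without calling optimize(); segment
        // merging (controlled by mergeFactor) keeps the file count in check.
        public static void addBatch(String indexDir, List docs) throws Exception {
            IndexWriter writer =
                new IndexWriter(indexDir, new StandardAnalyzer(), false);
            for (Iterator it = docs.iterator(); it.hasNext();) {
                writer.addDocument((Document) it.next());
            }
            writer.close();   // no optimize(): acceptable until searches
                              // slow down or open file handles pile up
        }
    }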

Re: how often to optimize?

2004-12-21 Thread Otis Gospodnetic
Hello, I think some of these questions may be answered in the jGuru FAQ. So my question is: would it be overkill to optimize every day? Only if lots of documents are being added/deleted and you end up with a lot of index segments. Is there any guideline on how often to optimize

Re: finalize delete without optimize

2004-12-14 Thread Otis Gospodnetic
Hello John, Once you make your change locally, use 'cvs diff -u IndexWriter.java > indexwriter.patch' to make a patch. Then open a new Bugzilla entry. Finally, attach your patch to that entry. Note that document deletion is actually done from IndexReader, so your patch may have to be on

Re: finalize delete without optimize

2004-12-14 Thread Otis Gospodnetic
Hello John, I believe you didn't get any replies to this. What you are describing cannot be done using the public API, but maaay (no source code on this machine, so I can't double-check) be doable if you use some of the 'internal' methods. I don't have the need for this, but others might, so

Re: finalize delete without optimize

2004-12-14 Thread John Wang
Hi Otis: Thanks for your reply. I am looking for more of an API call than a tool, e.g. IndexWriter.finalizeDelete(). If I implement this, how would I go about submitting a patch? thanks -John On Mon, 13 Dec 2004 22:24:12 -0800 (PST), Otis Gospodnetic [EMAIL PROTECTED] wrote:

RE: finalize delete without optimize

2004-12-09 Thread Aviran
Lucene's standard API does not support this kind of operation. Aviran http://www.aviransplace.com -Original Message- From: John Wang [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 08, 2004 17:32 To: [EMAIL PROTECTED] Subject: Re: finalize delete without optimize Hi folks

finalize delete without optimize

2004-12-06 Thread John Wang
Hi: Is there a way to finalize deletes, i.e. actually remove them from the segments and make sure the docIDs are contiguous again? The only explicit way to do this is by calling IndexWriter.optimize(). But this call does a lot more (it also merges all the segments), hence is very expensive. Is
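
With the public Lucene 1.4-era API, the closest thing is the two-step flow sketched below: deletes are marked through IndexReader, and only a segment merge such as optimize() physically removes them (the field name and term value are hypothetical):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;

    public class FinalizeDeletes {
        public static void main(String[] args) throws Exception {
            String indexDir = "/data/index";   // hypothetical path
            // Step 1: mark documents as deleted (IndexReader, not IndexWriter).
            IndexReader reader = IndexReader.open(indexDir);
            reader.delete(new Term("id", "doc-42"));   // hypothetical field
            reader.close();   // commits the deletion marks

            // Step 2: only a merge actually expunges them; optimize() is the
            // explicit (and expensive) way to force that with the public API.
            IndexWriter writer =
                new IndexWriter(indexDir, new StandardAnalyzer(), false);
            writer.optimize();
            writer.close();
        }
    }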

Re: JDBCDirectory to prevent optimize()?

2004-11-23 Thread Daniel Naber
On Tuesday 23 November 2004 00:06, Kevin A. Burton wrote: I'm wondering about the potential for a generic JDBCDirectory for keeping the lucene index within a database. Such a thing already exists: http://ppinew.mnis.com/jdbcdirectory/, but I don't know about its scalability. Regards Daniel

Re: JDBCDirectory to prevent optimize()?

2004-11-23 Thread Erik Hatcher
optimizations, so if you add 10 documents you have to run optimize() again, and this isn't exactly a fast operation. I'm wondering about the potential for a generic JDBCDirectory for keeping the Lucene index within a database. It sounds somewhat unconventional but would allow you to perform live addDirectory

Re: JDBCDirectory to prevent optimize()?

2004-11-23 Thread Kevin A. Burton
Erik Hatcher wrote: Also, there is a DBDirectory in the sandbox to store a Lucene index inside Berkeley DB. I assume this would prevent prefix queries from working... Kevin

Re: JDBCDirectory to prevent optimize()?

2004-11-23 Thread Erik Hatcher
On Nov 23, 2004, at 6:02 PM, Kevin A. Burton wrote: Erik Hatcher wrote: Also, there is a DBDirectory in the sandbox to store a Lucene index inside Berkeley DB. I assume this would prevent prefix queries from working... Huh? Why would you assume that? As far as I know, and I've tested this

JDBCDirectory to prevent optimize()?

2004-11-22 Thread Kevin A. Burton
It seems that, when compared to other datastores, Lucene starts to fall down. For example, Lucene doesn't perform online index optimizations, so if you add 10 documents you have to run optimize() again, and this isn't exactly a fast operation. I'm wondering about the potential for a generic

using optimize and addDocument concurrently.

2004-10-19 Thread Stephen Halsey
problem is that the index needs to be continually kept up to date with new news articles, but also needs to be regularly optimized to keep it fast. If I cannot update and optimize one index at the same time, the best way I can see of doing this is maintaining multiple identical indexes and offlining

RE: using optimize and addDocument concurrently.

2004-10-19 Thread Aad Nales
time. Our problem is that the index needs to be continually kept up to date with new news articles, but also needs to be regularly optimized to keep it fast. If I cannot update and optimize one index at the same time, the best way I can see of doing this is maintaining multiple identical indexes

IndexReader.close() semantics and optimize -- Re: problem with locks when updating the data of a previously stored document

2004-09-16 Thread David Spencer
to save a deletion? Save a pending one, or commit it (commit - really delete it)? http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#close() Also optimize doesn't mention deletions. http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html

Re: Way to repair an index broken during 1/2 optimize?

2004-07-09 Thread Doug Cutting
Kevin A. Burton wrote: With the typical handful of fields, one should never see more than hundreds of files. We only have 13 fields... Though to be honest I'm worried that even if I COULD do the optimize that it would run out of file handles. Optimization doesn't open all files at once

Re: Way to repair an index broken during 1/2 optimize?

2004-07-08 Thread Peter M Cipollone
Subject: Way to repair an index broken during 1/2 optimize? So... the other day I sent an email about building an index with 14M documents. That went well, but the optimize() was taking FOREVER. It took 7 hours to generate the whole index, and when complete as of 10AM it was still optimizing (6

Re: Way to repair an index broken during 1/2 optimize?

2004-07-08 Thread Doug Cutting
optimize(). Please tell us how it goes. Doug

Re: Way to repair an index broken during 1/2 optimize?

2004-07-08 Thread Kevin A. Burton
Peter M Cipollone wrote: You might try merging the existing index into a new index located on a ram disk. Once it is done, you can move the directory from ram disk back to your hard disk. I think this will work as long as the old index did not finish merging. You might do a strings command on

Re: Way to repair an index broken during 1/2 optimize?

2004-07-08 Thread Kevin A. Burton
Doug Cutting wrote: Kevin A. Burton wrote: Also... what can I do to speed up this optimize? Ideally it wouldn't take 6 hours. Was this the index with the mergeFactor of 5000? If so, that's why it's so slow: you've delayed all of the work until the end. Indexing on a ramfs will make things

Re: Way to repair an index broken during 1/2 optimize?

2004-07-08 Thread Kevin A. Burton
. I'm worried about duplicate or missing content from the original index. I'd rather rebuild the index and waste another 6 hours (I've probably blown 100 hours of CPU time on this already) and have a correct index :) During an optimize I assume Lucene starts writing to a new segment and leaves all

Re: Way to repair an index broken during 1/2 optimize?

2004-07-08 Thread Doug Cutting
Kevin A. Burton wrote: No... I changed the mergeFactor back to 10 as you suggested. Then I am confused about why it should take so long. Did you by chance set the IndexWriter.infoStream to something, so that it logs merges? If so, it would be interesting to see that output, especially the last

Re: Way to repair an index broken during 1/2 optimize?

2004-07-08 Thread Kevin A. Burton
Doug Cutting wrote: Kevin A. Burton wrote: No... I changed the mergeFactor back to 10 as you suggested. Then I am confused about why it should take so long. Did you by chance set the IndexWriter.infoStream to something, so that it logs merges? If so, it would be interesting to see that output,

Re: Way to repair an index broken during 1/2 optimize?

2004-07-08 Thread Doug Cutting
Kevin A. Burton wrote: During an optimize I assume Lucene starts writing to a new segment and leaves all others in place until everything is done and THEN deletes them? That's correct. The only settings I use are: targetIndex.mergeFactor=10; targetIndex.minMergeDocs=1000; the resulting index has
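
For reference, a sketch of those settings, assuming the Lucene 1.4-era API where mergeFactor and minMergeDocs are public fields on IndexWriter (the index path is hypothetical):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    public class MergeSettings {
        public static void main(String[] args) throws Exception {
            IndexWriter writer =
                new IndexWriter("/data/index", new StandardAnalyzer(), true);
            // A modest mergeFactor amortizes merge work during indexing
            // instead of deferring it all to one enormous final optimize().
            writer.mergeFactor = 10;
            // Buffer this many documents in RAM before writing a segment.
            writer.minMergeDocs = 1000;
            // ... addDocument() calls ...
            writer.optimize();
            writer.close();
        }
    }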

Re: Way to repair an index broken during 1/2 optimize?

2004-07-08 Thread Kevin A. Burton
of fields, one should never see more than hundreds of files. We only have 13 fields... Though to be honest I'm worried that even if I COULD do the optimize that it would run out of file handles. This is very strange... I'm going to increase minMergeDocs to 1 and then run the full conversion on one

addIndexes and optimize

2004-07-07 Thread roy-lucene-user
Hey y'all again, Just wondering why the IndexWriter.addIndexes method calls optimize before and after it starts merging segments together. We would like to create an addIndexes method that doesn't optimize and call optimize on the IndexWriter later. Roy
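
A sketch of the call in question, assuming the Lucene 1.4-era API (directory paths are hypothetical); addIndexes(Directory[]) optimizes the target index before and after the merge, which is the overhead Roy is asking about:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class MergeIndexes {
        public static void main(String[] args) throws Exception {
            Directory target = FSDirectory.getDirectory("/data/merged", true);
            Directory src1 = FSDirectory.getDirectory("/data/part1", false);
            Directory src2 = FSDirectory.getDirectory("/data/part2", false);

            IndexWriter writer =
                new IndexWriter(target, new StandardAnalyzer(), true);
            // Merges the source indexes in; internally calls optimize()
            // both before and after merging their segments.
            writer.addIndexes(new Directory[] { src1, src2 });
            writer.close();
        }
    }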

Re: optimize() is not merging into single file? !!!!!!

2004-06-02 Thread iouli . golovatyi
I optimize and close the index after that, but don't get just one .cfs file as promised in the docs. Instead I see several small segments and a couple of big ones. This weird behavior seems to have started when I changed from v1.4-rc2 to 1.4-rc3. Before, I got just one .cfs segment

RE: optimize() is not merging into single file? !!!!!!

2004-06-02 Thread Rasik Pandey
Hello, I am running a two-week-old version of Lucene from the CVS HEAD and seeing the same behavior. Regards, RBP -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] Sent: Wednesday, June 2, 2004 13:53 To: Lucene Users List Subject: Re: optimize

Re: optimize fails with Negative seek offset

2004-05-12 Thread Sascha Ottolski
Hi, sorry for following up my own mail, but since no one responded so far, I thought the stack trace might be of interest. The following exception always occurs when trying to optimize one of our indexes, which always went OK for about a year now. I just tried with 1.4-rc3, but with the same

Re: optimize fails with Negative seek offset

2004-05-12 Thread Anthony Vito
trying to optimize one of our indexes, which always went OK for about a year now. I just tried with 1.4-rc3, but with the same result: java.io.IOException: Negative seek offset at java.io.RandomAccessFile.seek(Native Method) at org.apache.lucene.store.FSInputStream.readInternal

Re: optimize fails with Negative seek offset

2004-05-12 Thread Sascha Ottolski
On Wednesday, May 12, 2004 at 18:54, Anthony Vito wrote: Looks like the same error I got when I tried to use Lucene version 1.3 to search an index I had created with Lucene version 1.4. The versions are not forward compatible. Did you by chance create the index with version 1.4 and are now

Memory requirements for optimize() on compound index high?

2004-05-10 Thread David Sitsky
to the JVM. For indexing 1,000,000 complex documents, with potentially around 30 fields each, this seems to work fine. I have noticed that when performing an optimize() on this index at the end of a batch load, the memory requirements seem to be much higher. I was receiving OutOfMemoryErrors

optimize fails with Negative seek offset

2004-05-04 Thread Sascha Ottolski
Hi, I have no idea where to look, and I know almost nothing about Java :-( We've been using Lucene for quite a while now (about a year, I guess) and suddenly I've seen this when trying to optimize the index: java.lang.Exception: java.io.IOException: Negative seek offset. The code throwing

Preventing duplicate document insertion during optimize

2004-04-30 Thread Kevin A. Burton
Let's say you have two indexes each with the same document literal. All the fields hash the same and the document is a binary duplicate of a different document in the second index. What happens when you do a merge to create a 3rd index from the first two? I assume you now have two documents

Re: Preventing duplicate document insertion during optimize

2004-04-30 Thread James Dunn
Kevin, I have a similar issue. The only solution I have been able to come up with is, after the merge, to open an IndexReader against the merged index, iterate over all the docs, and delete duplicate docs based on my primary key field. Jim --- Kevin A. Burton [EMAIL PROTECTED] wrote: Let's say
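
A minimal sketch of Jim's approach, assuming the Lucene 1.4-era API; the "id" primary-key field name and the index path are hypothetical:

    import java.util.HashSet;
    import java.util.Set;
    import org.apache.lucene.index.IndexReader;

    public class DeleteDuplicates {
        public static void main(String[] args) throws Exception {
            IndexReader reader = IndexReader.open("/data/merged");
            Set seen = new HashSet();
            for (int i = 0; i < reader.maxDoc(); i++) {
                if (reader.isDeleted(i)) continue;
                String key = reader.document(i).get("id");  // primary key field
                if (!seen.add(key)) {
                    reader.delete(i);   // duplicate: mark for deletion
                }
            }
            reader.close();   // commits the deletion marks
        }
    }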

Re: Will failed optimize corrupt an index?

2003-08-20 Thread Doug Cutting
The index should be fine. Lucene index updates are atomic. Doug Dan Quaroni wrote: My index grew about 7 gigs larger than I projected it would, and it ran out of disk space during optimize. Does Lucene have transactions or anything that would prevent this from corrupting an index, or do I need

Will failed optimize corrupt an index?

2003-08-19 Thread Dan Quaroni
My index grew about 7 gigs larger than I projected it would, and it ran out of disk space during optimize. Does Lucene have transactions or anything that would prevent this from corrupting an index, or do I need to generate the index again? Thanks

RE: Will failed optimize corrupt an index?

2003-08-19 Thread Pasha Bizhan
Hi, From: Dan Quaroni [mailto:[EMAIL PROTECTED] My index grew about 7 gigs larger than I projected it would, and it ran out of disk space during optimize. Does Lucene have transactions or anything that would prevent this from corrupting an index, or do I need to generate the index

RE: Files getting deleted when optimize is killed?

2003-07-14 Thread Steve Rajavuori
Upon further examination, what I found is this: killing the process while optimize() is still working does NOT cause the index files to be deleted; HOWEVER, once the index is opened again by a new process (now apparently in an unstable state due to the incomplete optimize()), at that time

Re: Files getting deleted when optimize is killed?

2003-07-12 Thread Otis Gospodnetic
is this: My code will automatically call optimize() periodically. Because the index is very large, it can take a long time. It looks like an administrator may have killed my process, and it's possible that it was killed while an optimize() was in progress. I have two questions: 1) Does

Files getting deleted when optimize is killed?

2003-07-11 Thread Steve Rajavuori
I've had a problem on several occasions where my entire index is deleted -- that is, EVERY file (except 'segments') is gone. There were many users on the system each time, so it's a little hard to tell for sure what was going on, but my theory is this: My code will automatically call optimize

optimize()

2002-11-26 Thread Leo Galambos
How does it affect overall performance when I do not call optimize()? THX -g-

Re: optimize()

2002-11-26 Thread Otis Gospodnetic
optimize()? THX -g-

Re: optimize()

2002-11-26 Thread Leo Galambos
overall performance when I do not call optimize()? THX -g-

RE: optimize()

2002-11-26 Thread Stephen Eaton
I don't know if this answers your question, but I had a lot of problems with Lucene bombing out with out-of-memory errors. I was not using optimize; I tried it and hey presto, no more problems. -Original Message- From: Leo Galambos [mailto:[EMAIL PROTECTED]] Sent: Wednesday, 27

large index - slow optimize()

2002-11-22 Thread Otis Gospodnetic
Hello, I am building an index with a few million documents, and every X documents added to the index I call optimize() on the IndexWriter. I have noticed that as the index grows this call takes more and more time, even though the number of new segments that need to be merged is the same between every

RE: large index - slow optimize()

2002-11-22 Thread Armbrust, Daniel C.
Note - this is not fact; this is what I think I know about how it works. My working assumption has been that it's just a matter of disk speed, since during optimize the entire index is copied into new files, and then at the end the old one is removed. So the more GB you have to copy

Re: optimize(), delete() calls on IndexWriter

2002-03-08 Thread Otis Gospodnetic
No, they don't. Note that delete() is in IndexReader. Otis --- Aruna Raghavan [EMAIL PROTECTED] wrote: Hi, Do calls like optimize() and delete() on the IndexWriter cause a separate thread to be kicked off? Thanks! Aruna.

RE: optimize(), delete() calls on IndexWriter

2002-03-08 Thread Aruna Raghavan
Yes, thanks. -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]] Sent: Friday, March 08, 2002 11:46 AM To: Lucene Users List Subject: Re: optimize(), delete() calls on IndexWriter No they don't. Note that delete() is in IndexReader. Otis --- Aruna Raghavan [EMAIL