Bernhard Messer writes:
However, three times the space sounds a bit too much, or I made a
mistake in the book. :)
there already was a discussion about disk usage during index optimize.
Please have a look at the developers list at:
http://mail-archives.apache.org/eyebrowse/[EMAIL
Hello, Doug.
There is a big difference between using the compound index format and
multiple files. I have tested it on a big index (45 GB). When I used
the compound file format, optimize took 3 times more space, b/c the *.cfs needs
to be unpacked.
Now I use the non-compound file format. It needs like
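For anyone wanting to try the same switch, the format is just a flag on the
writer. A minimal sketch against the 1.4-era API (the path and analyzer are
made up, and I haven't run this exact snippet):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class NoCompound {
  public static void main(String[] args) throws Exception {
    IndexWriter writer = new IndexWriter("/data/index", new StandardAnalyzer(), false);
    writer.setUseCompoundFile(false); // stop packing new segments into a .cfs
    writer.optimize();                // rewrites everything in the multifile format
    writer.close();
  }
}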
Hi all
We have the same problem.
We guess that the problem is that Windows locks the files.
Our environment:
Windows 2000
Tomcat 5.5.4
Ernesto.
[EMAIL PROTECTED] wrote:
Hi,
When I run an optimize in our production environment, old index files are
left in the directory and are not deleted.
My
Hi all
We have a big index and little disk space.
When we optimize and all the space is consumed, our index gets corrupted:
the segments file points to nonexistent files.
Environment:
java 1.4.2_04
W2000 SP4
Tomcat 5.5.4
Bye,
Ernesto.
Yura Smolsky wrote:
Hello, Otis.
There is a big difference when you use
However, three times the space sounds a bit too much, or I made a
mistake in the book. :)
there already was a discussion about disk usage during index optimize.
Please have a look at the developers list at:
http://mail-archives.apache.org/eyebrowse/[EMAIL PROTECTED]msgId=1797569
http
Get and try Lucene 1.4.3. One of the older versions had a bug that was
not deleting old index files.
Otis
--- [EMAIL PROTECTED] wrote:
Hi,
When I run an optimize in our production environment, old index files are
left in the directory and are not deleted.
My understanding
[EMAIL PROTECTED] wrote:
Hi,
When I run an optimize in our production environment, old index files are
left in the directory and are not deleted.
My understanding is that an optimize will create new index files and all
existing index files should be deleted. Is this correct?
We are running Lucene
guess that the problem is that Windows locks the files.
Our environment:
Windows 2000
Tomcat 5.5.4
Ernesto.
[EMAIL PROTECTED] wrote:
Hi,
When I run an optimize in our production environment, old index files are
left in the directory and are not deleted.
My understanding is that an optimize will create new
Hi Otis, tried version 1.4.3 without success, old index files still
remain in the directory.
Also tried not calling optimize(), and still getting the same behaviour;
maybe our problem is not related to the optimize() call at all.
--p
Otis Gospodnetic wrote:
Get and try Lucene 1.4.3. One
of your IndexReaders.
Hope it helps,
Steve
Patricio Keilty wrote:
Hi Otis, tried version 1.4.3 without success, old index files still
remain in the directory.
Also tried not calling optimize(), and still getting the same behaviour;
maybe our problem is not related to the optimize() call at all.
--p
the optimize during the night). After the optimize is completed, I close
and re-open the readers so they start reading from the new index files.
I'm thinking of adding code to delete all the old files at that point. I
presume they will no longer be locked.
--- Lucene Users List lucene-user
Hi,
When I run an optimize in our production environment, old index files are
left in the directory and are not deleted.
My understanding is that an optimize will create new index files and all
existing index files should be deleted. Is this correct?
We are running Lucene 1.4.2 on Windows
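For what it's worth, the reopen step looks roughly like this (a sketch
only; on Windows the old files can't be deleted while a reader still has
them open, hence the close-before-reopen):

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

public class ReopenAfterOptimize {
  public static void main(String[] args) throws Exception {
    IndexReader reader = IndexReader.open("/data/index");
    IndexSearcher searcher = new IndexSearcher(reader);
    // ... the optimize happens elsewhere (e.g. the nightly job) ...
    searcher.close(); // does not close the reader when constructed this way
    reader.close();   // releases the old files so Windows can delete them
    reader = IndexReader.open("/data/index"); // picks up the new segments
    searcher = new IndexSearcher(reader);
  }
}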
Your understanding is right!
The old existing files should be deleted, but it will build new files!
On Thu, 03 Feb 2005 17:36:27 -0800 (PST),
[EMAIL PROTECTED] [EMAIL PROTECTED]
wrote:
Hi,
When I run an optimize in our production environment, old index files are
left in the directory
Yura Smolsky wrote:
There is a big difference between using the compound index format and
multiple files. I have tested it on a big index (45 GB). When I used
the compound file format, optimize took 3 times more space, b/c the *.cfs needs
to be unpacked.
Now I use the non-compound file format. It needs like
Hello, Otis.
There is a big difference between using the compound index format and
multiple files. I have tested it on a big index (45 GB). When I used
the compound file format, optimize took 3 times more space, b/c the *.cfs needs
to be unpacked.
Now I use the non-compound file format. It needs like twice
Morus,
that description of 3 sets of index files is what I was imagining, too.
I'll have to test and add to the book errata, it seems.
Thanks for the info,
Otis
--- Morus Walter [EMAIL PROTECTED] wrote:
Otis Gospodnetic writes:
Hello,
Yes, that is how optimize works - copies all
Just a quick question: after writing an index and then calling
optimize(), is it normal for the index to expand to about three times
the size before finally compressing?
In our case the optimize grinds the disk, expanding the index into many
files of about 145MB total, before compressing down
Hello,
Yes, that is how optimize works - copies all existing index segments
into one unified index segment, thus optimizing it.
see hit #1: http://www.lucenebook.com/search?query=optimize+disk+space
However, three times the space sounds a bit too much, or I made a
mistake in the book. :)
You
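For what it's worth, here is one way the "three times" could be real. This
is an assumption about the mechanism on my part, not something I've
verified in the source: during an optimize of a compound-format index, the
directory can briefly hold the old segments (1x), the newly merged segment
written as plain multifile files (1x), and the new .cfs being packed from
those files (1x), before anything gets deleted. For Yura's 45 GB index that
is roughly 45 + 45 + 45 = 135 GB at the peak. Without the compound format
the third copy never exists, which would also explain why he sees about
twice the space instead.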
Our copy of LIA is in the mail ;)
Yes the final three files are: the .cfs (46.8MB), deletable (4 bytes),
and segments (29 bytes).
--Leto
-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Hello,
Yes, that is how optimize works - copies all existing index
Have you tried using the multifile index format? Now I wonder if there
is actually a difference in disk space consumed by optimize() when you
use the multifile and compound index formats...
Otis
--- Kauler, Leto S [EMAIL PROTECTED] wrote:
Our copy of LIA is in the mail ;)
Yes the final three
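It should be easy to measure: build the same documents into two indexes,
one per format, and poll the directory size while optimize() runs. A
throwaway helper for the polling (plain java.io, nothing Lucene-specific;
the path is an example):

import java.io.File;

public class DirSize {
  public static void main(String[] args) {
    File dir = new File("/data/index");
    long bytes = 0;
    String[] names = dir.list();
    for (int i = 0; i < names.length; i++)
      bytes += new File(dir, names[i]).length(); // sum every file in the index dir
    System.out.println(dir + ": " + bytes + " bytes");
  }
}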
Are the non-optimized indices causing you any problems (e.g. slow searches,
high number of open file handles)? If no, then you don't even need to
optimize until those issues become... issues.
OK, I have changed the process to not call optimize() at all. So far so
good. The number of files hovers
of open file handles)? If no, then you don't even need to
optimize until those issues become... issues.
OK, I have changed the process to not call optimize() at all. So far so
good. The number of files hovers from 10 to 40 during the indexing of
10,000 files. Seems Lucene is doing
Right now I am incrementally adding about 100 documents to the index a day
and then optimizing after that. I find that optimize essentially rebuilds
the entire index into a single file. So the size of the disk write is
proportional to the total index size, not to the size of the documents
Hello,
I think some of these questions may be answered in the jGuru FAQ
So my question is: would it be overkill to optimize every day?
Only if lots of documents are being added/deleted, and you end up with
a lot of index segments.
Is there any guideline on how often to optimize
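For concreteness, the daily pattern under discussion looks something like
this (a sketch only, not anyone's actual code; the field names and the
document source are invented):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class DailyAdd {
  public static void main(String[] args) throws Exception {
    IndexWriter writer = new IndexWriter("/data/index", new StandardAnalyzer(), false);
    for (int i = 0; i < args.length; i++) { // stand-in for "today's ~100 documents"
      Document doc = new Document();
      doc.add(Field.Keyword("id", args[i]));
      doc.add(Field.Text("body", "..."));   // real content goes here
      writer.addDocument(doc);
    }
    // writer.optimize(); // whether to do this daily is exactly the question above
    writer.close();
  }
}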
Hello John,
Once you make your change locally, use 'cvs diff -u IndexWriter.java >
indexwriter.patch' to make a patch.
Then open a new Bugzilla entry.
Finally, attach your patch to that entry.
Note that Document deletion is actually done from IndexReader, so your
patch may have to be on
Hello John,
I believe you didn't get any replies to this. What you are describing
cannot be done using the public API, but maaay (no source code on this
machine, so I can't double-check that) be doable if you use some of the
'internal' methods.
I don't have the need for this, but others might, so
Hi Otis:
Thanks for your reply.
I am looking for more of an API call than a tool. e.g.
IndexWriter.finalizeDelete()
If I implement this, how would I go about submitting a patch?
thanks
-John
On Mon, 13 Dec 2004 22:24:12 -0800 (PST), Otis Gospodnetic
[EMAIL PROTECTED] wrote:
Lucene standard API does not support this kind of operation.
Aviran
http://www.aviransplace.com
-Original Message-
From: John Wang [mailto:[EMAIL PROTECTED]
Sent: Wednesday, December 08, 2004 17:32
To: [EMAIL PROTECTED]
Subject: Re: finalize delete without optimize
Hi folks
Hi:
Is there a way to finalize deletes, e.g. actually remove them from
the segments and make sure the docIDs are contiguous again?
The only explicit way to do this is by calling
IndexWriter.optimize(). But this call does a lot more (it also merges all
the segments), hence is very expensive. Is
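So today the closest thing to a finalizeDelete() is the two-step below
(untested sketch; "id" is an invented key field and the path is made up):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class DeleteThenOptimize {
  public static void main(String[] args) throws Exception {
    IndexReader reader = IndexReader.open("/data/index");
    reader.delete(new Term("id", "doc-42")); // only marks the doc as deleted
    reader.close();                          // writes the deletion marks out

    IndexWriter writer = new IndexWriter("/data/index", new StandardAnalyzer(), false);
    writer.optimize(); // expensive: merges all segments, but docIDs end up contiguous
    writer.close();
  }
}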
On Tuesday 23 November 2004 00:06, Kevin A. Burton wrote:
I'm wondering about the potential for a generic JDBCDirectory for
keeping the lucene index within a database.
Such a thing already exists: http://ppinew.mnis.com/jdbcdirectory/, but I
don't know about its scalability.
Regards
Daniel
optimizations, so if you add 10 documents you have to run optimize()
again, and this isn't exactly a fast operation.
I'm wondering about the potential for a generic JDBCDirectory for
keeping the lucene index within a database.
It sounds somewhat unconventional but would allow you to perform live
addDirectory
Erik Hatcher wrote:
Also, there is a DBDirectory in the sandbox to store a Lucene index
inside Berkeley DB.
I assume this would prevent prefix queries from working...
Kevin
--
Use Rojo (RSS/Atom aggregator). Visit http://rojo.com. Ask me for an
invite! Also see irc.freenode.net #rojo if you
On Nov 23, 2004, at 6:02 PM, Kevin A. Burton wrote:
Erik Hatcher wrote:
Also, there is a DBDirectory in the sandbox to store a Lucene index
inside Berkeley DB.
I assume this would prevent prefix queries from working...
Huh? Why would you assume that? As far as I know, and I've tested
this
It seems that, compared to other datastores, Lucene starts to
fall down. For example, Lucene doesn't perform online index
optimization, so if you add 10 documents you have to run optimize()
again, and this isn't exactly a fast operation.
I'm wondering about the potential for a generic
problem is that the index needs to be continually kept up to date with new news
articles, but also needs to be regularly optimized to keep it fast. If I cannot
update and optimize one index at the same time, the best way I can see of doing this is
maintaining multiple identical indexes and offlining
time. Our problem is that the index
needs to be continually kept up to date with new news articles, but also
needs to be regularly optimized to keep it fast. If I cannot update and
optimize one index at the same time, the best way I can see of doing this
is maintaining multiple identical indexes
to save a deletion? Save a pending one, or commit it
(commit - really delete it)?
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#close()
Also optimize doesn't mention deletions.
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html
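My reading of the code (unofficial, so verify against your version):
delete() only marks the document in the reader's memory, and the marks are
written to disk when the reader is closed. In other words:

IndexReader reader = IndexReader.open("/data/index");
reader.delete(new Term("id", "stale")); // pending: an in-memory mark only
reader.close();                         // committed: the deletion is saved here

("id" is an invented field; the imports are org.apache.lucene.index.IndexReader
and org.apache.lucene.index.Term.)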
Kevin A. Burton wrote:
With the typical handful of fields, one should never see more than
hundreds of files.
We only have 13 fields... Though to be honest I'm worried that even if I
COULD do the optimize, it would run out of file handles.
Optimization doesn't open all files at once
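To put rough numbers on that (my arithmetic from my reading of the file
formats, so double-check before relying on it): a non-compound segment is
seven fixed files (.fnm, .fdx, .fdt, .tis, .tii, .frq, .prx) plus one norms
file per indexed field, so 13 fields means about 20 files per segment. With
mergeFactor at 10 you rarely have more than a few dozen live segments, i.e.
hundreds of files, and optimize merges at most mergeFactor segments at a
time rather than opening every file in the index at once.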
Subject: Way to repair an index broken during 1/2 optimize?
So.. the other day I sent an email about building an index with 14M
documents.
That went well but the optimize() was taking FOREVER. It took 7 hours
to generate the whole index and when complete as of 10AM it was still
optimizing (6
optimize(). Please tell us how it goes.
Doug
Peter M Cipollone wrote:
You might try merging the existing index into a new index located on a ram
disk. Once it is done, you can move the directory from ram disk back to
your hard disk. I think this will work as long as the old index did not
finish merging. You might do a strings command on
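A variation that stays inside the JVM instead of an OS ram disk, if that's
easier to set up (untested sketch, paths invented; note the whole index
must fit in the heap, and addIndexes() itself optimizes, as discussed
elsewhere in this thread):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;

public class MergeViaRam {
  public static void main(String[] args) throws Exception {
    RAMDirectory ram = new RAMDirectory();
    IndexWriter ramWriter = new IndexWriter(ram, new StandardAnalyzer(), true);
    ramWriter.addIndexes(new Directory[] {
        FSDirectory.getDirectory("/data/old-index", false) }); // merge into RAM
    ramWriter.close();

    IndexWriter diskWriter =
        new IndexWriter("/data/new-index", new StandardAnalyzer(), true);
    diskWriter.addIndexes(new Directory[] { ram }); // copy back to disk
    diskWriter.close();
  }
}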
Doug Cutting wrote:
Kevin A. Burton wrote:
Also... what can I do to speed up this optimize? Ideally it wouldn't
take 6 hours.
Was this the index with the mergeFactor of 5000? If so, that's why
it's so slow: you've delayed all of the work until the end. Indexing
on a ramfs will make things
I'm worried about duplicate or missing content from the original index.
I'd rather rebuild the index and waste another 6 hours (I've probably
blown 100 hours of CPU time on this already) and have a correct index :)
During an optimize I assume Lucene starts writing to a new segment and
leaves all
Kevin A. Burton wrote:
No... I changed the mergeFactor back to 10 as you suggested.
Then I am confused about why it should take so long.
Did you by chance set the IndexWriter.infoStream to something, so that
it logs merges? If so, it would be interesting to see that output,
especially the last
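For anyone else who wants that output, it's just a public field on the
writer (sketch; path and analyzer are made up):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class LogMerges {
  public static void main(String[] args) throws Exception {
    IndexWriter writer = new IndexWriter("/data/index", new StandardAnalyzer(), false);
    writer.infoStream = System.out; // prints a line for each segment merge
    writer.optimize();
    writer.close();
  }
}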
Doug Cutting wrote:
Kevin A. Burton wrote:
No... I changed the mergeFactor back to 10 as you suggested.
Then I am confused about why it should take so long.
Did you by chance set the IndexWriter.infoStream to something, so that
it logs merges? If so, it would be interesting to see that output,
Kevin A. Burton wrote:
During an optimize I assume Lucene starts writing to a new segment and
leaves all others in place until everything is done and THEN deletes them?
That's correct.
The only settings I use are:
targetIndex.mergeFactor=10;
targetIndex.minMergeDocs=1000;
the resulting index has
of fields, one should never see more than
hundreds of files.
We only have 13 fields... Though to be honest I'm worried that even if I
COULD do the optimize, it would run out of file handles.
This is very strange...
I'm going to increase minMergeDocs to 1 and then run the full
conversion on one
Hey y'all again,
Just wondering why the IndexWriter.addIndexes method calls optimize before and after
it starts merging segments together.
We would like to create an addIndexes method that doesn't optimize and call optimize
on the IndexWriter later.
Roy
I optimize and close the index after that, but don't get just one .cfs
file as promised in the docs. Instead of it I see something like small
segments and a couple of big ones.
This weird behavior seems to have started since I changed from v 1.4-rc2 to
1.4-rc3.
Before I got just one .cfs segment
Hello,
I am running a two-week-old version of Lucene from the CVS HEAD and seeing the same
behavior.
Regards,
RBP
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Wednesday, June 2, 2004 13:53
To: Lucene Users List
Subject: Re: optimize
Hi,
sorry for following up my own mail, but since no one responded so
far, I thought the stacktrace might be of interest. The following
exception always occurs when trying to optimize one of our indices,
which always went OK for about a year now. I just tried with 1.4-rc3,
but with the same
trying to optimize one of our indices,
which always went OK for about a year now. I just tried with 1.4-rc3,
but with the same result:
java.io.IOException: Negative seek offset
at java.io.RandomAccessFile.seek(Native Method)
at org.apache.lucene.store.FSInputStream.readInternal
On Wednesday, 12 May 2004 18:54, Anthony Vito wrote:
Looks like the same error I got when I tried to use Lucene version
1.3 to search on an index I had created with Lucene version 1.4. The
versions are not forward compatible. Did you by chance create the
index with version 1.4 and are now
to the JVM. For indexing 1,000,000 complex documents, with
potentially around 30 fields each, this seems to work fine.
I have noticed that when performing an optimize() on this index at the end
of a batch load, the memory requirements seem to be much higher. I was
receiving OutOfMemoryErrors
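One guess at where the extra memory goes (an assumption on my part, not a
diagnosis): field norms. As far as I know each indexed field carries one
byte per document when its norms are loaded, so 1,000,000 documents times
~30 indexed fields is on the order of 30 MB of norms per full-index
segment, and during the final merge several segments' norms plus the
output can be resident together. That alone could push a heap sized for
plain indexing over the edge.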
Hi,
I have no idea where to look, and I know almost nothing about
Java :-( We've been using Lucene for quite a while now (about a year, I guess)
and suddenly I've seen this when trying to optimize the index:
java.lang.Exception: java.io.IOException: Negative seek offset
The code throwing
Let's say you have two indexes each with the same document literal. All
the fields hash the same and the document is a binary duplicate of a
different document in the second index.
What happens when you do a merge to create a 3rd index from the first
two? I assume you now have two documents
Kevin,
I have a similar issue. The only solution I have been
able to come up with is, after the merge, to open an
IndexReader against the merge index, iterate over all
the docs and delete duplicate docs based on my
primary key field.
Jim
--- Kevin A. Burton [EMAIL PROTECTED] wrote:
Let's say
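Here's roughly what Jim's approach looks like in code, as I understand it
(untested sketch; "pk" stands in for whatever your primary key field is
called, and the path is made up):

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.TermEnum;

public class DeleteDuplicates {
  public static void main(String[] args) throws Exception {
    IndexReader reader = IndexReader.open("/data/merged-index");
    TermEnum terms = reader.terms(new Term("pk", "")); // first term of the pk field
    while (terms.term() != null && terms.term().field().equals("pk")) {
      if (terms.docFreq() > 1) {             // same key appears in more than one doc
        TermDocs docs = reader.termDocs(terms.term());
        docs.next();                         // keep the first occurrence
        while (docs.next())
          reader.delete(docs.doc());         // mark the rest deleted
        docs.close();
      }
      if (!terms.next()) break;
    }
    terms.close();
    reader.close(); // commits the deletions
  }
}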
The index should be fine. Lucene index updates are atomic.
Doug
Dan Quaroni wrote:
My index grew about 7 gigs larger than I projected it would, and it ran out
of disk space during optimize. Does lucene have transactions or anything
that would prevent this from corrupting an index, or do I need
My index grew about 7 gigs larger than I projected it would, and it ran out
of disk space during optimize. Does lucene have transactions or anything
that would prevent this from corrupting an index, or do I need to generate
the index again?
Thanks
Hi,
From: Dan Quaroni [mailto:[EMAIL PROTECTED]
My index grew about 7 gigs larger than I projected it would,
and it ran out of disk space during optimize. Does lucene
have transactions or anything that would prevent this from
corrupting an index, or do I need to generate the index
Upon further examination what I found is this:
- Killing the process while optimize() is still working does NOT cause the
index files to be deleted, HOWEVER --
- Once the index is opened again by a new process (now apparently in an
unstable state due to the incomplete optimize()), at that time
is this:
My code will automatically call optimize() periodically. Because the
index is very large, it can take a long time. It looks like an
administrator may have killed my process, and it's possible that it was
killed while an optimize() was in progress.
I have two questions:
1) Does
I've had a problem on several occasions where my entire index is deleted --
that is, EVERY file (except 'segments') is gone. There were many users on
the system each time, so it's a little hard to tell for sure what was going
on, but my theory is this:
My code will automatically call optimize
How does it affect overall performance when I do not call optimize()?
THX
-g-
I don't know if this answers your question, but I had a lot of problems
with Lucene bombing out with out-of-memory errors. I was not using
optimize; I tried it, and hey presto, no more problems.
-Original Message-
From: Leo Galambos [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, 27
Hello,
I am building an index with a few million documents, and every X documents
added to the index I call optimize() on the IndexWriter.
I have noticed that as the index grows this call takes more and more
time, even though the number of new segments that need to be merged is
the same between every
Note - this is not a fact, this is what I think I know about how it works.
My working assumption has been that it's just a matter of disk speed, since during optimize,
the entire index is copied into new files, and then at the end, the old one is
removed. So the more GB you have to copy
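A back-of-the-envelope consequence, if that assumption is right:
optimizing every X added documents means the k-th optimize copies about
k*X documents' worth of index, so by the time you reach N documents you
have copied roughly X + 2X + ... + N, i.e. about N^2/(2X) documents in
total, quadratic in the final index size. The per-call time growing with
the index is exactly what you'd expect, and spacing the optimize() calls
further apart (or optimizing once at the end of the batch) cuts the total
almost proportionally.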
No they don't. Note that delete() is in IndexReader.
Otis
--- Aruna Raghavan [EMAIL PROTECTED] wrote:
Hi,
Do calls like optimize() and delete() on the IndexWriter cause a
separate thread to be kicked off?
Thanks!
Aruna.
Yes, thanks.
-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]]
Sent: Friday, March 08, 2002 11:46 AM
To: Lucene Users List
Subject: Re: optimize(), delete() calls on IndexWriter
No they don't. Note that delete() is in IndexReader.
Otis
--- Aruna Raghavan [EMAIL