Yikes! That's way too many files. Have you changed mergeFactor? Or
implemented a custom DeletionPolicy or MergePolicy?
Or... does anyone know of something else in Solr's configuration that
could lead to such an insane number of files?
Mike
Uwe Klosa wrote:
There are around 35.000 files in the index. When I started indexing
5 weeks ago with only 2000 documents I did not have this issue. I saw
it for the first time with around 10.000 documents.
Before that I have been using the same instance on a Linux machine
with up to 17.000 documents and I haven't seen this issue at all. The
original plan has always been to use Solr on Linux, but I'm still
waiting for the new server.
Uwe
On Sat, Oct 4, 2008 at 12:06 PM, Michael McCandless <[EMAIL PROTECTED]> wrote:
Hmm OK that seems like a possible explanation then. Still it's spooky
that it's taking 5 minutes. How many files are in the index at the
time you call commit?
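For answering the file-count question, a minimal Java sketch along these lines could be run right before the commit. The class name, the method name, and the default path `data/index` are placeholders for illustration and are not taken from Solr or Lucene; pass the real index directory as the first argument.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class IndexFileCount {
    /** Counts regular files directly inside dir (non-recursive). */
    static long countFiles(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.filter(Files::isRegularFile).count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default; replace with the actual index directory.
        Path indexDir = Paths.get(args.length > 0 ? args[0] : "data/index");
        System.out.println(countFiles(indexDir) + " files in " + indexDir);
    }
}
```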
I wonder if you were to simply pause for say 30 seconds, before
issuing the commit, whether you'd then see the commit go faster? On
Windows at least such a silly trick does seem to improve performance,
I think because it allows the OS to move the bytes from its write
cache onto stable storage "on its own schedule" whereas when we commit
we are demanding the OS move the bytes on our [arbitrary] schedule.
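One way to try the pause idea outside of Solr is a small stand-alone experiment such as the sketch below. It is not how Lucene commits; it only writes some files, optionally sleeps, and then times the fsync phase. The class name, file names, file count, and sizes are arbitrary assumptions, and results will vary by OS and filesystem.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

public class PauseThenSync {
    /**
     * Writes fileCount files of fileBytes each without syncing, sleeps for
     * pauseMillis, then forces each file to stable storage.
     * Returns the time spent in the fsync phase only, in milliseconds.
     */
    static long timeSyncMillis(Path dir, int fileCount, int fileBytes, long pauseMillis)
            throws IOException, InterruptedException {
        List<Path> files = new ArrayList<>();
        byte[] payload = new byte[fileBytes];
        for (int i = 0; i < fileCount; i++) {
            Path f = dir.resolve("seg_" + i + ".bin");
            Files.write(f, payload); // lands in the OS write cache, not forced to disk yet
            files.add(f);
        }
        Thread.sleep(pauseMillis); // give the OS time to flush on its own schedule
        long start = System.nanoTime();
        for (Path f : files) {
            try (FileChannel ch = FileChannel.open(f, StandardOpenOption.WRITE)) {
                ch.force(true); // fsync: data and metadata
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        Path noPause = Files.createTempDirectory("sync-nopause");
        Path withPause = Files.createTempDirectory("sync-pause");
        System.out.println("no pause:   " + timeSyncMillis(noPause, 200, 64 * 1024, 0) + " ms");
        System.out.println("30 s pause: " + timeSyncMillis(withPause, 200, 64 * 1024, 30_000) + " ms");
    }
}
```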
I really wish OSs would add an API that would just block & return once
the file has made it to stable storage (letting the OS sync on its own
optimal schedule), rather than demanding the file be fsync'd
immediately.
I really haven't explored the performance of fsync on different
filesystems. I think I've read that ReiserFS may have issues, though
it could have been addressed by now. I *believe* ext3 is OK (at least,
it didn't show the strange "sleep to get better performance" issue
above, in my limited testing).
Mike
Uwe Klosa wrote:
Thanks Mike
The use of fsync() might be the answer to my problem, because for lack
of other options I have installed Solr in a zone on Solaris with ZFS,
which slows down when many fsync() calls are made. This will be fixed
in an upcoming release of Solaris, but I will move the Solr instances
to another server with a different file system as soon as possible.
Would a file system other than ext3 boost performance?
Uwe
On Fri, Oct 3, 2008 at 8:28 PM, Michael McCandless <[EMAIL PROTECTED]> wrote:
Yonik Seeley wrote:
On Fri, Oct 3, 2008 at 1:56 PM, Uwe Klosa <[EMAIL PROTECTED]> wrote:
I have a big problem with one of my solr instances. A commit can take
up to 5 minutes. This time does not depend on the number of documents
which are updated. The difference for 1 or 100 updated documents is
only a few seconds.
Since Solr's commit logic really hasn't changed, I wonder if this
could be Lucene-related somehow.
Lucene's commit logic has changed: we now fsync() each file in the
index to ensure all bytes are on stable storage, before returning.
But I can't imagine that taking 5 minutes, unless there are somehow a
great many files added to the index?
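To get a feel for how per-file fsync cost grows with the number of files, here is a rough stand-alone Java sketch. It is not Lucene's actual commit code; it just forces every regular file in a directory to stable storage with `FileChannel.force(true)` and reports the elapsed time. The class name, method name, and default path are illustrative assumptions.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FsyncAll {
    /** Forces every regular file in dir to stable storage; returns how many files were synced. */
    static int syncAll(Path dir) throws IOException {
        List<Path> files;
        try (Stream<Path> entries = Files.list(dir)) {
            files = entries.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        for (Path f : files) {
            // Opened for writing because some platforms refuse to sync a read-only channel.
            try (FileChannel ch = FileChannel.open(f, StandardOpenOption.WRITE)) {
                ch.force(true); // true = also flush file metadata
            }
        }
        return files.size();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default; point this at a COPY of an index, not a live one.
        Path dir = Paths.get(args.length > 0 ? args[0] : "data/index-copy");
        long start = System.nanoTime();
        int n = syncAll(dir);
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println("fsync'd " + n + " files in " + ms + " ms");
    }
}
```

With tens of thousands of files, even a few milliseconds per fsync adds up to minutes, which is why the file count matters here.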
Uwe, what filesystem are you using?
Yonik, when Solr commits what does it actually do?
Mike