Doron Cohen wrote:
Doug Cutting wrote:
Therefore, a semi compound segment file can be defined, that would be
made of 4 files (instead of 1):
- File 0: .fdx .tis .tvx
- File 1: .fdt .tii .tvd
- File 2: .frq .tvf
- File 3: .fnm .prx .fN
I think this is a promising direction.
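
A minimal sketch of the grouping quoted above, assuming a simple lookup from file extension to group-file index. The class name SemiCompoundLayout and the method groupOf are hypothetical and are not part of Lucene; only the extension-to-file assignment comes from the proposal.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: maps a per-segment file name to one of the 4 proposed group files.
class SemiCompoundLayout {
    private static final Map<String, Integer> GROUP = new HashMap<>();
    static {
        for (String e : new String[] {"fdx", "tis", "tvx"}) GROUP.put(e, 0); // File 0
        for (String e : new String[] {"fdt", "tii", "tvd"}) GROUP.put(e, 1); // File 1
        for (String e : new String[] {"frq", "tvf"}) GROUP.put(e, 2);        // File 2
        for (String e : new String[] {"fnm", "prx"}) GROUP.put(e, 3);        // File 3
    }

    /** Returns the group index (0..3) for a name such as "_a.frq" or "_a.f3". */
    static int groupOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            throw new IllegalArgumentException("no extension: " + fileName);
        }
        String ext = fileName.substring(dot + 1);
        Integer group = GROUP.get(ext);
        if (group != null) {
            return group;
        }
        // .fN files: 'f' followed by a field number; they go to File 3.
        if (ext.matches("f\\d+")) {
            return 3;
        }
        throw new IllegalArgumentException("unknown extension: " + ext);
    }
}
```

With this mapping a writer would append each per-segment file to one of four outputs instead of one, so a segment costs 4 file descriptors rather than 1.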
@lucene.apache.org
Sent: Sunday, December 17, 2006 2:31:42 PM
Subject: Re: potential indexing perormance improvement for compound index - cut
IO - have more files though
Doron Cohen wrote:
Also, if nio proves to be faster in this scenario, it might make sense to
keep current FSDirectory, and just add
A word of caution here...
Using a shared FileChannel.pread actually performs a synchronization
under Windows.
See JDK bug http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6265734
I submitted this, and it was verified using the supplied test case.
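
For reference, a rough sketch of what a shared positional read looks like in Java. It uses FileChannel.read(ByteBuffer, long), which reads at an absolute offset without moving the channel's own position, so callers need no lock of their own around a seek-then-read pair. The class PositionalReader is hypothetical, and FileChannel.open is a newer API than what was available when this thread was written. Whether the JDK serializes such reads internally is platform dependent, as the bug report above describes for Windows.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical wrapper around one shared, read-only FileChannel.
class PositionalReader implements AutoCloseable {
    private final FileChannel channel;

    PositionalReader(Path path) throws IOException {
        this.channel = FileChannel.open(path, StandardOpenOption.READ);
    }

    /** Reads up to len bytes starting at absolute offset pos; returns fewer at end of file. */
    byte[] readAt(long pos, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        while (buf.hasRemaining()) {
            // Positional read: the channel's own position is not modified.
            int n = channel.read(buf, pos + buf.position());
            if (n < 0) {
                break; // end of file
            }
        }
        byte[] out = new byte[buf.position()];
        buf.flip();
        buf.get(out);
        return out;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```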
On Dec 17, 2006, at 1:31 PM, Doug
robert engels wrote:
Using a shared FileChannel.pread actually performs a synchronization
under Windows.
Sigh. Still, it'd be no worse than current FSDirectory on Windows.
Doug
I think the important issues are index size, stability and number of
concurrent readers.
We achieved the best performance by using a pool of file descriptors
to a segment so we could avoid the synchronization block, but this
only worked for large, relatively unchanging segments.
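
A minimal sketch of the descriptor-pool idea described above, assuming a fixed number of read-only handles opened on the same file. FileHandlePool and readAt are hypothetical names, and this is not the implementation referred to in the message. Each reader borrows its own handle, so the seek and the read need no shared lock; readers only wait when every handle is in use.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical fixed-size pool of read-only handles to one segment file.
class FileHandlePool implements AutoCloseable {
    private final BlockingQueue<RandomAccessFile> idle;

    FileHandlePool(File file, int size) throws IOException {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(new RandomAccessFile(file, "r"));
        }
    }

    /** Reads exactly len bytes at offset pos using a borrowed handle. */
    byte[] readAt(long pos, int len) throws IOException, InterruptedException {
        RandomAccessFile raf = idle.take(); // blocks only if all handles are busy
        try {
            byte[] out = new byte[len];
            raf.seek(pos);
            raf.readFully(out);
            return out;
        } finally {
            idle.add(raf); // always return the handle to the pool
        }
    }

    @Override
    public void close() throws IOException {
        // Assumes no reads are in flight when the pool is closed.
        for (RandomAccessFile raf : idle) {
            raf.close();
        }
    }
}
```

The cost is size descriptors per file, opened up front, which fits the remark that this paid off only for large segments that rarely change.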
On Dec
Doron Cohen wrote:
Also, if nio proves to be faster in this scenario, it might make sense to
keep current FSDirectory, and just add FSDirectoryNio implementation.
If nio isn't considerably slower for single-threaded applications, I'd
vote to simply switch FSDirectory to use nio, simplifying
Otis Gospodnetic wrote:
I think Doron is right on the money here. I know one customer who'd be happy
to trade its file descriptors for less IO - simpy.com. It's exactly what Doron describes
- a busy system with a LOT of indices. File descriptors are kept under control by
carefully closing
On Dec 15, 2006, at 2:04 PM, Otis Gospodnetic wrote:
I think Doron is right on the money here. I know one customer
who'd be happy to trade its file descriptors for less IO -
simpy.com. It's exactly what Doron describes - a busy system with
a LOT of indices. File descriptors are kept
Marvin Humphrey wrote:
Out of curiosity, does the non-compound format yield any search-time
benefits?
Yes. On 32-bit systems with indexes larger than 1GB or so, memory
mapping is impractical, so synchronization is required around shared
file handles (using Java's classic i/o APIs, w/o
Doug Cutting wrote:
I'm not yet convinced that the costs of this mid-point justify its
benefits.
That was too negative. Let me try a more positive angle.
Doron Cohen wrote:
Therefore, a semi compound segment file can be defined, that would be
made of 4 files (instead of 1):
- File 0: .fdx
Doug Cutting wrote:
Yes. On 32-bit systems with indexes larger than 1GB or so, memory
mapping is impractical, so synchronization is required around shared
file handles (using Java's classic i/o APIs, w/o pread). The
non-compound format, with more files, has fewer synchronization
Doug Cutting wrote:
Doug Cutting wrote:
Yes. On 32-bit systems with indexes larger than 1GB or so, memory
mapping is impractical, so synchronization is required around shared
file handles (using Java's classic i/o APIs, w/o pread). The
non-compound format, with more files, has fewer
Doug Cutting wrote:
Therefore, a semi compound segment file can be defined, that would be
made of 4 files (instead of 1):
- File 0: .fdx .tis .tvx
- File 1: .fdt .tii .tvd
- File 2: .frq .tvf
- File 3: .fnm .prx .fN
I think this is a promising direction. Perhaps instead of adding a
Hi,
I would like to propose, and get feedback on, a potential indexing
performance improvement for the case where the compound file format is
used (this is the default).
In compound segment mode, each merge operation ends by writing a
compound file. To be more precise, the merge result is first written
On 12/14/06, Doron Cohen [EMAIL PROTECTED] wrote:
But anyhow, this is not a negligible difference, and for really large
indexes and busy systems, when the just-written non-compound segment is
not in the system caches, it might have more effect. Possibly, search
performance during indexing would
Mike Klaas [EMAIL PROTECTED] wrote:
My main comment is that the benefits of this change can be achieved by
using the non-compound index format. For people that care about the
difference in performance, it isn't difficult to configure your system
to mitigate the problems of the non-compound
@lucene.apache.org
Sent: Friday, December 15, 2006 2:55:41 PM
Subject: Re: potential indexing perormance improvement for compound index - cut
IO - have more files though
Mike Klaas [EMAIL PROTECTED] wrote:
My main comment is that the benefits of this change can be achieved by
using the non-compound