Re: potential indexing performance improvement for compound index - cut IO - have more files though

2006-12-20 Thread Doron Cohen
Doron Cohen wrote: Doug Cutting wrote: Therefore, a semi compound segment file can be defined, that would be made of 4 files (instead of 1):
- File 0: .fdx .tis .tvx
- File 1: .fdt .tii .tvd
- File 2: .frq .tvf
- File 3: .fnm .prx .fN
I think this is a promising direction.
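The four-file grouping quoted above can be sketched as a simple lookup from file extension to group number. This is only an illustration: the class name `SemiCompoundLayout` and the method `groupOf` are hypothetical and not part of Lucene, and reading ".fN" as the per-field norms files (.f0, .f1, ...) is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not a Lucene class): maps a segment file to one of the four proposed group files. */
final class SemiCompoundLayout {
    private static final Map<String, Integer> GROUP_BY_EXTENSION = new HashMap<>();

    static {
        register(0, "fdx", "tis", "tvx");
        register(1, "fdt", "tii", "tvd");
        register(2, "frq", "tvf");
        register(3, "fnm", "prx");
    }

    private static void register(int group, String... extensions) {
        for (String extension : extensions) {
            GROUP_BY_EXTENSION.put(extension, group);
        }
    }

    /** Returns the group index (0 to 3) for a file name such as "_a.frq" or "_a.f2". */
    static int groupOf(String fileName) {
        String extension = fileName.substring(fileName.lastIndexOf('.') + 1);
        Integer group = GROUP_BY_EXTENSION.get(extension);
        if (group != null) {
            return group;
        }
        // Assumption: ".fN" in the proposal means norms files named .f0, .f1, ...
        if (extension.matches("f\\d+")) {
            return 3;
        }
        throw new IllegalArgumentException("Unknown segment file extension: " + fileName);
    }
}
```

For example, `SemiCompoundLayout.groupOf("_a.tii")` returns 1 and `SemiCompoundLayout.groupOf("_a.f0")` returns 3.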

Re: potential indexing performance improvement for compound index - cut IO - have more files though

2006-12-19 Thread Otis Gospodnetic
Sent: Sunday, December 17, 2006 2:31:42 PM
Doron Cohen wrote: Also, if nio proves to be faster in this scenario, it might make sense to keep current FSDirectory, and just add

Re: potential indexing performance improvement for compound index - cut IO - have more files though

2006-12-18 Thread robert engels
A word of caution here... Using a shared FileChannel.pread actually performs a synchronization under Windows. See JDK bug http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6265734, which I submitted and which was verified using the supplied test case. On Dec 17, 2006, at 1:31 PM, Doug
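For context, Java's `FileChannel` has no method literally named `pread`; the message presumably refers to the positional `read(ByteBuffer, long)` overload, which reads at an explicit offset without moving the channel's own position. A minimal sketch of that call follows. The class `PositionalReadSketch` and its methods are hypothetical names for illustration, and per the bug report above the JDK may still serialize these reads internally on Windows.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch: positional reads on a FileChannel shared between threads. */
final class PositionalReadSketch {

    /** Reads exactly length bytes starting at offset, without changing the channel's position. */
    static byte[] readAt(FileChannel channel, long offset, int length) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(length);
        long position = offset;
        while (buffer.hasRemaining()) {
            // Positional read: no seek, so callers need no lock around a shared cursor.
            int bytesRead = channel.read(buffer, position);
            if (bytesRead < 0) {
                throw new EOFException("Reached end of file at position " + position);
            }
            position += bytesRead;
        }
        return buffer.array();
    }

    /** Writes ten ASCII digits to a temporary file and reads four of them back from offset 3. */
    static String demo() {
        try {
            Path file = Files.createTempFile("pread-sketch", ".bin");
            try {
                Files.write(file, "0123456789".getBytes(StandardCharsets.US_ASCII));
                try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
                    return new String(readAt(channel, 3, 4), StandardCharsets.US_ASCII);
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `PositionalReadSketch.demo()` returns "3456".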

Re: potential indexing performance improvement for compound index - cut IO - have more files though

2006-12-18 Thread Doug Cutting
robert engels wrote: Using a shared FileChannel.pread actually performs a synchronization under Windows. Sigh. Still, it'd be no worse than current FSDirectory on Windows. Doug

Re: potential indexing performance improvement for compound index - cut IO - have more files though

2006-12-18 Thread robert engels
I think the important issues are index size, stability and number of concurrent readers. We achieved the best performance by using a pool of file descriptors to a segment so we could avoid the synchronization block, but this only worked for large, relatively unchanging segments. On Dec
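The descriptor-pool idea mentioned above could look roughly like the following. This is a sketch under assumptions, not the implementation the poster used: the class `DescriptorPool`, its pool size, and the choice of `RandomAccessFile` with a blocking queue are all illustrative. Each reader borrows its own handle, so the seek and the read need no shared lock.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical sketch: a fixed pool of read-only handles to one segment file. */
final class DescriptorPool implements AutoCloseable {
    private final BlockingQueue<RandomAccessFile> idle;

    DescriptorPool(File segmentFile, int size) throws IOException {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(new RandomAccessFile(segmentFile, "r"));
        }
    }

    /** Borrows a private handle, so seek and read need no lock shared with other readers. */
    byte[] readAt(long offset, int length) throws IOException, InterruptedException {
        RandomAccessFile handle = idle.take(); // waits only when every handle is in use
        try {
            byte[] result = new byte[length];
            handle.seek(offset);
            handle.readFully(result);
            return result;
        } finally {
            idle.add(handle); // always has room: this thread took one handle out
        }
    }

    /** Closes the idle handles; the sketch assumes no reads are in flight. */
    @Override
    public void close() throws IOException {
        for (RandomAccessFile handle : idle) {
            handle.close();
        }
    }

    /** Writes ten letters to a temporary file and reads three of them back from offset 2. */
    static String demo() {
        try {
            File file = File.createTempFile("descriptor-pool", ".bin");
            try {
                Files.write(file.toPath(), "abcdefghij".getBytes(StandardCharsets.US_ASCII));
                try (DescriptorPool pool = new DescriptorPool(file, 2)) {
                    return new String(pool.readAt(2, 3), StandardCharsets.US_ASCII);
                }
            } finally {
                file.delete();
            }
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Calling `DescriptorPool.demo()` returns "cde". The trade-off is the one the thread keeps circling: every pooled handle is another open file descriptor, and a pool has to be rebuilt whenever its segment is replaced, which may be why the approach suited large, rarely changing segments.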

Re: potential indexing performance improvement for compound index - cut IO - have more files though

2006-12-17 Thread Doug Cutting
Doron Cohen wrote: Also, if nio proves to be faster in this scenario, it might make sense to keep current FSDirectory, and just add FSDirectoryNio implementation. If nio isn't considerably slower for single-threaded applications, I'd vote to simply switch FSDirectory to use nio, simplifying

Re: potential indexing performance improvement for compound index - cut IO - have more files though

2006-12-16 Thread Doug Cutting
Otis Gospodnetic wrote: I think Doron is right on the money here. I know one customer who'd be happy to trade its file descriptors for less IO - simpy.com. It's exactly what Doron describes - a busy system with a LOT of indices. File descriptors are kept under control by carefully closing

Re: potential indexing performance improvement for compound index - cut IO - have more files though

2006-12-16 Thread Marvin Humphrey
On Dec 15, 2006, at 2:04 PM, Otis Gospodnetic wrote: I think Doron is right on the money here. I know one customer who'd be happy to trade its file descriptors for less IO - simpy.com. It's exactly what Doron describes - a busy system with a LOT of indices. File descriptors are kept

Re: potential indexing performance improvement for compound index - cut IO - have more files though

2006-12-16 Thread Doug Cutting
Marvin Humphrey wrote: Out of curiosity, does the non-compound format yield any search-time benefits? Yes. On 32-bit systems with indexes larger than 1GB or so, memory mapping is impractical, so synchronization is required around shared file handles (using Java's classic i/o APIs, w/o
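The synchronization described above follows from how classic Java I/O works: a `RandomAccessFile` has a single file pointer, so a seek followed by a read must be atomic when threads share the handle. A minimal sketch, assuming a hypothetical wrapper class `SharedHandleReader` (this is not Lucene's FSDirectory code):

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: several readers sharing one classic-I/O file handle. */
final class SharedHandleReader {
    private final RandomAccessFile file;

    SharedHandleReader(RandomAccessFile file) {
        this.file = file;
    }

    /** Seek and read must happen under one lock, because the handle has a single file pointer. */
    byte[] readAt(long offset, int length) throws IOException {
        byte[] result = new byte[length];
        synchronized (file) {
            file.seek(offset);
            file.readFully(result);
        }
        return result;
    }

    /** Writes ten ASCII digits to a temporary file and reads three of them back from offset 5. */
    static String demo() {
        try {
            File tmp = File.createTempFile("shared-handle", ".bin");
            try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw")) {
                raf.write("0123456789".getBytes(StandardCharsets.US_ASCII));
                SharedHandleReader reader = new SharedHandleReader(raf);
                return new String(reader.readAt(5, 3), StandardCharsets.US_ASCII);
            } finally {
                tmp.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `SharedHandleReader.demo()` returns "567". With one compound file, every reader contends for the same lock; with separate files there is one lock per file, so readers touching different files do not block each other.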

Re: potential indexing performance improvement for compound index - cut IO - have more files though

2006-12-16 Thread Doug Cutting
Doug Cutting wrote: I'm not yet convinced that the costs of this mid-point justify its benefits. That was too negative. Let me try a more positive angle. Doron Cohen wrote: Therefore, a semi compound segment file can be defined, that would be made of 4 files (instead of 1): - File 0: .fdx

Re: potential indexing performance improvement for compound index - cut IO - have more files though

2006-12-16 Thread Doug Cutting
Doug Cutting wrote: Yes. On 32-bit systems with indexes larger than 1GB or so, memory mapping is impractical, so synchronization is required around shared file handles (using Java's classic i/o APIs, w/o pread). The non-compound format, with more files, has fewer synchronization

Re: potential indexing performance improvement for compound index - cut IO - have more files though

2006-12-16 Thread Doron Cohen
Doug Cutting wrote: Doug Cutting wrote: Yes. On 32-bit systems with indexes larger than 1GB or so, memory mapping is impractical, so synchronization is required around shared file handles (using Java's classic i/o APIs, w/o pread). The non-compound format, with more files, has fewer

Re: potential indexing performance improvement for compound index - cut IO - have more files though

2006-12-16 Thread Doron Cohen
Doug Cutting wrote: Therefore, a semi compound segment file can be defined, that would be made of 4 files (instead of 1):
- File 0: .fdx .tis .tvx
- File 1: .fdt .tii .tvd
- File 2: .frq .tvf
- File 3: .fnm .prx .fN
I think this is a promising direction. Perhaps instead of adding a

potential indexing performance improvement for compound index - cut IO - have more files though

2006-12-15 Thread Doron Cohen
Hi, I would like to propose and get feedback on a potential indexing performance improvement for the case where the compound file format is used (this is the default). In compound segment mode, each merge operation ends by writing a compound file. To be more precise, the merge result is first written

Re: potential indexing performance improvement for compound index - cut IO - have more files though

2006-12-15 Thread Mike Klaas
On 12/14/06, Doron Cohen wrote: But anyhow, this is not a negligible difference, and for real large indexes, and busy systems, when the just written non-compound segment is not in the system caches, it might have more effect. Possibly, search performance during indexing would

Re: potential indexing performance improvement for compound index - cut IO - have more files though

2006-12-15 Thread Doron Cohen
Mike Klaas wrote: My main comment is that the benefits of this change can be achieved by using the non-compound index format. For people that care about the difference in performance, it isn't difficult to configure your system to mitigate the problems of the non-compound

Re: potential indexing performance improvement for compound index - cut IO - have more files though

2006-12-15 Thread Otis Gospodnetic
Sent: Friday, December 15, 2006 2:55:41 PM
Mike Klaas wrote: My main comment is that the benefits of this change can be achieved by using the non-compound