FYI, this optimization resulted in a fantastic
performance boost! I went from 133 queries/sec to 990
queries/sec! I'm now more limited by socket
overhead, as I get 1700 queries/sec when I stick the
clients right in the same process as the server.
Oddly enough, the performance increased, but
> For example, Nutch automatically translates such
> clauses into QueryFilters.
Thanks for the excellent pointer, Doug! I'll
definitely be implementing this optimization.
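The idea behind Doug's QueryFilter suggestion is that a required clause like field3:1 selects a fixed set of documents, so its doc-id set can be computed once, cached, and intersected with each query's hits instead of being re-scored every time. A minimal plain-Java sketch of that caching idea (hypothetical names, not the actual Lucene QueryFilter implementation):

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Sketch of the QueryFilter idea: cache the BitSet of doc ids matching
// a filter clause, then AND it with each query's result set. Computing
// the BitSet happens once; afterwards the clause costs one intersection.
public class FilterCacheSketch {
    private final Map<String, BitSet> cache = new HashMap<String, BitSet>();

    // In real code this would run the clause against the index;
    // here the caller supplies the matching doc ids on a cache miss.
    public BitSet getFilter(String clause, BitSet computedIfMissing) {
        BitSet bits = cache.get(clause);
        if (bits == null) {
            bits = computedIfMissing;   // computed once, reused afterwards
            cache.put(clause, bits);
        }
        return bits;
    }

    // Apply the cached filter: keep only hits that are also in the filter.
    public BitSet apply(String clause, BitSet queryHits, BitSet computedIfMissing) {
        BitSet result = (BitSet) queryHits.clone();
        result.and(getFilter(clause, computedIfMissing));
        return result;
    }

    public static void main(String[] args) {
        FilterCacheSketch fc = new FilterCacheSketch();
        BitSet filter = new BitSet();   // docs matching "field3:1"
        filter.set(2);
        filter.set(5);
        BitSet hits = new BitSet();     // docs matching the rest of the query
        hits.set(1);
        hits.set(2);
        hits.set(5);
        hits.set(9);
        System.out.println(fc.apply("field3:1", hits, filter));  // {2, 5}
    }
}
```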
If anyone cares, I did a 1 minute hprof test with the
search server in a servlet container. Here are the
results (sorry
Yonik Seeley wrote:
Setup info & Stats:
- 4.3M documents, 12 keyword fields per document, 11
[ ... ]
"field1:4 AND field2:188453 AND field3:1"
field1:4 done alone selects around 4.2M records
field2:188453 done alone selects around 1.6M records
field3:1 done alone selects around 1K records
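Given that spread (4.2M matching docs for field1:4 versus ~1K for field3:1), a big part of the win comes from letting the rarest clause drive the AND, so the intersection never touches most of the 4.2M postings. A minimal sketch of that trick, assuming sorted doc-id arrays per clause (this is an illustration of the principle, not Lucene's actual scorer code):

```java
import java.util.Arrays;

// Sketch: intersect sorted posting lists by walking the rarest term
// (e.g. field3:1, ~1K docs) and probing the larger list with binary
// search. Cost is roughly O(small * log(large)) instead of a linear
// scan over millions of postings.
public class RareFirstIntersect {
    public static int[] intersect(int[] rare, int[] common) {
        int[] out = new int[rare.length];
        int n = 0;
        for (int doc : rare) {           // walk the short list only
            if (Arrays.binarySearch(common, doc) >= 0) {
                out[n++] = doc;
            }
        }
        return Arrays.copyOf(out, n);
    }
}
```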
Oops, CPU usage is *not* 50%, but closer to 98%.
This is due to a bug in CPU% reporting on RHEL 3 on
multiprocessor CPUs (if I run multiple threads in
while(1) loops, it will still only show 50% CPU
usage for that process). The aggregated (not
per-process) statistics shown by top are correct, and
they s
Yonik,
there is another "synchronized" block in CSInputStream which could block
your second CPU out. Do you think there is a chance to recreate the
index (maybe a smaller subset) without the compound file option enabled
and run your test again, so that we can see if this helps?
regards
Bernhard
Ot
Ah, you may be right (no stack trace in email any more). Somebody
recently identified a few bottlenecks that, if I recall correctly, were
related to synchronized blocks. I believe Doug committed some
improvements, but I can't remember which version of Lucene that is in.
It's definitely in 1.4.1.
--- Otis Gospodnetic <[EMAIL PROTECTED]>
wrote:
> The bottleneck seems to be disk IO.
But it's not. Linux is caching the whole file, and
there really isn't any disk activity at all. Most of
the threads are blocked on InputStream.refill, not
waiting for the disk, but waiting for their turn into
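The refill contention Yonik describes can be sketched simply: if every search thread refills its buffer through one shared synchronized stream, a second CPU mostly waits on the lock even when the whole file sits in the OS cache. Giving each thread its own clone of the stream (private buffer and file position) removes the shared lock. The names below are hypothetical, not Lucene's actual classes:

```java
// Sketch of the contention and the fix. readShared() models the single
// synchronized stream every thread funnels through; Clone models a
// per-thread view with its own cursor, which needs no lock for reads
// because the underlying data is read-only.
public class SharedStreamSketch {
    private final byte[] data;          // stands in for the cached index file

    public SharedStreamSketch(byte[] data) {
        this.data = data;
    }

    // Contended version: one lock serializes all readers across CPUs.
    public synchronized byte readShared(int pos) {
        return data[pos];
    }

    // Per-thread clone: private cursor, no shared lock on the read path.
    public class Clone {
        private int pos;

        public void seek(int p) {
            pos = p;
        }

        public byte read() {
            return data[pos++];
        }
    }

    public Clone cloneStream() {
        return new Clone();
    }
}
```

With one Clone per search thread, threads blocked in refill-style calls no longer queue behind each other, which is the behavior the thread dumps above suggest.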
The bottleneck seems to be disk IO.
Since this is a read-only index, why not spread some of the frequently
scanned index files over multiple disks, or put the index on SCSI disks
hooked up in a RAID. Maybe this is already the case, but you didn't
mention it.
Oh, I already answered a similar question
Hi,
I'm trying to figure out how to speed up queries to a
large index.
I'm currently getting 133 req/sec, which isn't bad, but isn't
too close to MySQL, which is getting 500 req/sec on the same
hardware with the same set of documents.
Setup info & Stats:
- 4.3M documents, 12 keyword fields per do