Hi everybody,

I have some performance problems running Nutch. My scenario: I built a Nutch system with

- about 5 million indexed (!) documents (measured with Luke and over the web)
- segread reports about 10 million known documents
- there are 58 segments (making about 90,000 indexed documents per
segment)
- the segments are all about the same size (each segment takes about
2 GB including the index)
- the indexes have been merged into one "total index" (9 GB by now)
- one "nutch server" handles the queries
- hardware: Intel Celeron 2.8 GHz, 1 GB RAM, 250 GB SATA HD
- the Apache/mod_jk/Tomcat frontend is on a separate server

I observe severe performance problems when handling a load of more than
1 search/s. The search within the indexes is pretty quick, but it takes forever to read the summaries (getSummary) from disk, and some kind of backlog seems to build up.

The bottleneck seems to be disk I/O. I ran some tests with smaller segments, and it gets a little better. Faster disks would be nice, but I'm afraid it's only a matter of time until I run into the same problem again. I am still looking for a configuration mistake or problem.
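
As a rough back-of-envelope check on the disk I/O suspicion, here is a minimal sketch in Python. Every number in it (hits per result page, disk seeks per summary, seek time) is an assumption for illustration, not a measurement from this system, and the function name `summary_disk_load` is made up for this example.

```python
def summary_disk_load(queries_per_s, hits_per_page=10,
                      seeks_per_summary=2, seek_ms=10.0):
    """Estimate the random-read load caused by fetching summaries.

    All defaults are assumptions, not measured values:
      hits_per_page      summaries fetched per query
      seeks_per_summary  random disk seeks needed per summary
      seek_ms            average time per seek on a single disk
    Returns (seeks per second, fraction of one disk's time spent seeking).
    """
    seeks_per_s = queries_per_s * hits_per_page * seeks_per_summary
    busy_fraction = seeks_per_s * seek_ms / 1000.0
    return seeks_per_s, busy_fraction


if __name__ == "__main__":
    for qps in (0.5, 1, 2, 5):
        seeks, busy = summary_disk_load(qps)
        print(f"{qps} queries/s: {seeks:.0f} seeks/s, disk busy {busy:.0%}")
```

Under these assumptions the seek load grows linearly with the query rate, so once the disk is saturated, requests queue up, which would match the backlog. Plugging in measured values (for example from iostat) would show whether the summaries alone explain it.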

How do you manage your systems? Does anybody have any hints on how to tune the system?

Regards

        Michael

--
Michael Nebel
http://www.nebel.de/
http://www.netluchs.de/



_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
