Thanks Otis. I tweaked the Solr example app a little and then uploaded a ~55KB document to it a couple of thousand times (changing the ID each time). The solr/data directory was 72MB on disc after adding the document 2000 times, so it seems that the index is growing by approximately 36KB for each document. That seems reasonable.
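For anyone who wants to check the arithmetic, here is a minimal Python sketch of the estimate above. It assumes 1 MB = 1000 KB (with 1024 the figure comes out nearer 36.9 KB) and ignores the size of the empty index. The function names and the one-million-document projection are hypothetical, for illustration only. Since the test re-indexed the same document each time, real logs with varied content may grow the index differently.

```python
# Back-of-envelope index growth estimate from the test described above.
# Assumption: 1 MB = 1000 KB, and the empty index's own size is ignored.

def per_doc_growth_kb(index_size_mb: float, num_docs: int, kb_per_mb: int = 1000) -> float:
    """Average on-disk index growth per document, in KB."""
    return index_size_mb * kb_per_mb / num_docs


def projected_index_mb(num_docs: int, growth_kb: float, kb_per_mb: int = 1000) -> float:
    """Naive linear projection of index size, in MB, for num_docs documents."""
    return num_docs * growth_kb / kb_per_mb


# Figures from the test: 72 MB index after 2000 copies of a ~55 KB document.
growth = per_doc_growth_kb(index_size_mb=72, num_docs=2000)

# Index growth relative to the size of the source document.
ratio = growth / 55

# Hypothetical volume, purely illustrative: one million log records.
projected = projected_index_mb(num_docs=1_000_000, growth_kb=growth)

print(f"growth per document: {growth:.1f} KB")
print(f"index/source ratio:  {ratio:.2f}")
print(f"projected size for 1,000,000 docs: {projected:.0f} MB")
```

A linear projection like this is only a starting point, since Lucene's term dictionary is shared across documents and segment merges change the on-disk size over time.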
I guess I need to do some research into expected data volumes now, and limits on Lucene index size.

Cheers,
Phil

Otis Gospodnetic wrote:
>
> Phil,
>
> From what you described so far, I don't see any red flags. I would pay
> attention to reading those timestamps (covered on the Wiki and ML
> archives), that's all.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
>> From: philmccarthy <philmccar...@gmail.com>
>> To: solr-user@lucene.apache.org
>> Sent: Tuesday, January 13, 2009 8:49:33 PM
>> Subject: Indexing the same data in many records
>>
>> Hi,
>>
>> I'd like to use Solr to index some webserver logs, in order to allow easy
>> ad-hoc querying and analysis. Each Solr Document will represent a single
>> request to the webserver, with fields for time, request URL, referring
>> URL etc.
>>
>> I'm also planning to fetch the page source of each referring URL, and add
>> that as an indexed field in the Solr document. The aim is to allow
>> queries like "find hits to /xyz.html where the referring page contains
>> the word 'foobar'".
>>
>> Since hundreds or even thousands of hits may all come from the same
>> referring page, would this approach be horribly inefficient? (Note the
>> page source won't be stored in each Document, just indexed). Am I going
>> to dramatically increase the index size if I do this?
>>
>> If so, is there a more elegant way to do what I want?
>>
>> Many thanks,
>> Phil
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Indexing-the-same-data-in-many-records-tp21448465p21448465.html
>> Sent from the Solr - User mailing list archive at Nabble.com.

--
View this message in context: http://www.nabble.com/Indexing-the-same-data-in-many-records-tp21448465p21468706.html
Sent from the Solr - User mailing list archive at Nabble.com.